How to Use Selenium to Web-Scrape with Example
Scraping NBA Player Names and Salaries from Hoopshype.com Using Selenium
Selenium is a Python library and tool used for automating web browsers to do a number of tasks. One of such is web-scraping to extract useful data and information that may be otherwise unavailable. Here's a step-by-step guide on how to use Selenium with the example being extracting NBA player salary data from the website https://hoopshype.com/salaries/players/.
Step 1 — Install and Imports
pip install selenium
Once installed, you're ready for the imports.
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import pandas as pd
Step 2 — Install and Access WebDriver
A webdriver is a vital ingredient to this process. It is what will actually be automatically opening up your browser to access your website of choice. This step is different based on which browser you use to explore the internet. I happen to use Google Chrome. Some say Chrome works best with Selenium, although it also supports Internet Explorer, Firefox, Safari, and Opera. For Chrome, you first need to download the webdriver at https://chromedriver.chromium.org/downloads. There are several different download options based on your version of Chrome. To locate what version of Chrome you have, click on the three vertical dots at the top right corner of your browser window, scroll down to Help, and select "About Google Chrome". There you will see your version. I have version 80.0.3987.149, shown in the screenshots below.
Now you need to know where you saved your webdriver download on your local computer. Mine is just saved in my default downloads folder. You can now create a driver variable using the direct path of the location of your downloaded webdriver.
driver = webdriver.Chrome('/Users/MyUsername/Downloads/chromedriver')
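A quick note on this line: it works on the Selenium 3 releases this guide was written against. If the positional-path argument throws an error, you're likely on Selenium 4, where it was removed; a minimal equivalent, assuming the same download location, looks like this.

from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Point the Service at wherever you saved chromedriver
service = Service('/Users/MyUsername/Downloads/chromedriver')
driver = webdriver.Chrome(service=service)

On Selenium 4.6 or newer, Selenium Manager can also download and locate a matching driver for you, so plain webdriver.Chrome() with no arguments often works.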
Step 3 — Access Website Via Python
A very simple yet very important step. You need your code to actually open the website you're attempting to scrape.
driver.get('https://hoopshype.com/salaries/players/')
When run, this code snippet will open the browser to your desired website.
Step 4 — Locate Specific Information You're Scraping
In order to extract the information that you're looking to scrape, you need to locate the element's XPath. An XPath is a syntax used for finding any element on a webpage. To locate the element's XPath, highlight the first in the list of what you're looking for, right click, and select Inspect; this opens up the developer tools. For my example, I first want to locate the NBA player names, so I first select Stephen Curry.
In the developer tools, we now see the element "Stephen Curry" appears as such.
<td class="proper noun">
<a href="https://hoopshype.com/histrion/stephen-curry/salary/">
Stephen Curry </a>
</td>
This element can easily be translated to its XPath, but first, we need to remember that we aren't only trying to locate this element, but all player names. Using the same process, I located the next element in the list, Russell Westbrook.
<td course="proper name">
<a href="https://hoopshype.com/player/russell-westbrook/bacon/">
Russell Westbrook </a>
</td>
The commonality between these two (and all other player names) is <td class="name">, so that is what we will be using to create a list of all player names. That translated into an XPath looks like //td[@class="name"]. Breaking that down, all XPaths are preceded by the double slash, which we want in a td tag, with each class in that td tag needing to correspond to "name". We can now create the list of player names with this Selenium function.
players = driver.find_elements_by_xpath('//td[@class="name"]')
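Note that find_elements_by_xpath was removed in Selenium 4, so if the line above raises an AttributeError, use the modern equivalent with the By locator class instead.

from selenium.webdriver.common.by import By

# Same XPath, Selenium 4 style
players = driver.find_elements(By.XPATH, '//td[@class="name"]')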
And now to get the text of each player name into a list, we write this function.
players_list = []
for p in range(len(players)):
    players_list.append(players[p].text)
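As a side note (my own variation, not from the original article), the same loop can be written as a one-line list comprehension.

players_list = [player.text for player in players]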
Following this same process to acquire the player salaries…
Stephen Curry's 2019/20 Salary
<td mode="color:blackness" form="hh-salaries-sorted" data-value="40231758">
$40,231,758 </td>
Russell Westbrook's 2019/20 Salary
<td manner="colour:black" class="hh-salaries-sorted" data-value="38506482">
$38,506,482 </td>
While inspecting these elements and translating to XPath, we can ignore style and data-value, only worrying about the class.
salaries = driver.find_elements_by_xpath('//td[@class="hh-salaries-sorted"]')
And the list of salaries text…
salaries_list = []
for s in range(len(salaries)):
    salaries_list.append(salaries[s].text)
Step 5 — Apply to Each Year Available and Tie Everything Together
Often, when using Selenium, you'll be attempting to retrieve data that is located on multiple different pages from the same website. In my example, hoopshype.com has NBA salary data dating back to the 1990/91 season.
As you can see, the difference between the URL of each season is just a matter of the years being included at the end. So the URL for the 2018/19 season is https://hoopshype.com/salaries/players/2018-2019/ and the URL for the 1990/91 season is https://hoopshype.com/salaries/players/1990-1991/. With that, we can create a function that loops through each year and accesses each URL individually, then puts all of the steps we've previously shown together for each year individually. I also pair each player with their salary for that season, place them into a temporary dataframe, add the year onto that temporary dataframe, and then add this temporary dataframe to a master dataframe that includes all of the data we've collected. The final code is below!
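Here is a minimal sketch of what that final loop looks like, assembled from the steps above; the exact year range, column names, and use of pd.concat are my assumptions rather than the author's exact code, and the Selenium calls are the same ones shown in Steps 2 through 4 (swap in the By.XPATH form noted earlier if you're on Selenium 4).

import pandas as pd
from selenium import webdriver

driver = webdriver.Chrome('/Users/MyUsername/Downloads/chromedriver')

# Master dataframe that will collect every season's data
df = pd.DataFrame(columns=['Player', 'Salary', 'Year'])

for year in range(1990, 2019):
    # Each season's URL ends with the two years it spans, e.g. 1990-1991/
    url = 'https://hoopshype.com/salaries/players/{}-{}/'.format(year, year + 1)
    driver.get(url)

    # Same two XPath lookups shown in Step 4
    players = driver.find_elements_by_xpath('//td[@class="name"]')
    salaries = driver.find_elements_by_xpath('//td[@class="hh-salaries-sorted"]')

    # Pair each player with their salary for that season
    temp_df = pd.DataFrame({'Player': [p.text for p in players],
                            'Salary': [s.text for s in salaries]})
    temp_df['Year'] = year

    # Append the season onto the master dataframe
    df = pd.concat([df, temp_df], ignore_index=True)

driver.quit()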
Source: https://towardsdatascience.com/how-to-use-selenium-to-web-scrape-with-example-80f9b23a843a