Click here for the posts up to yesterday
You will become an engineer in 100 days --Day 70 --Programming --About scraping
You will become an engineer in 100 days --Day 66 --Programming --About natural language processing
You will become an engineer in 100 days --Day 63 --Programming --Probability 1
You will become an engineer in 100 days --Day 59 --Programming --Algorithms
You will become an engineer in 100 days --Day 53 --Git --About Git
You will become an engineer in 100 days --Day 42 --Cloud --About cloud services
You will become an engineer in 100 days --Day 36 --Database --About databases
You will become an engineer in 100 days --Day 24 --Python --Python language basics 1
You will become an engineer in 100 days --Day 18 --Javascript --JavaScript basics 1
You will become an engineer in 100 days --Day 14 --CSS --CSS basics 1
You will become an engineer in 100 days --Day 6 --HTML --HTML basics 1
This post is also a continuation of scraping.
The basic principles of scraping were mostly covered last time. Today's topic is Selenium.
Selenium
is a framework for automating the operation of web browsers.
By using Selenium
, you can obtain information that cannot be obtained by scraping with the Python requests
library alone.
So what is this information that cannot be obtained?
With the normal requests
library, what you get from the get method and the like is the HTML source.
If some elements of the page are rendered by JavaScript, they will not appear in the data unless JavaScript actually runs.
Therefore, elements dynamically generated by JavaScript cannot be obtained with the requests
library.
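To make this concrete, here is a minimal sketch (with hypothetical HTML and no actual HTTP request) of what the requests library would see: the static source contains the JavaScript code itself, but not the element that the JavaScript would generate.

```python
# Hypothetical page source, as requests.get() would return it.
# No browser runs this JavaScript, so #content stays empty.
raw_html = """
<html>
  <body>
    <div id="content"></div>
    <script>
      document.getElementById("content").innerHTML = "Hello from JS";
    </script>
  </body>
</html>
"""

# The script's source text is present in the raw HTML...
print("script text present:", "Hello from JS" in raw_html)

# ...but the generated element is not, because no JavaScript has run.
print("rendered div present:",
      '<div id="content">Hello from JS</div>' in raw_html)
```

In a real browser, the div would contain the text; in the raw source, it never does. This is exactly the gap Selenium fills.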
Since Selenium
runs a real web browser to fetch data, it is no different from accessing the site with a normal browser: JavaScript runs, and you can get the rendered data.
The following three things are required to run Selenium
on a PC.
** Web browser ** Chrome, Firefox, Opera, etc.
** WebDriver ** Software for driving the browser
** Selenium ** A library that controls the browser programmatically via WebDriver
The installation steps are as follows.
** Installing a web browser ** Download the installer from each browser's download site and install it.
** Downloading WebDriver ** WebDriver does not need to be installed; just download and extract it. After downloading, place it in a directory close to your program.
The driver changes as the browser is upgraded, so download the one matching your browser version each time.
** Installing Selenium ** In Python, install it as follows.
pip install selenium
As a first step in running Selenium
, let's operate Google Chrome
from Selenium
here.
from selenium import webdriver

# Driver settings
chromedriver = "full path to the driver"
driver = webdriver.Chrome(executable_path=chromedriver)
driver.get('URL to access')
Running this will launch the browser.
Since the browser to launch is Google Chrome
, webdriver.Chrome
is used here.
The method to use changes depending on the browser.
Firefox: webdriver.Firefox
Opera: webdriver.Opera
Write the path to the WebDriver in `executable_path`. It does not seem to be recognized unless it is a full (absolute) path, so place the WebDriver in a shallow directory.
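Since a relative path may not be recognized, one safe approach is to build the absolute path inside the program. A minimal sketch, assuming the driver file (the hypothetical name `chromedriver` here) sits in the same directory you run the script from:

```python
import os

# Hypothetical driver file name; adjust to the actual downloaded binary
driver_name = "chromedriver"

# Build an absolute path from a path relative to the current directory
chromedriver = os.path.abspath(driver_name)
print(chromedriver)

# This absolute path can then be passed to webdriver.Chrome:
# driver = webdriver.Chrome(executable_path=chromedriver)
```

This way the script works no matter where the project directory lives, without hard-coding the full path.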
Were you able to launch the browser with Selenium?
Next time, I will cover how to operate the browser from here.
Selenium is convenient because it lets you easily obtain information that cannot be obtained with normal scraping techniques.
If you are having trouble getting data, try Selenium.
26 days until you become an engineer
Otsu py's HP: http://www.otupy.net/
Youtube: https://www.youtube.com/channel/UCaT7xpeq8n1G_HcJKKSOXMw
Twitter: https://twitter.com/otupython