Click here for the posts up to yesterday
You will become an engineer in 100 days --Day 70 --Programming --About scraping
You will become an engineer in 100 days --Day 66 --Programming --About natural language processing
You will become an engineer in 100 days --Day 63 --Programming --Probability 1
You will become an engineer in 100 days --Day 59 --Programming --Algorithms
You will become an engineer in 100 days --Day 53 --Git --About Git
You will become an engineer in 100 days --Day 42 --Cloud --About cloud services
You will become an engineer in 100 days --Day 36 --Database --About databases
You will become an engineer in 100 days --Day 24 --Python --Python language basics 1
You will become an engineer in 100 days --Day 18 --Javascript --JavaScript basics 1
You will become an engineer in 100 days --Day 14 --CSS --CSS basics 1
You will become an engineer in 100 days --Day 6 --HTML --HTML basics 1
This post is also a continuation of scraping.
The basic principles of scraping were mostly covered last time. Today's topic is Selenium.
Selenium
is a framework for automating the operation of web browsers.
By using Selenium
, you can obtain information that cannot be obtained by scraping with the Python requests
library alone.
So what is this information that cannot be obtained?
With the normal requests
library, what you get from the get method and the like is the HTML source.
If some elements of the page are rendered by JavaScript, they will not appear in the data unless JavaScript actually runs.
Therefore, elements dynamically generated by JavaScript cannot be obtained with the requests
library.
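To make this concrete, here is a minimal sketch (with hypothetical HTML and no actual HTTP request) of what the requests library would see: the static source contains the JavaScript code itself, but not the element that the JavaScript would generate.

```python
# Hypothetical page source, as requests.get() would return it.
# No browser runs this JavaScript, so #content stays empty.
raw_html = """
<html>
  <body>
    <div id="content"></div>
    <script>
      document.getElementById("content").innerHTML = "Hello from JS";
    </script>
  </body>
</html>
"""

# The script's source text is present in the raw HTML...
print("script text present:", "Hello from JS" in raw_html)

# ...but the generated element is not, because no JavaScript has run.
print("rendered div present:",
      '<div id="content">Hello from JS</div>' in raw_html)
```

In a real browser, the div would contain the text; in the raw source, it never does. This is exactly the gap Selenium fills.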
Since Selenium
runs a real web browser to fetch data, it is no different from accessing the site with a normal browser: JavaScript runs, and you can get the rendered data.
The following three things are required to run Selenium
on a PC.
** Web browser ** Chrome, Firefox, Opera, etc.
** WebDriver ** Software for driving the browser
** Selenium ** A library that controls the browser programmatically via WebDriver
The installation steps are as follows.
** Installing a web browser ** Download the installer from each browser's download site and install it.
** Downloading WebDriver ** WebDriver does not need to be installed; just download and extract it. After downloading, place it in a directory close to your program.
The driver changes as the browser is upgraded, so download the one matching your browser version each time.
** Installing Selenium ** In Python, install it as follows.
pip install selenium
As a first step in running Selenium
, let's operate Google Chrome
from Selenium
here.
from selenium import webdriver

# Driver settings
chromedriver = "full path to the driver"
driver = webdriver.Chrome(executable_path=chromedriver)
driver.get('URL to access')
Running this will launch the browser.
Since the browser to launch is Google Chrome
, webdriver.Chrome
is used here.
The method to use changes depending on the browser.
Firefox: webdriver.Firefox
Opera: webdriver.Opera
Write the path to the WebDriver in `executable_path`. It does not seem to be recognized unless it is a full (absolute) path, so place the WebDriver in a shallow directory.
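Since a relative path may not be recognized, one safe approach is to build the absolute path inside the program. A minimal sketch, assuming the driver file (the hypothetical name `chromedriver` here) sits in the same directory you run the script from:

```python
import os

# Hypothetical driver file name; adjust to the actual downloaded binary
driver_name = "chromedriver"

# Build an absolute path from a path relative to the current directory
chromedriver = os.path.abspath(driver_name)
print(chromedriver)

# This absolute path can then be passed to webdriver.Chrome:
# driver = webdriver.Chrome(executable_path=chromedriver)
```

This way the script works no matter where the project directory lives, without hard-coding the full path.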
Were you able to launch the browser with Selenium?
Next time, I will cover how to operate the browser from here.
Selenium is convenient because it lets you easily obtain information that cannot be obtained with normal scraping techniques.
If you are having trouble getting data, try Selenium.
26 days until you become an engineer
Otsu py's HP: http://www.otupy.net/
Youtube: https://www.youtube.com/channel/UCaT7xpeq8n1G_HcJKKSOXMw
Twitter: https://twitter.com/otupython