[PYTHON] You will be an engineer in 100 days ――Day 75 ――Programming ――About scraping 6

Previous articles in this series:

You will become an engineer in 100 days-Day 70-Programming-About scraping

You will become an engineer in 100 days --Day 66 --Programming --About natural language processing

You will become an engineer in 100 days --Day 63 --Programming --Probability 1

You will become an engineer in 100 days-Day 59-Programming-Algorithms

You will become an engineer in 100 days --- Day 53 --Git --About Git

You will become an engineer in 100 days --Day 42 --Cloud --About cloud services

You will become an engineer in 100 days --Day 36 --Database --About the database

You will be an engineer in 100 days-Day 24-Python-Basics of Python language 1

You will become an engineer in 100 days --Day 18 --Javascript --JavaScript basics 1

You will become an engineer in 100 days --Day 14 --CSS --CSS Basics 1

You will become an engineer in 100 days --Day 6 --HTML --HTML basics 1

This article continues the scraping series.

It assumes that you have already installed Selenium.

How to operate Selenium

Load Selenium

First, import the library. The examples below assume that Google Chrome is the browser.

from selenium import webdriver

# Driver settings
chromedriver = "full path to the driver"
driver = webdriver.Chrome(executable_path=chromedriver)

The location of the WebDriver executable differs from machine to machine, so replace the path with your own. Running this code launches Google Chrome.

If you get an error message, check that the WebDriver version matches your Chrome version. You may also need to grant execute permission to the WebDriver binary, so read the error details and take appropriate action.

Once the browser has launched, you can control it from code and perform all kinds of operations.

Once opened, the browser stays open until you close it. Opening many browsers consumes resources, so don't forget to close them when you are done.
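One way to make sure the browser is always closed is to wrap the driver in a context manager. The following is only a minimal sketch: `managed_driver` and its `create_driver` parameter are hypothetical names introduced for illustration, and the only Selenium call it relies on is the driver's `quit()` method.

```python
from contextlib import contextmanager


@contextmanager
def managed_driver(create_driver):
    """Create a driver, hand it to the caller, and always quit it afterwards."""
    driver = create_driver()
    try:
        yield driver
    finally:
        # quit() closes every window and ends the driver session,
        # even if the code inside the with-block raised an exception.
        driver.quit()


# Intended usage with Selenium (not executed here):
#   from selenium import webdriver
#   chromedriver = "full path to the driver"
#   with managed_driver(lambda: webdriver.Chrome(executable_path=chromedriver)) as driver:
#       driver.get("http://www.otupy.net")
```

With this pattern the browser is closed as soon as the `with` block ends, so browsers do not pile up when a script fails halfway through.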

Selenium can also open the browser in headless mode. Headless mode runs the browser in the background without showing a window.

This mode is very convenient because it saves resources and lets you use Selenium on Linux servers.

To use it, create a variable that holds the browser options, add the headless setting to it, and pass it as an argument when you create the WebDriver.

Option variable = webdriver.ChromeOptions()
Option variable.add_argument('--headless')
Driver variable = webdriver.Chrome(options=Option variable)

from selenium import webdriver

# Driver settings
chromedriver = "full path to the driver"

# Option settings
options = webdriver.ChromeOptions()
options.add_argument('--headless')

# Create the driver
driver = webdriver.Chrome(executable_path=chromedriver,
                          options=options)

Access the website with Selenium

From here on, we operate the browser through the variable that holds the Selenium driver. Since we named it driver earlier, we will call it the driver variable.

To access a website, call get with the URL:

Driver variable.get(URL)

driver.get(URL)

As a trial, let's open my website.

driver.get('http://www.otupy.net')

Each call to get loads the URL you pass to it. It can take some time for the whole page to be displayed, so it is better to wait a moment before performing any further operations.
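Rather than sleeping for a fixed time, you can poll the page until the browser reports that it has finished loading. The sketch below is one possible approach, not the only one: `wait_for_page_load` and its parameters are hypothetical names, and it assumes that checking `document.readyState` through `execute_script` is good enough for the page in question. Content that JavaScript adds later is not covered by this check. Selenium also ships its own `WebDriverWait` class for waiting on specific conditions.

```python
import time


def wait_for_page_load(driver, timeout=10.0, poll_interval=0.5):
    """Poll document.readyState until it is 'complete' or the timeout expires.

    Returns True if the page finished loading in time, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        state = driver.execute_script("return document.readyState")
        if state == "complete":
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)


# Intended usage (not executed here):
#   driver.get("http://www.otupy.net")
#   if not wait_for_page_load(driver):
#       print("The page did not finish loading in time")
```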

Scroll within the site

You can scroll within a page by running JavaScript. Pass the script to `execute_script`.

Driver variable.execute_script(JavaScript)

Pass the script as a string. Use window.scrollBy(0, Y) to scroll by Y pixels from the current position, or window.scrollTo(0, Y) to scroll to an absolute position.

window.scrollBy(0, window.innerHeight); scrolls down by one screen.

window.scrollTo(0, document.body.scrollHeight); scrolls to the bottom of the page.

Let's scroll.

# Scroll down by one screen
driver.execute_script("window.scrollBy(0, window.innerHeight);")

# Scroll to the bottom
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

Now you can scroll your browser around.
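On pages that load more content as you scroll, a single scroll to the bottom may not reach the real end. The following sketch repeats the scroll until the page height stops growing. `scroll_to_end`, `pause`, and `max_rounds` are hypothetical names chosen for this example, and the pause length is a guess that you would need to tune for each site.

```python
import time


def scroll_to_end(driver, pause=1.0, max_rounds=20):
    """Scroll to the bottom repeatedly until the page height stops growing.

    Returns the number of scrolls performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    rounds = 0
    while rounds < max_rounds:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        rounds += 1
        time.sleep(pause)  # give lazily loaded content time to appear
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded, so this is the real bottom
        last_height = new_height
    return rounds
```

The `max_rounds` limit is there so that a page that keeps growing forever cannot trap the script in an endless loop.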

Find the element

To operate on a site, you first need to find the element you want to work with, such as an input field.

There are many ways to find an element. The Driver variable.find_element_by_XXXX methods search by the value of a particular attribute.

If an element is found, it is returned as an object of type WebElement.

**Search by id attribute**

Driver variable.find_element_by_id(value of id attribute)

**Search by name attribute**

Driver variable.find_element_by_name(value of name attribute)

**Search by class name**

Driver variable.find_element_by_class_name(class name)

**Search by tag name**

Driver variable.find_element_by_tag_name(tag name)

**Search by link text**

Driver variable.find_element_by_link_text(link text)

**Search by CSS selector**

Driver variable.find_element_by_css_selector(CSS selector)

**Search by XPath**

Driver variable.find_element_by_xpath(XPath expression)
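Note that in the Selenium 4 series the find_element_by_XXXX helpers were deprecated and later removed in favor of a single find_element(by, value) call, which also exists in older versions. The sketch below shows how the method names above map onto that call. `LOCATORS` and `find` are hypothetical names for this example. The strategy strings mirror the constants on `selenium.webdriver.common.by.By`, and in real code you would normally import `By` and pass `By.ID`, `By.XPATH`, and so on instead of the raw strings.

```python
# Locator strategy strings; these mirror the constants defined on
# selenium.webdriver.common.by.By (By.ID == "id", By.XPATH == "xpath", ...).
LOCATORS = {
    "id": "id",
    "name": "name",
    "class_name": "class name",
    "tag_name": "tag name",
    "link_text": "link text",
    "css_selector": "css selector",
    "xpath": "xpath",
}


def find(driver, how, value):
    """Find one element using the generic find_element(by, value) call.

    `how` is the suffix of the older find_element_by_XXXX method name.
    """
    if how not in LOCATORS:
        raise ValueError(f"unknown locator: {how!r}")
    return driver.find_element(LOCATORS[how], value)


# Intended usage (not executed here):
#   element = find(driver, "css_selector", "input[type='text']")
```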

Manipulate elements

You must find an element before you can work with it. Once you have found an element with one of the methods above, assign it to an element variable and you can perform the following operations.

Element variable = Driver variable.find_element_by_XXXX()
Element variable.operation method()

**Click an element**

Element variable.click()

**Enter text into an element**

Element variable.send_keys(text)

**Send special keys to an element**

First, import the Keys class.

from selenium.webdriver.common.keys import Keys

Then find the element and use send_keys to send the key.

Element variable.send_keys(Keys.special key)

The following keys are available.

| Key | Keys constant |
|---|---|
| Enter key | Keys.ENTER |
| Alt key (combined with a normal key) | Keys.ALT, "key" |
| ← key | Keys.LEFT |
| → key | Keys.RIGHT |
| ↑ key | Keys.UP |
| ↓ key | Keys.DOWN |
| Ctrl key (combined with a normal key) | Keys.CONTROL, "key" |
| Delete key | Keys.DELETE |
| Home key | Keys.HOME |
| End key | Keys.END |
| Escape key | Keys.ESCAPE |
| Equals key | Keys.EQUALS |
| Command key | Keys.COMMAND |
| F1 key | Keys.F1 |
| Shift key (combined with a normal key) | Keys.SHIFT, "key" |
| Page Down key | Keys.PAGE_DOWN |
| Page Up key | Keys.PAGE_UP |
| Space bar | Keys.SPACE |
| Return key | Keys.RETURN |
| Tab key | Keys.TAB |

Extract the source code of the page

You can get the source code of the page as a string.

Driver variable.page_source

driver.page_source

Once you have the source, you can parse it with a library such as BeautifulSoup.
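To illustrate what parsing the page source looks like, here is a small sketch that pulls every link out of an HTML string. To keep it free of extra installs it uses the standard library's `html.parser` as a stand-in for BeautifulSoup, so it is not the BeautifulSoup API. `LinkCollector` and `extract_links` are hypothetical names for this example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag found in an HTML string."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(page_source):
    """Return all link targets contained in the given page source."""
    collector = LinkCollector()
    collector.feed(page_source)
    return collector.links


# Intended usage (not executed here):
#   links = extract_links(driver.page_source)
```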

Summary

Selenium is convenient because it makes it easy to obtain information that ordinary scraping techniques cannot reach.

If you are having trouble getting data, try Selenium. Once you can use it, you will be able to collect far more data.

25 days until you become an engineer

Author information

Otsu py's HP: http://www.otupy.net/

Youtube: https://www.youtube.com/channel/UCaT7xpeq8n1G_HcJKKSOXMw

Twitter: https://twitter.com/otupython
