Scraping with Selenium in Python (Basic)

I have been studying scraping and tried driving a browser with Selenium, so here is a brief summary.

What was used

Prepare a driver that matches your browser

To operate a browser, you need a driver that matches it. This article uses Chrome, so download ChromeDriver from the official site.

Install Selenium

Install Selenium with pip:

pip install selenium

Try to open a web page

Open browser

webdriver.Chrome(driver_path)



Open web page

driver.get(URL)

Close web page

driver.close()

Exit browser (close all windows)

driver.quit()

``` python
from selenium import webdriver

driver = webdriver.Chrome(driver_path)  # start Chrome through ChromeDriver
driver.get(URL)                         # open the page
driver.close()                          # close the current window
driver.quit()                           # close all windows and end the session
```
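If the script raises an error partway through, `driver.quit()` is never reached and the browser stays open. Below is a minimal sketch of one way around that, using a context manager. The names `browser_session`, `fetch_title`, and `driver_factory` are made up for this example and are not part of Selenium; `driver_factory` is assumed to be any callable that returns a WebDriver.

```python
from contextlib import contextmanager


@contextmanager
def browser_session(driver_factory):
    """Start a browser via driver_factory() and always quit it afterwards."""
    driver = driver_factory()
    try:
        yield driver
    finally:
        # Runs even if the body raised, so no browser window is left behind
        driver.quit()


def fetch_title(driver_factory, url):
    """Open url and return the page title (hypothetical helper)."""
    with browser_session(driver_factory) as driver:
        driver.get(url)
        return driver.title
```

With Selenium installed, a call such as `fetch_title(lambda: webdriver.Chrome(driver_path), URL)` should open the page, read its title, and exit the browser whether or not an error occurred.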

Try to access the element

To access an HTML element, look it up by its id, class, name, and so on.

Reference

Get by id

driver.find_element_by_id('ID')



Get by class

driver.find_element_by_class_name('CLASS_NAME')

Get by name

driver.find_element_by_name('NAME')

Get with link text

driver.find_elements_by_link_text('LINK_TEXT')

Get nested elements by specifying path

driver.find_elements_by_xpath(".//a")
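Two details are worth knowing here. The `find_element_by_*` helpers were removed in Selenium 4.3; the equivalent call in newer versions is `driver.find_element(By.ID, 'ID')`, with `By` imported from `selenium.webdriver.common.by`. Also, `find_element` raises `NoSuchElementException` when nothing matches, while the plural `find_elements` returns an empty list. The sketch below relies on the second point. It assumes a Selenium 4 style driver, and `find_first` is a hypothetical helper name, not a Selenium function.

```python
def find_first(driver, by, value):
    """Return the first element matching the locator, or None if nothing matches."""
    # find_elements (plural) returns an empty list instead of raising
    matches = driver.find_elements(by, value)
    return matches[0] if matches else None


# Example use, assuming `driver` is a live WebDriver and By has been imported:
#   box = find_first(driver, By.ID, "searchWords")
#   if box is None:
#       print("search box not found")
```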

Action

Operate the web page by taking an action on the acquired element.

Reference

Click the button

driver.find_element_by_id('Btn').click()



Enter characters in Form

driver.find_element_by_name('From').send_keys("text")
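`send_keys()` appends to whatever a field already contains, so clearing the field first is often useful. The following is a rough sketch of the two actions as small helpers, assuming a Selenium 4 style driver that provides `find_element(by, value)`. The names `type_into` and `click` are invented for illustration.

```python
def type_into(driver, by, value, text):
    """Replace the contents of a text field with text and return the element."""
    field = driver.find_element(by, value)
    field.clear()          # remove anything already in the field
    field.send_keys(text)  # type the new text
    return field


def click(driver, by, value):
    """Find a single element, click it, and return it."""
    element = driver.find_element(by, value)
    element.click()
    return element


# Example use, assuming `driver` is a live WebDriver and By has been imported:
#   type_into(driver, By.NAME, "From", "text")
#   click(driver, By.ID, "Btn")
```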

Wait

A script often runs ahead of the page and raises an error because the element it needs has not finished loading. To handle this, wait up to a set number of seconds for the required element to appear.

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

WebDriverWait(driver, WAIT_SECOND).until(EC.presence_of_element_located((By.CLASS_NAME, 'Btn')))
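Roughly speaking, this kind of wait re-checks a condition at a short interval until it succeeds or the time limit passes. The hand-rolled sketch below illustrates that polling idea only. It is not Selenium's implementation, and `wait_until`, `condition`, and `poll_interval` are names chosen for this example.

```python
import time


def wait_until(condition, timeout, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once timeout seconds have passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Example use, assuming `driver` is a live WebDriver and By has been imported:
#   buttons = wait_until(lambda: driver.find_elements(By.CLASS_NAME, "Btn"), 10)
```

Because `find_elements` returns an empty list until the element exists, the lambda in the comment stays falsy while the page is still loading. In real scripts `WebDriverWait` is the better choice, since it also handles Selenium's own exceptions.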

Browser operation

Let's try a few simple operations based on the above.

Try clicking the button

For example, to press the purchase button on a certain site:

from selenium import webdriver
driver = webdriver.Chrome(driver_path)
driver.get(URL)
driver.find_element_by_class_name('new_addToCart').click()
driver.quit()

Here, find_element_by_class_name() gets the element and click() performs the click.

Try entering text

Enter a search keyword in the search box and press the search button.

from selenium import webdriver
driver = webdriver.Chrome(driver_path)
driver.get(URL)
driver.find_element_by_id('searchWords').send_keys("search text")
driver.find_element_by_id('searchBtn').click()

This will automatically enter "search text" in the search box and search.
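Once the search has run, the next scraping step is usually to read data back from the results page. Here is a minimal sketch that collects link text and URLs, assuming a Selenium 4 style driver. `collect_links` is a hypothetical helper, and the default locator simply reuses the `.//a` XPath shown earlier; a real site would likely need a narrower one.

```python
def collect_links(driver, by="xpath", value=".//a"):
    """Return (text, href) pairs for every element matching the locator."""
    links = []
    for element in driver.find_elements(by, value):
        href = element.get_attribute("href")
        if href:  # skip anchors that have no href attribute
            links.append((element.text.strip(), href))
    return links


# Example use, assuming `driver` is showing the search results page:
#   for text, href in collect_links(driver):
#       print(text, href)
```

The string `"xpath"` is the value of `By.XPATH`, so passing `By.XPATH` explicitly behaves the same way.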

Summary

Once you learn the basic operations such as pressing buttons and entering text, most other operations feel easy. A major benefit of driving the browser programmatically is that operations can run in parallel. However, launching many browsers at once makes your PC extremely slow, so be careful.
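One way to get parallelism without launching too many browsers is to cap the number of workers. This is a sketch using the standard library's `ThreadPoolExecutor`; `run_in_parallel`, `task`, and the default of two workers are assumptions made for this example. It also assumes each call to `task` starts and quits its own browser rather than sharing one driver between threads.

```python
from concurrent.futures import ThreadPoolExecutor


def run_in_parallel(task, items, max_workers=2):
    """Run task(item) for each item on at most max_workers threads.

    Results come back in the same order as items.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(task, items))


# Example use, assuming scrape_one(url) opens its own browser, scrapes the
# page, quits the browser, and returns the scraped data:
#   results = run_in_parallel(scrape_one, [URL_1, URL_2, URL_3], max_workers=2)
```

With `max_workers=2`, at most two browsers are open at any moment, however long the list of URLs is.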
