I did some web scraping at work, so this is a memo of the tricks I picked up at the time.
Environment: Python 3 (3.6.2), Selenium, ChromeDriver (85.0.4183.87)
You can run JavaScript from Selenium by using the execute_script method.
For example, you can change the text color by calling JavaScript's setAttribute method on an element, as shown below.
python
from selenium import webdriver
driver = webdriver.Chrome()
driver.get("https://www.hogefuga")
element = driver.find_element_by_xpath("//div[@class='fuga']/span")
driver.execute_script("arguments[0].setAttribute('style','color: red;')", element)
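execute_script passes any extra positional arguments to the script as arguments[1], arguments[2], and so on, and it returns whatever the script returns. The following is a minimal sketch of that, not the only way to do it; set_style_property is a hypothetical helper name, not a Selenium API. It uses style.setProperty, which changes a single CSS property instead of overwriting the whole style attribute.

```python
def set_style_property(driver, element, name, value):
    """Set one inline CSS property on element and return the resulting style text.

    set_style_property is a hypothetical helper, not part of Selenium.
    """
    script = (
        # arguments[0] is the element, arguments[1] and [2] are name and value
        "arguments[0].style.setProperty(arguments[1], arguments[2]);"
        "return arguments[0].getAttribute('style');"
    )
    return driver.execute_script(script, element, name, value)
```

With the element from the example above, this would be called as set_style_property(driver, element, "color", "red").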
You can also use this method to show elements that are hidden with display: none;. For example, if the element is hidden by a class that applies display: none;, you can show it by removing that class name from the element.
python
# Assumes display: none; is applied through the "close" class
element = driver.find_element_by_xpath("//div[@class='hoge close']")
driver.execute_script("arguments[0].setAttribute('class','hoge')", element)
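Overwriting the class attribute works when you know the full list of classes that should remain. As an alternative sketch, the standard DOM classList.remove method removes one class and leaves the others untouched. remove_class is a hypothetical helper name used only for illustration.

```python
def remove_class(driver, element, class_name):
    """Remove one class from element, leaving its other classes as they are.

    remove_class is a hypothetical helper, not part of Selenium.
    """
    driver.execute_script(
        "arguments[0].classList.remove(arguments[1]);", element, class_name
    )
```

For the example above, remove_class(driver, element, "close") would drop only the close class.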
To get elements inside an iframe, you need **switch_to_frame()**.
python
driver.switch_to_frame(driver.find_element_by_xpath("//div[@class='hoge']/iframe"))
Now you can get the elements inside the iframe. On the other hand, elements outside the iframe can no longer be retrieved. To get elements of the original page again, switch back with **switch_to_default_content()**.
python
driver.switch_to_default_content()
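Because it is easy to forget to switch back, the pair of calls can be wrapped in a context manager. This is only a sketch: inside_frame is a hypothetical helper name, and it uses the driver.switch_to.frame() and driver.switch_to.default_content() spellings, which recent Selenium releases use in place of switch_to_frame() and switch_to_default_content().

```python
from contextlib import contextmanager


@contextmanager
def inside_frame(driver, frame):
    """Switch into frame for the with-block, then return to the top-level page.

    inside_frame is a hypothetical helper, not part of Selenium.
    """
    driver.switch_to.frame(frame)
    try:
        yield driver
    finally:
        # Runs even if the body raises, so later lookups see the main page again
        driver.switch_to.default_content()
```

It would be used as `with inside_frame(driver, iframe_element):` around the lookups that belong to the iframe. Note that default_content() returns to the top-level page, so this sketch is not suitable for nested iframes.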
When you scrape with Selenium, clicking a link may open another window. If you want to operate on that window, use **switch_to_window()** to switch Selenium's target to it.
python
# Open another window
driver.find_element_by_xpath("//div[@class='hoge']/a").click()
# The window that was open from the beginning
window_before = driver.window_handles[0]
# The newly opened window
window_after = driver.window_handles[1]
# Switch Selenium's target to the newly opened window
driver.switch_to_window(window_after)
# Switch Selenium's target back to the original window
driver.switch_to_window(window_before)
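Indexing window_handles assumes the handles come back in opening order, which the WebDriver specification does not guarantee. A more defensive sketch is to record the handles before the click and pick the one that is new afterwards. switch_to_new_window is a hypothetical helper name, and it uses driver.switch_to.window(), the spelling recent Selenium releases use in place of switch_to_window().

```python
def switch_to_new_window(driver, handles_before):
    """Switch to the single window handle not present in handles_before.

    switch_to_new_window is a hypothetical helper, not part of Selenium.
    Returns the new handle, or raises RuntimeError if there is not exactly one.
    """
    new_handles = [h for h in driver.window_handles if h not in handles_before]
    if len(new_handles) != 1:
        raise RuntimeError(
            f"expected exactly one new window, found {len(new_handles)}"
        )
    driver.switch_to.window(new_handles[0])
    return new_handles[0]
```

The intended order is: save handles_before = driver.window_handles, click the link, then call switch_to_new_window(driver, handles_before). If the new window opens slowly, the call may need to be retried after a short wait.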
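You can also click an element, such as a radio button, by calling JavaScript's click() on it through execute_script.
python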
# Get the radio button element
element = driver.find_element_by_id("fugafuga")
driver.execute_script("arguments[0].click();", element)
For security reasons, headless Chrome does not seem to allow file downloads by default. You therefore need to enable them explicitly by sending a Page.setDownloadBehavior command to ChromeDriver with a POST request.
python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
DOWNLOAD_URL = "https://www.hogefuga/file/download"
download_dir = "/home/download"  # Location of downloaded files
def enable_download(driver, download_dir):
    # Register ChromeDriver's send_command endpoint so it can be called through driver.execute
    driver.command_executor._commands["send_command"] = ("POST", "/session/$sessionId/chromium/send_command")
    # Allow downloads and save them to download_dir
    params = {"cmd": "Page.setDownloadBehavior", "params": {"behavior": "allow", "downloadPath": download_dir}}
    driver.execute("send_command", params)
def setting_chrome_options():
    chrome_options = Options()
    chrome_options.add_argument("--headless")
    chrome_options.add_argument("--no-sandbox")
    return chrome_options
driver = webdriver.Chrome(executable_path="/usr/local/bin/chromedriver", options=setting_chrome_options())
enable_download(driver, download_dir)
driver.get(DOWNLOAD_URL)
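driver.get() returns without waiting for the file to be fully written, so the script usually has to wait before reading or moving it. The sketch below polls the download directory with the standard library only. It assumes the directory is empty before the download starts and relies on Chrome giving unfinished downloads a .crdownload suffix. wait_for_download and its parameters are hypothetical names.

```python
import os
import time


def wait_for_download(download_dir, timeout=30.0, poll_interval=0.5):
    """Return the sorted file names in download_dir once no partial file remains.

    wait_for_download is a hypothetical helper. It assumes download_dir was
    empty beforehand and that partial downloads end with ".crdownload".
    Raises TimeoutError if the download has not finished after timeout seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        names = sorted(os.listdir(download_dir))
        in_progress = [n for n in names if n.endswith(".crdownload")]
        if names and not in_progress:
            return names
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"download did not finish within {timeout} seconds"
            )
        time.sleep(poll_interval)
```

After driver.get(DOWNLOAD_URL), calling wait_for_download(download_dir) would block until the file is complete or the timeout is reached.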