Scraping with Selenium + Python Part 1

Recap of last time

Last time I tried to fetch a page behind a login using Goutte, but I was defeated by image authentication. https://qiita.com/shioharu_/private/818154ac145c78076487

So this time I will change the method and scrape with Selenium + Python!

Setup

Using Vagrant and VirtualBox on Windows 10, I installed Selenium, Python, and ChromeDriver on CentOS 7.0 in a virtual environment.

I set this up by following the article below, relying on the wisdom of those who came before. https://worklog.be/archives/3422

Trying it out

sample.py


from selenium import webdriver
from selenium.webdriver.chrome.options import Options
 
options = Options()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-gpu')
options.add_argument('--window-size=1280,1024')
 
driver = webdriver.Chrome(options=options)
driver.get('https://www.yahoo.co.jp/')
 
driver.save_screenshot('test.png')
driver.quit()

Execute

python sample.py


(Screenshot: test_.png)

The Yahoo top page was captured without any trouble, so the sample seems to be working!
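The same four Chrome flags appear in every sample in this post, so it may be worth collecting them in one place. Here is a minimal sketch of that idea. The helper names `chrome_arguments` and `take_screenshot` are my own, made up for illustration, and are not part of Selenium. The Selenium import sits inside the function so the flag list can be inspected even on a machine without a browser.

```python
def chrome_arguments(user_data_dir=None, window_size=(1280, 1024)):
    """Return the Chrome command-line flags used by the samples in this post."""
    width, height = window_size
    args = [
        '--headless',
        '--no-sandbox',
        '--disable-gpu',
        f'--window-size={width},{height}',
    ]
    if user_data_dir:
        # Only added when a profile directory is passed in.
        args.append(f'--user-data-dir={user_data_dir}')
    return args


def take_screenshot(url, filename, user_data_dir=None):
    """Open url in headless Chrome and save a screenshot to filename."""
    # Imported here so chrome_arguments() works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for arg in chrome_arguments(user_data_dir):
        options.add_argument(arg)

    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        return driver.save_screenshot(filename)
    finally:
        # Always close the browser, even if loading the page fails.
        driver.quit()
```

With this in place, `take_screenshot('https://www.yahoo.co.jp/', 'test.png')` should behave like sample.py. The `try`/`finally` is there so a failed page load does not leave a headless Chrome process behind.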

The problem from last time

Last time, image authentication kept me from displaying the page after login. Selenium can be made to wait, so my first idea was to log in manually during that wait and get past the image authentication page that way. Then I found that if you specify a Chrome profile path, Chrome keeps the state of that profile. https://rabbitfoot.xyz/selenium-chrome-profile/

In short, you just log in manually beforehand and then specify that profile path. Nice and simple, which I appreciate.

Since I am using CentOS in a virtual environment this time, I figured that if I placed a link to the Windows-side profile inside the folder mounted into the VM, Chrome would read the profile from there. (Strictly speaking, the mklink /J used below creates a directory junction rather than a symbolic link.)

Example

mklink /J "C:\Users\[username]\Desktop\work\vagrant\User Data" "C:\Users\[username]\AppData\Local\Google\Chrome\User Data"


Let's rewrite the sample source and run it.

sample2.py


from selenium import webdriver
from selenium.webdriver.chrome.options import Options
 
options = Options()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-gpu')
options.add_argument('--window-size=1280,1024')
options.add_argument('--user-data-dir=Profile path with a symbolic link')
 
driver = webdriver.Chrome(options=options)
driver.get('https://p.eagate.573.jp/game/2dx/27/ranking/weekly.html')
 
driver.save_screenshot('test2.png')
driver.quit()

However

Mercilessly, what got captured was the page shown to a user who is not logged in ...

(Screenshot: hilogin.png)

The cause was that the profile path was not being picked up properly. The profile of the Chrome installed in the virtual environment differs from the profile of Chrome on the Windows side ... That leaves no particular reason to stick with the virtual environment, so I will install Python and Selenium on the Windows side and run it there.
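A quick check before launching Chrome can at least confirm that the path given to `--user-data-dir` resolves to something that looks like a Chrome user data directory. The sketch below assumes the usual layout, where Chrome keeps a `Local State` file and a `Default` profile folder directly under that directory. The function name is made up for illustration. It cannot detect a mismatch between a Linux Chrome and a Windows profile, only a path that does not resolve or is obviously the wrong folder.

```python
from pathlib import Path


def looks_like_chrome_user_data(path):
    """Return True if path is a directory with Chrome's usual top-level entries."""
    root = Path(path)
    if not root.is_dir():
        # Covers a missing path and a link that does not resolve.
        return False
    # Assumption: a normal user data directory holds a "Local State" file
    # and a "Default" profile folder at its top level.
    return (root / 'Local State').is_file() and (root / 'Default').is_dir()
```

Calling this on the linked path from inside the VM and getting `False` would have pointed at the link itself, before any screenshot was taken.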

Setting up the environment on the Windows side

Reference: https://mylife8.net/install-selenium-and-run-on-windows/

Python: https://www.python.org/downloads/ Nothing special to note; just follow the installer.

Selenium: After installing Python, install it by running the following from the command prompt.

pip install selenium

ChromeDriver: https://sites.google.com/a/chromium.org/chromedriver/downloads Download the ChromeDriver release that matches your Chrome version. chromedriver.exe can go anywhere, but I put it in the same folder as Python to keep things simple.

\Users\[username]\AppData\Local\Programs\Python\Python38\chromedriver.exe


I also added the folder above to the PATH environment variable.
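Two things tend to go wrong at this step: the driver is not actually on PATH, or its version does not match Chrome. Below is a rough sketch for checking both. As far as I understand ChromeDriver's version guidance, the driver should share everything but the last number of the Chrome version, so the comparison uses the first three parts. The function names are my own, and the version strings in the usage note are only examples of the format.

```python
import re
import shutil


def find_chromedriver(name='chromedriver'):
    """Return the full path of the driver if it can be found on PATH, else None."""
    # On Windows, shutil.which also tries the extensions in PATHEXT (.exe etc.).
    return shutil.which(name)


def build_prefix(version_text):
    """Extract 'major.minor.build' from text containing a dotted version number."""
    match = re.search(r'(\d+)\.(\d+)\.(\d+)', version_text)
    return '.'.join(match.groups()) if match else None


def versions_match(chrome_text, driver_text):
    """Compare Chrome and ChromeDriver on the first three version parts."""
    chrome = build_prefix(chrome_text)
    driver = build_prefix(driver_text)
    return chrome is not None and chrome == driver
```

For example, `versions_match('Google Chrome 81.0.4044.138', 'ChromeDriver 81.0.4044.69')` compares `81.0.4044` on both sides. The driver's own string can be obtained by running `chromedriver --version`.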

Running from the Windows side

Log in beforehand from Chrome at https://p.eagate.573.jp/game/2dx/27/ranking/weekly.html, then close Chrome. Rewrite the source as shown below and run it!

sample3.py


from selenium import webdriver
from selenium.webdriver.chrome.options import Options
 
options = Options()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-gpu')
options.add_argument('--window-size=1280,1024')
options.add_argument('--user-data-dir=C:\\Users\\[username]\\AppData\\Local\\Google\\Chrome\\User Data')
 
driver = webdriver.Chrome(options=options)
driver.get('https://p.eagate.573.jp/game/2dx/27/ranking/weekly.html')
 
driver.save_screenshot('test3.png')
driver.quit()

I got it without any trouble!

(Screenshot: screencapture-p-eagate-573-jp-game-2dx-27-ranking-weekly-html-2020-05-10-13_26_24.png)
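The profile path in sample3.py has the user name written into it. On Windows, Chrome's default user data directory sits under the folder that the `LOCALAPPDATA` environment variable points to, so the path can be assembled at run time instead. This is only a sketch: the function name is hypothetical, and it assumes Chrome was installed with its default profile location.

```python
import ntpath
import os


def default_chrome_user_data(env=None):
    """Build the default Chrome user data path from LOCALAPPDATA, or return None."""
    env = os.environ if env is None else env
    local_app_data = env.get('LOCALAPPDATA')
    if not local_app_data:
        # Not on Windows, or the variable is not set.
        return None
    # ntpath joins with backslashes regardless of the OS running the script.
    return ntpath.join(local_app_data, 'Google', 'Chrome', 'User Data')
```

The result can then be passed on as `options.add_argument('--user-data-dir=' + default_chrome_user_data())`. Taking the environment as a parameter is just a convenience so the function can be tried with a plain dict.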

What I actually want is the ranking section, so let's experiment to see whether it can be reached. Try clicking and adjusting the scroll position so that the desired part is displayed.

sample4.py


from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
import time

options = Options()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-gpu')
options.add_argument('--window-size=1280,1024')
options.add_argument('--user-data-dir=C:\\Users\\[username]\\AppData\\Local\\Google\\Chrome\\User Data')

driver = webdriver.Chrome(options=options)
driver.get('https://p.eagate.573.jp/game/2dx/27/ranking/weekly.html')
# find_element(By.XPATH, ...) replaces find_element_by_xpath, which newer Selenium 4 releases no longer provide.
driver.find_element(By.XPATH, "/html/body/div/div[1]/div/div/div[2]/div/div[2]/form/div[2]/ul[1]/li[3]/input").click()
time.sleep(3)  # give the page time to update after the click

# Scroll down 800px so the ranking section comes into view.
driver.execute_script("window.scrollTo(0, 800)")
time.sleep(3)

driver.save_screenshot('sample.png')
driver.quit()

It looks okay!

(Screenshot: _sample.png)
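sample4.py pauses with a fixed `time.sleep(3)`, which is either too long or, on a slow day, too short. A common alternative is to poll for a condition and stop as soon as it holds. The helper below is a plain-Python sketch of that idea with a made-up name, `wait_until`. Selenium ships its own version of this as `WebDriverWait` together with `expected_conditions`, which is probably the better choice in real code.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once timeout seconds have passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f'condition not met within {timeout} seconds')
        time.sleep(interval)
```

In the script above, a sleep could then become something like `wait_until(lambda: driver.execute_script('return document.readyState') == 'complete')`. Whether `readyState` is the right signal for this particular page is an assumption that I have not checked.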

General comment
