[PYTHON] I tried using Headless Chrome from Selenium

**May 6, 2018: I wrote a new article that matches the current situation now that Headless Chrome has reached the Stable channel. See also here.**

The other day, Vitaly of PhantomJS announced that he is stepping down as maintainer. PhantomJS has served me well as an easy way to use a headless browser. The suggestion going forward is to use Headless Chrome, so I tried it.

I can find many samples that use Node.js, but I wanted to use Python for various reasons, so here I will use Headless Chrome via Selenium.

What is Headless Chrome?

It is a mode that runs without displaying a window, available from Google Chrome 59. It is useful for automated testing and web scraping.

As of April 28, 2017, it seems to be available in the Mac and Linux versions of the Dev and Canary channels. I tried it on the Mac version of the Canary channel. I also tried the Windows version of the Canary channel, but the window was displayed even when I specified --headless. I expect it will be supported there soon [^1].

[^1]: Reference: https://bugs.chromium.org/p/chromium/issues/detail?id=712981

For the time being, using it through chrome-remote-interface for Node.js is easy and well documented, so that is a good place to start if you just want to try it.

Incidentally, there are many older examples that use "headless" to mean running Chrome on a virtual display such as Xvfb. When you search, check which meaning is intended.

Use Headless Chrome from Selenium

Headless mode is not much different from using regular Chrome. Selenium operates Chrome through ChromeDriver. When creating the Chrome WebDriver, pass a ChromeOptions object as an argument and specify in it the path of the Chrome binary to run and its command-line arguments.

Environment

The environment I tried is as follows.

Preparation

Assumption: Python 3.6 is installed.

  1. Install Google Chrome Canary (this should not be needed once --headless is available on the Stable channel). Canary can coexist with Stable.
  2. Download ChromeDriver and place it in your PATH.
  3. Install Selenium (use virtual environment here).
(venv) $ pip install selenium
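
Before running the sample, it can help to confirm the prerequisites listed above. The following is a minimal sketch using only the standard library. The helper name `check_prerequisites` is hypothetical, and the default Chrome path is the macOS Canary location that the sample code uses, so adjust it for your machine.

```python
import os
import shutil
import sys

# macOS location of Chrome Canary (the same path the sample code uses).
# This is an assumption about your machine; change it for other setups.
CANARY_PATH = '/Applications/Google Chrome Canary.app/Contents/MacOS/Google Chrome Canary'


def check_prerequisites(chrome_path=CANARY_PATH, driver_name='chromedriver'):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    # The article assumes Python 3.6.
    if sys.version_info < (3, 6):
        problems.append('Python 3.6 or later is assumed')
    # ChromeDriver must be reachable through PATH.
    if shutil.which(driver_name) is None:
        problems.append('{} was not found on PATH'.format(driver_name))
    # The Chrome binary that Selenium is told to launch must exist.
    if not os.path.exists(chrome_path):
        problems.append('Chrome binary not found: {}'.format(chrome_path))
    return problems


for problem in check_prerequisites():
    print('NG:', problem)
```

If nothing is printed, the Python version, ChromeDriver, and the Chrome binary were all found. Whether Selenium itself is installed is easiest to check with `pip show selenium` inside the virtual environment.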

Sample code

The sample performs a Google search. I modified the sample code from Python Crawling & Scraping, replacing the part that used PhantomJS with Headless Chrome.

selenium_google.py


import time

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

options = Options()
# Path to Chrome (should be unnecessary once --headless is available on the Stable channel).
options.binary_location = '/Applications/Google Chrome Canary.app/Contents/MacOS/Google Chrome Canary'
# Enable headless mode (comment out the next line to see the screen).
options.add_argument('--headless')
# Create a Chrome WebDriver object.
# Recent Selenium releases take options=; older releases used chrome_options=.
driver = webdriver.Chrome(options=options)

# Open the Google top page.
driver.get('https://www.google.co.jp/')

# Make sure the title contains 'Google'.
assert 'Google' in driver.title

# Enter the search term and submit it.
input_element = driver.find_element(By.NAME, 'q')
input_element.send_keys('Python')
input_element.send_keys(Keys.RETURN)

time.sleep(2)  # Chrome transitions via Ajax, so wait 2 seconds for now.

# Make sure the title contains 'Python'.
assert 'Python' in driver.title

# Take a screenshot.
driver.save_screenshot('search_results.png')

# Display the search results.
for a in driver.find_elements(By.CSS_SELECTOR, 'h3 > a'):
    print(a.text)
    print(a.get_attribute('href'))

driver.quit()  # Quit the browser.

When I ran it as follows, the search results were printed without the browser window being displayed.

(venv) $ python selenium_google.py
Python -Wikipedia
https://ja.wikipedia.org/wiki/Python
Python Tutorial — Python 3.6.1 document
http://docs.python.jp/3/tutorial/
Python basic course(1 What is Python?) - Qiita
http://qiita.com/Usek/items/ff4d87745dfc5d9b85a4
10 contents that even beginners can study Python almost for free-paiza development diary
http://paiza.hatenablog.com/entry/2015/04/09/%E5%88%9D%E5%BF%83%E8%80%85%E3%81%A7%E3%82%82%E3%81%BB%E3%81%BC%E7%84%A1%E6%96%99%E3%81%A7Python%E3%82%92%E5%8B%89%E5%BC%B7%E3%81%A7%E3%81%8D%E3%82%8B%E3%82%B3%E3%83%B3%E3%83%86%E3%83%B3%E3%83%8410
[Must-see for beginners] What is Python? Thorough explanation of language characteristics, share, and work market|samurai...
http://www.sejuku.net/blog/7720
Don't be bitten by Python:List of security risks to watch out for|programming...
http://postd.cc/a-bite-of-python/
What is Python-Hatena Keyword-Hatena Diary
http://d.hatena.ne.jp/keyword/Python
Learning site from introduction to application of Python
http://www.python-izm.com/
Learn with Python An introduction to programming from the basics(1)Programming in Python...
http://news.mynavi.jp/series/python/001/
Download Python | Python.org
https://www.python.org/downloads/
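
The fixed `time.sleep(2)` in the sample works, but it waits either too long or not long enough depending on the network. A common alternative is to poll until a condition holds. Selenium ships `WebDriverWait` in `selenium.webdriver.support.ui` for this purpose. The sketch below shows the same polling idea using only the standard library, so it runs without a browser. The helper `wait_until` and its parameter names are hypothetical and not part of Selenium.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds have passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError('condition not met within {} seconds'.format(timeout))
        # Sleep briefly so the loop does not spin at full speed.
        time.sleep(interval)
```

In the sample, the sleep could then be replaced with something like `wait_until(lambda: 'Python' in driver.title)`, which returns as soon as the title changes and fails with an error if it never does.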

Differences from PhantomJS, etc.

Summary

At least on OS X, Headless Chrome was easy to use. It will be even easier once it is available on the Stable channel. Removing the --headless option displays the window, which should make debugging easy, and I am happy about that.
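
To switch between headless and windowed runs without editing the script, one option is to decide the flag from an environment variable. This is only a sketch: the variable name `HEADLESS` and both helper functions are assumptions made for illustration, not something Selenium or Chrome defines.

```python
import os


def headless_requested(environ=os.environ):
    """Return False only when HEADLESS is set to 0, false, no, or off."""
    value = environ.get('HEADLESS', '1').strip().lower()
    return value not in ('0', 'false', 'no', 'off')


def chrome_arguments(environ=os.environ):
    """Build the list of arguments to hand to Options.add_argument one by one."""
    arguments = []
    if headless_requested(environ):
        arguments.append('--headless')
    return arguments
```

In the sample, the `options.add_argument('--headless')` line would become `for argument in chrome_arguments(): options.add_argument(argument)`. Running `HEADLESS=0 python selenium_google.py` would then show the browser window for debugging, while a plain run stays headless.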
