[PYTHON] I tried using Headless Chrome from Selenium

**May 6, 2018: I wrote a new article that reflects the current situation now that Headless Chrome has reached Stable. See also here.**

The other day, Vitaly, the maintainer of PhantomJS, announced that he is stepping down as maintainer. PhantomJS has served me well as an easy way to use a headless browser. The recommendation is to use Headless Chrome from now on, so I tried it.

There are many samples that use Node.js, but for various reasons I wanted to use Python, so here I use Headless Chrome via Selenium.

What is Headless Chrome?

It is a mode that runs without displaying a window, available starting with Google Chrome 59. It is useful for automated testing and web scraping.
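
As a rough illustration of what the mode looks like without any driver library, here is a minimal sketch that builds the command line for a one-shot headless run. The helper names `headless_chrome_command` and `fetch_dom` are hypothetical, and the sketch assumes that the `--disable-gpu` and `--dump-dom` flags described in the Headless Chromium README are available in your Chrome build.

```python
import subprocess


def headless_chrome_command(chrome_path, url, dump_dom=True):
    """Build the argument list for a one-shot headless Chrome run."""
    args = [chrome_path, '--headless', '--disable-gpu']
    if dump_dom:
        # Assumption: --dump-dom prints the serialized DOM to stdout.
        args.append('--dump-dom')
    args.append(url)
    return args


def fetch_dom(chrome_path, url, timeout=30):
    """Run headless Chrome once and return whatever it printed to stdout."""
    result = subprocess.run(
        headless_chrome_command(chrome_path, url),
        stdout=subprocess.PIPE, check=True, timeout=timeout)
    return result.stdout.decode('utf-8', errors='replace')
```

Only `fetch_dom` actually starts Chrome; `headless_chrome_command` just assembles the arguments, so it can be inspected or logged safely.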

As of April 28, 2017, it appears to be available in the Dev and Canary channels for Mac and Linux. I tried it on the Canary channel for Mac. I also tried the Canary channel for Windows, but the window was displayed even when I specified --headless. I expect it to become available there soon [^1].

[^1]: Reference: https://bugs.chromium.org/p/chromium/issues/detail?id=712981

For now, the easiest route is chrome-remote-interface for Node.js, which also has the most information available, so that is a good place to start.

Headless Chromium
https://chromium.googlesource.com/chromium/src/+/lkgr/headless/README.md
I tried Chrome's headless mode on Mac OS X - Qiita
http://qiita.com/g_ryotaro/items/da3cf376890bf3366a62

By the way, there are many older examples that run Chrome "headlessly" on a virtual display such as Xvfb. When you search, check which meaning of "headless" is intended.

Use Headless Chrome from Selenium

Using Chrome headless is not much different from using regular Chrome. Selenium drives Chrome through ChromeDriver. When creating the Chrome WebDriver, pass a ChromeOptions object as an argument, and set in it the path of the Chrome binary to run and its command-line arguments.

Environment

I tried it in the following environment.

OS X El Capitan
Google Chrome Canary 60.0.3082.0
ChromeDriver 2.29
Python 3.6.0
Selenium 3.4.0

Preparation

Assumption: Python 3.6 is installed.

Install Google Chrome Canary (this should not be needed once --headless is available on the Stable channel). Canary can coexist with Stable.
Download ChromeDriver and place it in a directory on your PATH.
Install Selenium (a virtual environment is used here).

(venv) $ pip install selenium
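
Before running the sample, it can help to confirm that the three preparation steps actually took effect. The following is a minimal sketch using only the standard library; `check_prerequisites` is a hypothetical helper name, and the default Chrome path is simply the Canary location used in this article, so adjust it for your machine.

```python
import importlib.util
import os
import shutil

# Canary location used in this article; an assumption about your machine.
CANARY_PATH = '/Applications/Google Chrome Canary.app/Contents/MacOS/Google Chrome Canary'


def check_prerequisites(chrome_path=CANARY_PATH, driver_name='chromedriver'):
    """Return a list of problems found; an empty list means nothing is missing."""
    problems = []
    if not os.path.exists(chrome_path):
        problems.append('Chrome binary not found: ' + chrome_path)
    # shutil.which searches the directories listed in PATH.
    if shutil.which(driver_name) is None:
        problems.append(driver_name + ' is not on PATH')
    # find_spec returns None when a top-level package cannot be imported.
    if importlib.util.find_spec('selenium') is None:
        problems.append('selenium is not installed in this environment')
    return problems


for problem in check_prerequisites():
    print(problem)
```

If nothing is printed, the Chrome binary, ChromeDriver, and the Selenium package were all found.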

Sample code

The sample performs a Google search. I took the sample code from Python Crawling & Scraping and replaced the part that used PhantomJS with Headless Chrome.

`selenium_google.py`


import time

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.options import Options

options = Options()
# Path to Chrome (should be unnecessary once --headless is available on the Stable channel).
options.binary_location = '/Applications/Google Chrome Canary.app/Contents/MacOS/Google Chrome Canary'
# Enable headless mode (comment out the next line to see the window).
options.add_argument('--headless')
# Create a Chrome WebDriver object.
driver = webdriver.Chrome(chrome_options=options)

# Open the Google top page.
driver.get('https://www.google.co.jp/')

# Make sure the title contains 'Google'.
assert 'Google' in driver.title

# Enter the search term and submit it.
input_element = driver.find_element_by_name('q')
input_element.send_keys('Python')
input_element.send_keys(Keys.RETURN)

time.sleep(2)  # With Chrome the page transitions via Ajax, so wait 2 seconds for now.

# Make sure the title contains 'Python'.
assert 'Python' in driver.title

# Take a screenshot.
driver.save_screenshot('search_results.png')

# Print the search results.
for a in driver.find_elements_by_css_selector('h3 > a'):
    print(a.text)
    print(a.get_attribute('href'))

driver.quit()  # Quit the browser.
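
One caveat with the script above: the final `driver.quit()` only runs if every earlier step succeeds. If one of the `assert` statements fails, the script stops before reaching it and Chrome may be left running. A minimal sketch of one way around this is a small context manager; `quitting` is a hypothetical helper name, not part of Selenium.

```python
from contextlib import contextmanager


@contextmanager
def quitting(driver):
    """Yield the driver and always call driver.quit() on exit, even after an error."""
    try:
        yield driver
    finally:
        driver.quit()


# Usage with the script above (assumes the same `options` object):
#
#     with quitting(webdriver.Chrome(chrome_options=options)) as driver:
#         driver.get('https://www.google.co.jp/')
#         assert 'Google' in driver.title
```

The helper only relies on the object having a `quit()` method, so it works the same way whether or not `--headless` is set.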

When I ran it as follows, the search results were printed without a browser window being displayed.

(venv) $ python selenium_google.py
Python -Wikipedia
https://ja.wikipedia.org/wiki/Python
Python Tutorial — Python 3.6.1 document
http://docs.python.jp/3/tutorial/
Python basic course(1 What is Python?) - Qiita
http://qiita.com/Usek/items/ff4d87745dfc5d9b85a4
10 contents that even beginners can study Python almost for free-paiza development diary
http://paiza.hatenablog.com/entry/2015/04/09/%E5%88%9D%E5%BF%83%E8%80%85%E3%81%A7%E3%82%82%E3%81%BB%E3%81%BC%E7%84%A1%E6%96%99%E3%81%A7Python%E3%82%92%E5%8B%89%E5%BC%B7%E3%81%A7%E3%81%8D%E3%82%8B%E3%82%B3%E3%83%B3%E3%83%86%E3%83%B3%E3%83%8410
[Must-see for beginners] What is Python? Thorough explanation of language characteristics, share, and work market|samurai...
http://www.sejuku.net/blog/7720
Don't be bitten by Python:List of security risks to watch out for|programming...
http://postd.cc/a-bite-of-python/
What is Python-Hatena Keyword-Hatena Diary
http://d.hatena.ne.jp/keyword/Python
Learning site from introduction to application of Python
http://www.python-izm.com/
Learn with Python An introduction to programming from the basics(1)Programming in Python...
http://news.mynavi.jp/series/python/001/
Download Python | Python.org
https://www.python.org/downloads/

Differences from PhantomJS, etc.

No window is displayed during execution, but the Chrome icon does appear in the Dock.
Chrome will not quit unless you explicitly call driver.quit().
With PhantomJS, submitting the search term waited until onload, but with Chrome the page transitions via Ajax, so I had to wait separately (this is due to a difference in Google's behavior; it is not an issue with Chrome).
~~In headless mode, the screenshot taken with save_screenshot() was a 1x1 image. Maybe some option is needed.~~
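
Regarding the third point, a fixed `time.sleep(2)` is either too long or, on a slow connection, too short. A minimal sketch of polling for the condition instead is shown below. `wait_for_title` is a hypothetical helper, and it only assumes that the driver object exposes a `title` attribute, as Selenium's WebDriver does.

```python
import time


def wait_for_title(driver, text, timeout=10.0, interval=0.2):
    """Poll driver.title until it contains text.

    Returns True as soon as the title matches, or False once timeout seconds have passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        if text in driver.title:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

In the sample script, `time.sleep(2)` could then become `assert wait_for_title(driver, 'Python')`. Selenium also ships `WebDriverWait` together with `expected_conditions.title_contains`, which covers the same need without a hand-written loop.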

Summary

At least on OS X, Headless Chrome was easy to use. It will be even easier once it is available on the Stable channel. Removing the --headless option brings the window back, which I like because it should make debugging easy.