[PYTHON] How to scrape pages that are “Access Denied” in Selenium + Headless Chrome

Introduction

While scraping with Selenium + headless Chrome, I came across a site that throws a NoSuchElementException as soon as I enable headless mode, even though the same code retrieves the information just fine in headed mode. There were few articles in Japanese about a workaround, so I'm posting one here.

Symptoms

- Scraping works in headed mode.
- A NoSuchElementException occurs as soon as the headless option is added.

Debugging

Investigating the cause

Since the element apparently couldn't be found, I checked the HTML the driver actually received with driver.page_source.

scraping.py


print(driver.page_source)  # dump the HTML the driver actually received

The returned HTML contains the words "Access Denied", so it seems that access from headless Chrome is being denied.

<html><head>
<title>Access Denied</title>
</head><body>
<h1>Access Denied</h1>

You don't have permission to access "http://www.xxxxxxx/" on this server.<p>
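For reference, a minimal, self-contained sketch of this check (without the user-agent fix described below), assuming Selenium 4 and a chromedriver on PATH; the URL is a placeholder, not the actual site:

from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument('--no-sandbox')
options.add_argument('--headless')
options.add_argument('--disable-gpu')

driver = webdriver.Chrome(options=options)
try:
    driver.get('https://www.example.com/')  # placeholder URL, not the actual site
    print(driver.page_source)               # in headless mode this prints the "Access Denied" page
finally:
    driver.quit()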

Countermeasures

After some digging, I found that ChromeOptions accepts a user-agent argument that makes headless Chrome look like a regular browser. Adding it to the chromedriver options lets you retrieve the element without any problems.

scraping.py


options = webdriver.ChromeOptions()
options.binary_location = '/usr/bin/google-chrome'
options.add_argument('--no-sandbox')
options.add_argument('--headless')
options.add_argument('--disable-gpu')
options.add_argument('--lang=ja-JP')
options.add_argument('user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.79 Safari/537.36')  # added: spoof a regular browser user agent
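A minimal, self-contained sketch of how these options might be wired up and verified, again assuming Selenium 4 and a chromedriver on PATH; the URL and the h1 lookup are placeholders, and the other flags above are omitted for brevity:

from selenium import webdriver
from selenium.webdriver.common.by import By

ua = ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
      '(KHTML, like Gecko) Chrome/79.0.3945.79 Safari/537.36')

options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument(f'user-agent={ua}')  # the key line: present a regular browser user agent

driver = webdriver.Chrome(options=options)
try:
    driver.get('https://www.example.com/')            # placeholder URL, not the actual site
    element = driver.find_element(By.TAG_NAME, 'h1')  # placeholder lookup; no longer raises NoSuchElementException
    print(element.text)
finally:
    driver.quit()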

           

That's all.
