Python + Selenium + Headless Chromium on AWS Lambda

I am building an application that periodically scrapes a site with Lambda and stores the results in DynamoDB.

A quick search turns up plenty of articles on this, which I am grateful for. I followed the article below, but I was stuck on one point for about two days, so I am leaving a note.

https://masakimisawa.com/selenium_headless-chrome_python_on_lambda/

I fetched chromedriver_linux64.zip from a Cloud9 (Amazon Linux) environment.

The Lambda layer targets Python 3.6, and the Lambda function's runtime is of course also Python 3.6.

Where I got stuck

I checked the versions many times, but I kept getting the following error on Lambda:

Chrome failed to start: exited abnormally\n (unknown error: DevToolsActivePort file doesn't exist)

Many people say this can be fixed by adding some options, as in the following question.
https://stackoverflow.com/questions/50642308/webdriverexception-unknown-error-devtoolsactiveport-file-doesnt-exist-while-t

The code there is Java, but I converted it to Python and added:

options.add_argument("start-maximized")
options.add_argument("disable-infobars")
options.add_argument("--disable-extensions")
options.add_argument("--disable-dev-shm-usage")
options.add_argument("--no-sandbox")

However, another error then occurred:

unknown error: unable to discover open window in chrome

In the end, what solved it was adding options.add_argument("--single-process"), as described in the following question, on top of some of the options above.
https://stackoverflow.com/questions/60229291/aws-lambda-ruby-crawler-selenium-chrome-driver-unknown-error-unable-to-discov

In addition to options.add_argument("--headless"), these are the options I ended up adding as troubleshooting:

options.add_argument("--single-process")
options.add_argument("--disable-dev-shm-usage")
options.add_argument("--no-sandbox")
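
Put together, the settings above might look roughly like the following inside a handler. This is only a minimal sketch, assuming Selenium 3.x (the generation that pairs with Python 3.6). Lambda extracts layers under /opt, but the directory and file names under it, the function names, and the example URL are placeholders for illustration and not the exact values from my setup.

```python
# Flags that ended up being needed on Lambda (taken from the notes above).
CHROME_FLAGS = [
    "--headless",
    "--single-process",
    "--disable-dev-shm-usage",
    "--no-sandbox",
]

# Assumed locations: Lambda extracts layers under /opt, but the directory
# and file names below are placeholders, adjust them to your own layer.
CHROME_BINARY = "/opt/headless-chrome/headless-chromium"
CHROMEDRIVER = "/opt/headless-chrome/chromedriver"


def build_driver(binary=CHROME_BINARY, driver_path=CHROMEDRIVER, flags=CHROME_FLAGS):
    """Create a Chrome WebDriver with the troubleshooting flags applied."""
    # Imported lazily so this module can be loaded without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.binary_location = binary
    for flag in flags:
        options.add_argument(flag)
    # Selenium 3.x signature; Selenium 4 passes the driver path via a Service object.
    return webdriver.Chrome(executable_path=driver_path, options=options)


def lambda_handler(event, context):
    driver = build_driver()
    try:
        driver.get("https://example.com")  # placeholder URL
        return {"title": driver.title}
    finally:
        # Always release the browser process, even if the scrape fails.
        driver.quit()
```

Keeping the flags in one list makes it easy to add or remove a single option while chasing an error like the ones above.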

Another place I got stuck

To upload the layer to Lambda, I put headless-chrome and chromedriver in a directory called headless-chrome and zipped it. When I created the zip with 7zip, it failed with an error along the lines of "executable may have wrong permission". After looking around, it seems that deployment packages zipped on Windows do not work well, presumably because the Unix permission bits are not preserved. Zipping the directory in WSL Ubuntu instead resolved the problem.
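
If WSL is not available, one alternative (not what I actually did) would be to build the zip from Python and set the Unix permission bits explicitly. The following is a hedged sketch using only the standard zipfile module; the function name zip_with_exec_bits and its parameters are hypothetical, and I have not verified it against Lambda itself.

```python
import zipfile
from pathlib import Path


def zip_with_exec_bits(src_dir, zip_path, executables):
    """Zip src_dir (including its top-level directory name) and mark the
    files whose names are in `executables` as rwxr-xr-x."""
    src = Path(src_dir)
    with zipfile.ZipFile(zip_path, "w") as zf:
        for path in sorted(src.rglob("*")):
            if not path.is_file():
                continue
            # Archive names always use forward slashes, even on Windows.
            arcname = path.relative_to(src.parent).as_posix()
            info = zipfile.ZipInfo(arcname)
            info.create_system = 3  # 3 = Unix, so the mode bits are honored on extraction
            info.compress_type = zipfile.ZIP_DEFLATED
            mode = 0o755 if path.name in executables else 0o644
            # The upper 16 bits of external_attr hold the Unix file type and mode.
            info.external_attr = (0o100000 | mode) << 16
            zf.writestr(info, path.read_bytes())
```

For example, zip_with_exec_bits("headless-chrome", "layer.zip", {"headless-chrome", "chromedriver"}) would mark both binaries as executable, assuming those are the file names inside the directory. Write the zip outside of src_dir so that it is not picked up while walking the directory.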

Remarks

I was torn between this approach and scraping with Cloud Functions + TypeScript + Puppeteer,
