--I want to automatically check web pages that are frequently updated by hand (this time I use ZOZOTOWN as an example)
--Create a Selenium execution environment using Fargate
--Push the local container to ECR and deploy it to Fargate
--Schedule Selenium browser operations and run them as batch processing
--Check the results of the batch processing with CloudWatch
This article is intended as an introduction to Fargate + Selenium. The author has an informal job offer from the company and has its permission, so I use ZOZOTOWN, the company's own service, as the subject! If you reuse the content of this article, please follow good manners and the target site's rules!
Normally, running containers on EC2 means managing the instances yourself. With Fargate, instance management is left to AWS, so you can run containers serverlessly just by registering them.
Lambda is a well-known serverless service, but at the time of writing it lacked flexibility because of restrictions such as not being able to run containers and its execution timeout.
Fargate, on the other hand, can be used for a wide range of workloads because a container that runs locally can be registered and used as is.
Selenium is a browser-driving test tool for automating web application testing.
It supports various languages such as Python, Ruby, and Java, and you can easily create test scripts.
This time, we will build the following architecture on AWS.
Create a test script with Selenium + Python.
./Dockerfile
FROM joyzoursky/python-chromedriver:3.8-alpine3.10-selenium
WORKDIR /usr/src
COPY main.py /usr/src/
CMD ["python", "main.py"]
The base image used here for Selenium + Headless Chrome is the following.
joyzoursky/python-chromedriver:3.8-alpine3.10-selenium https://hub.docker.com/r/joyzoursky/python-chromedriver/
./main.py
# -*- coding: utf-8 -*-
from selenium import webdriver
from selenium.common.exceptions import (
    ElementClickInterceptedException,
    NoSuchElementException,
    TimeoutException,
)
from selenium.webdriver.common.by import By


def check_coupon(driver, my_favorite_brand):
    """Return True if my_favorite_brand is listed on the ZOZO coupon page."""
    # Go to the ZOZO coupon page
    driver.get("https://zozo.jp/coupon/")
    i = 1
    while True:
        try:
            # Read the brand name of the i-th coupon in the list
            coupon_brand = driver.find_element(
                By.XPATH, f'//*[@id="body"]/div[3]/ul/li[{i}]/a/figure/div[2]'
            ).text
            if coupon_brand == my_favorite_brand:
                return True
            i += 1
        except NoSuchElementException:
            # There is no i-th coupon, so the whole list has been scanned
            return False


if __name__ == '__main__':
    driver = None
    try:
        # Headless Chrome settings
        options = webdriver.ChromeOptions()
        options.add_argument('--no-sandbox')
        options.add_argument("--disable-setuid-sandbox")
        options.add_argument('--window-size=1420,1080')
        options.add_argument('--headless')
        options.add_argument('--disable-gpu')
        # Connect to the Headless Chrome browser
        driver = webdriver.Chrome(options=options)
        # Wait up to 15 seconds when looking up elements
        driver.implicitly_wait(15)
        # Favorite brand
        my_favorite_brand = "Carlie e felice"
        # Check the coupons
        if check_coupon(driver, my_favorite_brand):
            print("I found it!", my_favorite_brand)
        else:
            print("I couldn't find it today ...")
    # Exception handling
    except ElementClickInterceptedException as ecie:
        print(f"exception!\n{ecie}")
    except TimeoutException as te:
        print(f"timeout!\n{te}")
    finally:
        # Quit the browser, but only if it was actually started
        if driver is not None:
            driver.quit()
The script checks whether the brand **Carlie e felice** is on the coupon page.
Scraping with Requests + Beautiful Soup 4 would have been enough, but this time I wanted to build a Selenium environment, so please go easy on me ;;
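The check in main.py is an exact string comparison, so a difference in letter case, spacing, or full-width characters on the page would make it miss the brand. The source does not say such differences occur; assuming they might, the comparison could be factored out roughly as follows. `normalize_brand` and `brand_listed` are hypothetical helper names, not part of the original script.

```python
import unicodedata


def normalize_brand(name):
    """Normalize a brand name for comparison (hypothetical helper)."""
    # NFKC folds full-width letters to their ASCII equivalents
    folded = unicodedata.normalize("NFKC", name)
    # Collapse runs of whitespace and ignore letter case
    return " ".join(folded.split()).casefold()


def brand_listed(coupon_brands, my_favorite_brand):
    """Return True if my_favorite_brand appears among the scraped names."""
    target = normalize_brand(my_favorite_brand)
    return any(normalize_brand(brand) == target for brand in coupon_brands)
```

Keeping the comparison free of any browser calls also means it can be tested without starting Chrome.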
#Building a container
$ docker build -t zozo_check_coupons .
#Execute container
$ docker run -it --rm zozo_check_coupons
I found it! Carlie e felice
After confirming that it was successfully executed in the local environment, the next step is to push this container to Amazon ECR.
ECR is something like a private Docker Hub on AWS.
Create a repository in ECR dedicated to the container you want to manage.
--From Services, select **ECR** and then **Create Repository**
--Create a repository by entering the repository name "zozo_check_coupons"
--You have successfully created the repository
The repository URI is used when pushing the container, so make a note of it.
$ aws ecr get-login --region ap-northeast-1 --no-include-email
docker login -u AWS -p ...
.
.
. .dkr.ecr.ap-northeast-1.amazonaws.com
#Copy the returned docker login ... command and run it
$ docker login -u AWS -p ...
Login Succeeded
If Login Succeeded is displayed, it's OK.
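Note that `aws ecr get-login` exists only in AWS CLI version 1. In version 2 it is replaced by `aws ecr get-login-password`, whose output is piped into `docker login --password-stdin`. The sketch below shows that flow from Python, assuming the `aws` and `docker` commands are installed and credentials are configured; the function names and the account ID argument are placeholders of mine.

```python
import subprocess


def ecr_login_commands(account_id, region):
    """Return the argv lists for the AWS CLI v2 login flow (hypothetical helper)."""
    registry = f"{account_id}.dkr.ecr.{region}.amazonaws.com"
    # Prints a temporary password for the registry to stdout
    get_password = ["aws", "ecr", "get-login-password", "--region", region]
    # Reads that password from stdin instead of taking it on the command line
    docker_login = [
        "docker", "login", "--username", "AWS", "--password-stdin", registry,
    ]
    return get_password, docker_login


def ecr_login(account_id, region):
    """Run both commands, feeding the password to docker login via stdin."""
    get_password, docker_login = ecr_login_commands(account_id, region)
    password = subprocess.run(
        get_password, check=True, capture_output=True
    ).stdout
    subprocess.run(docker_login, input=password, check=True)
```

Passing the password through stdin keeps it out of the shell history and the process list, unlike the `-p` form.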
Using the repository URI you wrote down earlier, tag the image and push it to the repository you created.
#Tag with the URI of the repository
$ docker build -t xxxxxxxx.dkr.ecr.ap-northeast-1.amazonaws.com/zozo_check_coupons .
#Push tagged container to ECR
$ docker push xxxxxxxx.dkr.ecr.ap-northeast-1.amazonaws.com/zozo_check_coupons
The container was pushed to the repository successfully.
Make a note of the image URI as it will be used in the task definition.
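The build and push commands above give no tag, so Docker uses `latest`, which is why the task definition later refers to `zozo_check_coupons:latest`. As a small illustration of how the image URI is put together, here is a sketch with two hypothetical helpers; the account ID is a placeholder.

```python
def ecr_image_uri(account_id, region, repository, tag="latest"):
    """Build an ECR image URI from its parts (hypothetical helper)."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


def with_default_tag(image):
    """Append ':latest' when an image reference has no tag, as Docker does."""
    if "@" in image:
        # Digest references (name@sha256:...) are left untouched
        return image
    # Only the last path component can carry a tag; a registry may have a port
    last_component = image.rsplit("/", 1)[-1]
    if ":" in last_component:
        return image
    return image + ":latest"
```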
Create a cluster, which is the environment for running the container.
--Select **ECS** from Services and then **Create Cluster**
--Select the cluster template "Networking only"
--Enter the cluster name and check Create VPC.
--Finally, you can create a cluster by pressing the create button.
Next, define the task.
--Select **Create new task definition**
--Select **Fargate** as the launch type compatibility
--Define the task as follows
If there is no task execution role, see the aside below and create one.
--Select Add Container and enter the container name and the URI of the container image you pushed earlier.
--Set memory and CPU
--Finally, select **Create** to complete the task definition.
--Define the role required to execute the task
./task-execution-assume-role.json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": "ecs-tasks.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
--Create a role using the definition file
$ aws iam --region ap-northeast-1 create-role --role-name ecsTaskExecutionRole --assume-role-policy-document file://task-execution-assume-role.json
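The `create-role` command above only sets the trust policy, which says who may assume the role. To pull the image from ECR and write logs, the role usually also needs the AWS managed policy `AmazonECSTaskExecutionRolePolicy` attached. That step is not shown in the source; it follows the AWS tutorial linked later in this article. The sketch below builds the same trust policy and the argv for the attach step; the helper names are hypothetical.

```python
import json

# AWS managed policy for ECS task execution roles
EXECUTION_POLICY_ARN = (
    "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
)


def trust_policy(service="ecs-tasks.amazonaws.com"):
    """Return the trust policy that lets ECS tasks assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "",
                "Effect": "Allow",
                "Principal": {"Service": service},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def attach_policy_command(role_name, policy_arn=EXECUTION_POLICY_ARN):
    """Return the argv for `aws iam attach-role-policy` (not executed here)."""
    return [
        "aws", "iam", "attach-role-policy",
        "--role-name", role_name,
        "--policy-arn", policy_arn,
    ]


if __name__ == "__main__":
    # Prints the same document as task-execution-assume-role.json
    print(json.dumps(trust_policy(), indent=2))
```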
--Create a task definition file
./task-config.json
{
  "family": "zozo-check-coupons-task",
  "networkMode": "awsvpc",
  "containerDefinitions": [
    {
      "name": "zozo-check-coupons-task",
      "image": "xxxxxxxxxx.dkr.ecr.ap-northeast-1.amazonaws.com/zozo_check_coupons:latest",
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-region": "ap-northeast-1",
          "awslogs-group": "/ecs/zozo_check_coupons-task",
          "awslogs-stream-prefix": "ecs"
        }
      }
    }
  ],
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "256",
  "memory": "512",
  "executionRoleArn": "arn:aws:iam::xxxxxxxx:role/ecsTaskExecutionRole"
}
--Register the task definition from the definition file
$ aws ecs register-task-definition --cli-input-json file://task-config.json
This way you can define the task without mistakes or omissions.
For details, please see here https://docs.aws.amazon.com/ja_jp/AmazonECS/latest/developerguide/ecs-cli-tutorial-fargate.html
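Fargate accepts only certain pairs of `cpu` and `memory` values, so it can help to check a definition file before registering it. The table in this sketch covers the combinations up to 4 vCPU as I understand them; treat it as an assumption and check the current AWS documentation. `is_valid_fargate_size` is a hypothetical helper.

```python
# CPU units -> allowed memory values in MiB (assumed table, up to 4 vCPU)
FARGATE_MEMORY_MIB = {
    256: [512, 1024, 2048],
    512: list(range(1024, 4096 + 1, 1024)),
    1024: list(range(2048, 8192 + 1, 1024)),
    2048: list(range(4096, 16384 + 1, 1024)),
    4096: list(range(8192, 30720 + 1, 1024)),
}


def is_valid_fargate_size(task_definition):
    """Check the task-level cpu/memory strings of a task definition dict."""
    try:
        cpu = int(task_definition["cpu"])
        memory = int(task_definition["memory"])
    except (KeyError, ValueError):
        # Missing keys or non-numeric values cannot be a valid size
        return False
    return memory in FARGATE_MEMORY_MIB.get(cpu, [])
```

The definition above uses 256 CPU units with 512 MiB, the smallest pair in this table; if Chrome turns out to be slow, 1024 or 2048 MiB are the next sizes allowed for the same CPU value.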
Finally, run the defined task on a schedule.
--Select the cluster you created, select **Schedule Task**, and press **Create**
--Configure it as follows. The fixed interval is set to **24** hours because coupons are renewed every 24 hours.
--Select the VPC that was created together with the cluster
--Finally, select Create to run the task
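As far as I understand, the fixed interval chosen in the console ends up as a rate expression on the underlying CloudWatch Events rule. The sketch below builds such an expression; the unit is singular for a value of 1 and plural otherwise. `rate_expression` is a hypothetical helper, and the link to the console setting is my assumption rather than something the source states.

```python
def rate_expression(value, unit):
    """Build a schedule rate expression such as 'rate(24 hours)'."""
    if unit not in ("minute", "hour", "day"):
        raise ValueError(f"unsupported unit: {unit}")
    if value < 1:
        raise ValueError("value must be a positive integer")
    # The unit is singular for 1 and plural for anything larger
    suffix = "" if value == 1 else "s"
    return f"rate({value} {unit}{suffix})"
```

A rate schedule counts from when the rule is created; if the check has to run at a specific time of day, a cron expression is the usual alternative.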
When the task completes, its log is sent to CloudWatch.
When I checked, I found the following log!
Apparently not today...
We have built an environment for Fargate + Selenium! Fargate is quite flexible because you can register a container that you ran locally as is.
However, when crawling, pages can load slowly depending on the CPU and memory, browser operations from the program may not work well, and timeouts may occur, so it seems better to take measures such as adding sleeps.
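One way to add such sleeps without scattering them through the script is a small retry helper that waits a little longer after each failure. This is a minimal sketch under my own assumptions (three attempts, a linearly growing wait), not something the original script does; `retry` and its parameters are hypothetical.

```python
import time


def retry(action, attempts=3, wait_seconds=5.0,
          exceptions=(Exception,), sleep=time.sleep):
    """Call action() until it succeeds or attempts run out."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except exceptions as error:
            last_error = error
            if attempt < attempts:
                # Wait longer after each failure: 1x, 2x, 3x ... wait_seconds
                sleep(wait_seconds * attempt)
    # Every attempt failed, so surface the last error to the caller
    raise last_error
```

In main.py this could wrap the coupon check, for example `retry(lambda: check_coupon(driver, my_favorite_brand), exceptions=(TimeoutException,))`. Selenium's own explicit waits (`WebDriverWait`) are another option when the script only needs to wait for a particular element.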
Finally, this article is only meant as an introduction, so please check the etiquette and rules carefully before using it!
https://yomon.hatenablog.com/entry/2019/08/fargateselenium