Periodically Execute a Python Script on AWS Data Pipeline

Introduction

There is likely plenty of demand for running a Python script on AWS on a regular schedule. You could achieve this by setting up an EC2 instance and running the script with cron, but here I will explain how to do it with AWS Data Pipeline.

Note one limitation of Data Pipeline: the shortest execution interval is 15 minutes, so it cannot run a job every minute.

Data Pipeline can also invoke a Lambda function on a schedule. If your script is written in Node.js or Java, that approach is probably easier.

Overall flow

The setup flow is as follows. It is assumed that the Python script itself is already written.

Place the Python script in an S3 bucket

Create a Data Pipeline

Check the processing result of the Data Pipeline

Supplement

Place the Python Script in an S3 Bucket

Creating an S3 bucket

Create an S3 bucket to hold the Python script (an existing bucket is fine, of course). Go to AWS Console → S3 and create the bucket there.
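
If you would rather script this step, here is a minimal boto3 sketch. The bucket name matches the one used in the shell command later in this article; the region is an assumption, so adjust both to your environment.

import boto3

# Create the bucket that will hold the script.
# Bucket name and region are assumptions -- use your own values.
s3 = boto3.client('s3', region_name='ap-northeast-1')
s3.create_bucket(
    Bucket='datapipeline-python-test',
    CreateBucketConfiguration={'LocationConstraint': 'ap-northeast-1'},
)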

Upload Python Script to S3 Bucket

Upload the following Python script to the S3 bucket.

datapipeline_test.py

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import datetime

# Print the current time so the scheduled run can be verified in the logs.
print('Script run at ' + datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S'))
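
You can upload the file from the console, or script it with boto3. A minimal sketch, assuming the bucket from the previous step:

import boto3

# Upload the script to the bucket so the pipeline can fetch it.
s3 = boto3.client('s3')
s3.upload_file('datapipeline_test.py', 'datapipeline-python-test', 'datapipeline_test.py')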

Creating a Data Pipeline

Creating a Data Pipeline

Go to AWS Console → Data Pipeline and create a new Data Pipeline. Set the following as the shell command the pipeline will run:

sudo yum -y install python-devel gcc && \
sudo update-alternatives --set python /usr/bin/python2.7 && \
curl "https://bootstrap.pypa.io/get-pip.py" -o "get-pip.py" && \
sudo python ./get-pip.py && \
pip install boto3 --user && \
aws s3 cp s3://datapipeline-python-test/datapipeline_test.py ./datapipeline_test.py && \
cat datapipeline_test.py && \
python ./datapipeline_test.py

With these settings, select `Edit in Architect` to create the Data Pipeline once. When it is created, two IAM roles appear under IAM Roles: `DataPipelineDefaultResourceRole` and `DataPipelineDefaultRole`.
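
The console is enough for this article, but for reference, the empty pipeline can also be created with boto3. In this sketch the name and uniqueId are placeholders; the definition itself (activities and schedule) is registered separately with put_pipeline_definition.

import boto3

# create_pipeline only makes an empty pipeline shell; activities and the
# schedule are added afterwards with put_pipeline_definition.
dp = boto3.client('datapipeline')
resp = dp.create_pipeline(name='python-periodic-test', uniqueId='python-periodic-test-001')
print(resp['pipelineId'])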

IAM Role permission settings

Immediately after creation these IAM roles lack some permissions, so grant S3 access to both `DataPipelineDefaultResourceRole` and `DataPipelineDefaultRole`. Go to AWS Console → Identity & Access Management → Roles and grant the permissions there.

Set the same permissions for `DataPipelineDefaultRole`.
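
For example, the AWS managed AmazonS3FullAccess policy can be attached to both roles from boto3 (whether full access is appropriate depends on your security requirements):

import boto3

iam = boto3.client('iam')
# Grant S3 access to both roles created by Data Pipeline.
for role in ['DataPipelineDefaultResourceRole', 'DataPipelineDefaultRole']:
    iam.attach_role_policy(
        RoleName=role,
        PolicyArn='arn:aws:iam::aws:policy/AmazonS3FullAccess',
    )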

Activate the Data Pipeline

Go to AWS Console → Data Pipeline and activate the Data Pipeline you just created.
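
Activation can also be scripted with boto3. A minimal sketch; the pipeline ID below is a placeholder (it is shown in the console and returned by create_pipeline):

import boto3

dp = boto3.client('datapipeline')
# Activate the pipeline; the ID below is a placeholder.
dp.activate_pipeline(pipelineId='df-XXXXXXXXXXXX')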

Periodic execution of the Data Pipeline is now active. It runs every 15 minutes, so wait a while.

Check the processing result of the Data Pipeline

Go to AWS Console → Data Pipeline, select Test Pipeline, open `Stdout` under CliActivity on the `Attempts` tab, and confirm that the Python script printed the current time.

Supplement about the Shell Script

Nothing special is happening here, but let me walk through what each step of the shell command above does.

sudo yum -y install python-devel gcc: install the build dependencies needed by pip packages
sudo update-alternatives --set python /usr/bin/python2.7: make python point at Python 2.7
curl "https://bootstrap.pypa.io/get-pip.py" -o "get-pip.py" and sudo python ./get-pip.py: download and install pip
pip install boto3 --user: install boto3 for the current user
aws s3 cp s3://datapipeline-python-test/datapipeline_test.py ./datapipeline_test.py: copy the script from the S3 bucket
cat datapipeline_test.py: echo the script contents into the log
python ./datapipeline_test.py: run the script

Send an Alarm on Failure

You can also have AWS SNS send an alert email when the Python script fails. I will skip the explanation of AWS SNS itself and only briefly describe the settings on the Data Pipeline side.

Just set this on the activity and you are done; SNS can be fired either when the script fails or when it succeeds. Don't forget to grant the role permission to publish to SNS.
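
For reference, here is a hedged sketch of what the alarm object might look like when registered with boto3's put_pipeline_definition. The topic ARN, pipeline ID, and object names are placeholders, and the existing CliActivity must reference the alarm through its onFail field.

import boto3

dp = boto3.client('datapipeline')

# A minimal SnsAlarm pipeline object; all values below are placeholders.
sns_alarm = {
    'id': 'FailureAlarm',
    'name': 'FailureAlarm',
    'fields': [
        {'key': 'type', 'stringValue': 'SnsAlarm'},
        {'key': 'topicArn', 'stringValue': 'arn:aws:sns:ap-northeast-1:123456789012:datapipeline-alert'},
        {'key': 'subject', 'stringValue': 'Data Pipeline script failed'},
        {'key': 'message', 'stringValue': 'datapipeline_test.py failed.'},
        {'key': 'role', 'stringValue': 'DataPipelineDefaultRole'},
    ],
}

# Reference the alarm from the activity by adding this field to CliActivity:
#     {'key': 'onFail', 'refValue': 'FailureAlarm'}
# then re-register the full definition (all existing objects plus the alarm):
# dp.put_pipeline_definition(pipelineId='df-XXXXXXXXXXXX',
#                            pipelineObjects=[sns_alarm, ...])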

Finally

Once Python scripts can be executed periodically with Data Pipeline, there is no longer any need to provision and manage dedicated hosts for scheduled jobs or to guarantee their execution yourself, which streamlines many things.
