Python libraries make it convenient to analyze and organize data. I often use NumPy, Matplotlib, Pandas, and Seaborn.
Once you have tried various analyses by trial and error in Jupyter Notebook and settled on the processing you want, you will want to turn it into a script and run it automatically every day.
AWS Lambda is handy for running scripts on a schedule, for example every hour. You don't have to maintain a server, and it is cheap.
However, to use libraries with Python on AWS Lambda, you have to create a deployment package. Moreover, if you use a library that requires compilation, such as NumPy, you need an Amazon Linux environment to build it, which is a hassle.
AWS Cloud9 gives you a development environment by running a server on EC2 that you access from the browser on your PC. Running Amazon Linux with Docker would also work, but setting it up in the cloud is easier. Moreover, if you don't use it for a while, the EC2 instance is stopped automatically, which is kind to your wallet.
Create a new environment from the AWS Cloud9 menu. There is nothing special to watch out for, but for now the only platforms you can choose are Amazon Linux and Ubuntu Server 18.04 LTS. Lambda's runtimes are moving to Amazon Linux 2, but that can't be helped. This time I chose Amazon Linux.
Basically, follow the documentation. Once Cloud9 launches, click the "AWS Resources" tab on the far right to display the Lambda menu. Pressing the "Create a new Lambda function" button opens a wizard called "Create serverless application". This time I named the function envtest. Unfortunately, Python 3.6 was the only Python version available as the runtime.
When the wizard finishes, you should have a folder called envtest.
Before writing the code, let's install the required libraries. A Python virtual environment is available in envtest/venv/. From the Cloud9 IDE console, run
source ./envtest/venv/bin/activate
to activate it, then install the libraries you need with pip. AWS Lambda limits the deployment package, including libraries, to 250 MB (unzipped), so install only the minimum required. This time I installed NumPy, Matplotlib, Pandas, Seaborn, Pillow, and boto3, and it fit (just barely).
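Because libraries like these get close to the 250 MB limit, it can help to measure them before deploying. The following is a minimal sketch using only the standard library; it only approximates the unzipped package size, and the directory you pass in (for example envtest/venv/lib/python3.6/site-packages) is an assumption about the venv layout, so adjust it to your environment. dir_size_mb and check_package_size are hypothetical helper names.

```python
import os

# AWS Lambda's limit on the unzipped deployment package, including libraries.
# AWS states it as 250 MB, which is 262,144,000 bytes (i.e. 250 * 1024 * 1024).
LAMBDA_UNZIPPED_LIMIT_MB = 250


def dir_size_mb(path):
    """Return the total size of all regular files under `path`, in MB (1024 * 1024 bytes)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            # Skip symlinks so the same file is not counted twice.
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total / (1024 * 1024)


def check_package_size(path, limit_mb=LAMBDA_UNZIPPED_LIMIT_MB):
    """Return (size_mb, fits) for the directory that will be zipped and deployed."""
    size = dir_size_mb(path)
    return size, size <= limit_mb
```

For example, check_package_size("envtest/venv/lib/python3.6/site-packages") returns the measured size together with a flag telling whether it stays under the limit.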
I write the Lambda function in envtest/envtest/lambda_function.py.
This time, to follow the trend in the number of people infected with the coronavirus in Tokyo, the function reads the CSV of infected people in Tokyo that is published every day, plots the daily count and its 7-day moving average, and uploads the plot to S3. The output is a 320x240 pixel bitmap so that it can be displayed on Adafruit's PyPortal (https://www.adafruit.com/product/4116).
As of 2020-9-5, it looks like this.
lambda_function.py
import base64
import io
from datetime import datetime as dt
import boto3
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt
from PIL import Image
sns.set(style="whitegrid")
# Details of newly announced COVID-19 positive patients in Tokyo
URL = 'https://stopcovid19.metro.tokyo.lg.jp/data/130001_tokyo_covid19_patients.csv'
s3_client = boto3.client('s3')
def lambda_handler(event, context):
    df = pd.read_csv(URL)
    df['date'] = df['Published_date'].apply(lambda a: dt.strptime(a, '%Y-%m-%d'))
    df = pd.DataFrame(df['date'].value_counts(sort=False).sort_index())
    df['ma7'] = df.iloc[:, 0].rolling(window=7).mean()
    # PyPortal is 320 x 240
    ppi = 227
    width = 320 / ppi
    height = 240 / ppi
    SMALL_SIZE = 4
    MEDIUM_SIZE = 10
    BIGGER_SIZE = 12
    plt.rc('font', size=SMALL_SIZE)  # controls default text sizes
    plt.rc('axes', titlesize=SMALL_SIZE)  # fontsize of the axes title
    plt.rc('axes', labelsize=MEDIUM_SIZE)  # fontsize of the x and y labels
    plt.rc('xtick', labelsize=SMALL_SIZE)  # fontsize of the tick labels
    plt.rc('ytick', labelsize=SMALL_SIZE)  # fontsize of the tick labels
    plt.rc('legend', fontsize=SMALL_SIZE)  # legend fontsize
    plt.rc('figure', titlesize=BIGGER_SIZE)  # fontsize of the figure title
    fig, ax = plt.subplots(figsize=(width, height), dpi=ppi)
    ax.plot(df['date'], color='g', label="Daily")
    ax.plot(df['ma7'], 'r', label="ma7")
    #ax.text(0, 1, "212", fontsize=4)
    ax.set_title("Tokyo COVID-19")
    #fig.legend()
    fig.autofmt_xdate()
    plt.tight_layout()
    pic_IObytes = io.BytesIO()
    plt.savefig(pic_IObytes, format='png')
    plt.close(fig)  # release the figure; a warm Lambda container reuses this process
    pic_IObytes.seek(0)
    im = Image.open(pic_IObytes)
    pic_IObytes_bmp = io.BytesIO()
    im.save(pic_IObytes_bmp, format='bmp')
    pic_IObytes_bmp.seek(0)
    #pic_hash = base64.b64encode(pic_IObytes_bmp.read())
    s3_client.upload_fileobj(pic_IObytes_bmp, "pyportal", "covid19_tokyo.bmp", ExtraArgs={'ACL': 'public-read'})
    return ''
df = pd.read_csv(URL)
With just this one line, Pandas downloads the CSV from the URL and turns it into a pandas.DataFrame. Great.
df['date'] = df['Published_date'].apply(lambda a: dt.strptime(a, '%Y-%m-%d'))
df = pd.DataFrame(df['date'].value_counts(sort=False).sort_index())
df['ma7'] = df.iloc[:,0].rolling(window=7).mean()
To make the data easier to handle later, I create a column of datetime objects from the column that holds the date as a string, count the rows per date, and rebuild the result as a DataFrame indexed by date. Then I compute the 7-day moving average and store it in a new column.
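The same counting and moving-average steps can be sketched as a standalone function. This is not the author's code: it swaps in pd.to_datetime for apply with strptime, and it assumes that days missing from the CSV should count as zero (filled with asfreq), so that rolling(window=7) covers seven calendar days rather than seven rows. The sample CSV is made up; only the Published_date column name comes from the real file, and daily_counts_with_ma7 is a hypothetical name.

```python
import io

import pandas as pd

# Hypothetical sample: one row per patient, as in the real CSV, but with only
# the Published_date column. Note that 2020-09-04 has no rows at all.
SAMPLE_CSV = """Published_date
2020-09-01
2020-09-02
2020-09-02
2020-09-03
2020-09-05
2020-09-05
2020-09-05
2020-09-06
2020-09-07
2020-09-07
2020-09-08
2020-09-08
2020-09-08
2020-09-08
"""


def daily_counts_with_ma7(csv_source):
    """Return a DataFrame indexed by date with 'daily' counts and their 7-day mean 'ma7'."""
    df = pd.read_csv(csv_source)
    # Parse all date strings in one vectorized call.
    dates = pd.to_datetime(df["Published_date"], format="%Y-%m-%d")
    # One row per patient, so counting rows per date gives the daily number of cases.
    daily = dates.value_counts().sort_index()
    # Insert days with no reported cases as 0 so each window spans 7 calendar days.
    daily = daily.asfreq("D", fill_value=0)
    out = daily.to_frame("daily")
    # The first 6 rows are NaN because a full 7-day window is not available yet.
    out["ma7"] = out["daily"].rolling(window=7).mean()
    return out


result = daily_counts_with_ma7(io.StringIO(SAMPLE_CSV))
print(result)
```

In the handler, the URL could be passed directly, as in daily_counts_with_ma7(URL), since read_csv accepts both URLs and file-like objects. Naming the column with to_frame("daily") also avoids depending on the column name that value_counts happens to produce.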
fig, ax = plt.subplots(figsize=(width,height), dpi=ppi)
ax.plot(df['date'], color='g', label="Daily")
ax.plot(df['ma7'], 'r', label="ma7")
Draw the graph with Matplotlib. Since the final image should be 320x240 pixels, I chose a dpi and set the width and height in inches to the pixel size divided by that dpi.
pic_IObytes = io.BytesIO()
plt.savefig(pic_IObytes, format='png')
pic_IObytes.seek(0)
Save the graph as PNG in memory. PyPortal can only read bitmaps, so I wanted to save it directly in BMP format, but the Matplotlib backend I am using does not support saving as BMP.
im = Image.open(pic_IObytes)
pic_IObytes_bmp = io.BytesIO()
im.save(pic_IObytes_bmp, format='bmp')
pic_IObytes_bmp.seek(0)
There is no way around it, so I save it as PNG first, then open it with Pillow and save it again as BMP.
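The PNG-to-BMP step can be wrapped in a small helper. This is a minimal sketch with Pillow, under two assumptions of mine rather than the article's: that a plain 24-bit RGB bitmap is the safer choice for the display (so the alpha channel Matplotlib writes is dropped), and that resizing to exactly 320x240 is an acceptable safety net if the rendered figure comes out a pixel off. png_to_bmp and TARGET_SIZE are hypothetical names.

```python
import io

from PIL import Image

# PyPortal display size, as in the article.
TARGET_SIZE = (320, 240)


def png_to_bmp(png_buf, size=TARGET_SIZE):
    """Convert an in-memory PNG into an in-memory BMP of exactly `size` pixels."""
    png_buf.seek(0)
    with Image.open(png_buf) as src:
        # Matplotlib writes RGBA PNGs; converting drops the alpha channel (24-bit BMP).
        im = src.convert("RGB")
    # Safety net: resize only if the rendered size differs from the target.
    if im.size != size:
        im = im.resize(size)
    bmp_buf = io.BytesIO()
    im.save(bmp_buf, format="BMP")
    bmp_buf.seek(0)  # rewind so upload_fileobj reads from the start
    return bmp_buf
```

In the handler, this would replace the four Pillow lines with pic_IObytes_bmp = png_to_bmp(pic_IObytes).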
s3_client.upload_fileobj(pic_IObytes_bmp, "pyportal", "covid19_tokyo.bmp", ExtraArgs={'ACL': 'public-read'})
Upload the bitmap to S3 and make it public.
Lambda functions are usually cumbersome to debug. For simple functions that only use the standard library, you can still use the code editor on the AWS Lambda management console, but when you use many external libraries, as in this case, you cannot.
The Cloud9 IDE, however, makes debugging easy: you can set breakpoints and use the debugger. Very helpful.
Debug with "Run local", and once it works, deploy. Simply press the "Deploy the selected Lambda function" button in the Lambda menu of the Cloud9 IDE; it zips up the environment and uploads it to AWS Lambda.
Because the function uploads to S3 at the end, you need to attach the AmazonS3FullAccess policy to this function's role from the AWS Lambda management console.
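AmazonS3FullAccess works, but it grants far more than this function needs. As an alternative that I have not tested against this exact setup, a narrower inline policy could allow writing only the one object. The sketch below builds such a policy document in Python; the bucket and key come from the code above, put_object_policy is a hypothetical name, and s3:PutObjectAcl is included because the upload sets ACL to public-read.

```python
import json


def put_object_policy(bucket, key):
    """Build an IAM policy document that only allows writing one S3 object."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # PutObjectAcl is needed because the upload passes ACL='public-read'.
                "Action": ["s3:PutObject", "s3:PutObjectAcl"],
                "Resource": f"arn:aws:s3:::{bucket}/{key}",
            }
        ],
    }


policy_json = json.dumps(put_object_policy("pyportal", "covid19_tokyo.bmp"), indent=2)
print(policy_json)
```

The printed JSON could then be pasted as an inline policy on the function's execution role in place of AmazonS3FullAccess.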
If everything works, add EventBridge (CloudWatch Events) as a trigger with the schedule expression rate(1 day), and you're done.
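The schedule expression has a small pitfall: EventBridge expects the singular unit when the value is 1, as in rate(1 day), and the plural otherwise, as in rate(2 days). The helper below is a hypothetical convenience that encodes that rule; it is not part of boto3.

```python
VALID_UNITS = ("minute", "hour", "day")


def rate_expression(value, unit):
    """Build an EventBridge rate() schedule expression such as 'rate(1 day)'."""
    if unit not in VALID_UNITS:
        raise ValueError(f"unit must be one of {VALID_UNITS}")
    if value < 1:
        raise ValueError("value must be a positive integer")
    # Singular unit for a value of 1, plural form for anything larger.
    suffix = "" if value == 1 else "s"
    return f"rate({value} {unit}{suffix})"


print(rate_expression(1, "day"))
```

If you create the rule with boto3 instead of the console, the result could be passed as boto3.client("events").put_rule(Name="envtest-daily", ScheduleExpression=rate_expression(1, "day")), where envtest-daily is a made-up rule name; you would still need to add the Lambda function as the rule's target.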
The Cloud9 IDE isn't bad, but sometimes I want to do a lot of work in Jupyter Notebook. So I launch Jupyter Notebook on the Cloud9 server so that I can develop with it from the browser on my PC or smartphone.
Since you will be setting up an environment for experimenting in Jupyter Notebook, you will probably run out of space on the EBS volume. Enlarge it by following the documentation; I made mine 20 GiB.
Resize an Amazon EBS volume used by an environment
From the Cloud9 IDE console, install Miniconda and set up your preferred Python environment.
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
After that, you can create an environment as you like with conda.
jupyter notebook --ip=0.0.0.0
When launching, specify 0.0.0.0 as the IP so that it can be accessed from outside. This leaves security very loose, so at least set a password.
Clicking "Go To Instance" under EC2 Instance on the AWS Cloud9 management screen opens the management screen of the EC2 instance you are using. Copy the URL under "Public DNS (IPv4)", append ":8888" (the port number) at the end, and open it in a browser to access Jupyter Notebook. The instance's security group must also allow inbound traffic on port 8888. For example:
http://ec2-xx-xxx-xx-xx.ap-northeast-1.compute.amazonaws.com:8888/