[AWS] A story that may be helpful for those who are new to Lambda-Python and DynamoDB

Introduction

Recently I have become interested in serverless architecture. My previous two articles (listed below) used JavaScript with Google Apps Script; this time I would like to use Lambda on AWS to run Python code without a server. I am close to a complete beginner in both Python and AWS, and this article records the steps I took until I could do various things with Lambda and DynamoDB. At first I was lost because I was so unfamiliar with AWS, and the very first thing Lambda did was throw an error at me. I am interested in Python but have used it very little, so I am writing from a real beginner's point of view. I have included many screenshots and explained things one step at a time so that beginners can follow along; experienced Pythonistas and AWS users will probably find little that is new here, so thank you for bearing with me.

Articles I wrote previously:
- Remind tasks on the last business day
- Google Calendar in-house version

Premise

- New to AWS
- About 2 weeks of Python experience
- Already registered for AWS, using the free tier

AWS

In my case, I was confused by the sheer number of AWS services, so I first got an overview of which services connect to which. I think the following articles will be helpful. I also searched for free books on Kindle, but in the end I tried various things myself while going back over these articles.

- Let's summarize "what AWS is" in 3 lines
- What I did and what I got stuck on during the first-year free period of AWS

Registration

This article assumes you have already registered for AWS; please refer to another article for the registration procedure.

Role creation

Working without an IAM role causes a lot of trouble, so we will create a role that has full access to S3, DynamoDB, and Lambda.

Select IAM from the AWS service list

[Screenshot 2017-03-15 14.22.30.png]

Role >> Create a new role

[Screenshot 2017-03-15 14.22.43.png]

Role name: this time [lambda_dynamo]

[Screenshot 2017-03-15 14.22.48.png]

Amazon Lambda >> Select

[Screenshot 2017-03-15 14.27.00.png]

For the time being, attach these three Full Access policies:
- AWSLambdaFullAccess
- AmazonS3FullAccess
- AmazonDynamoDBFullAccess

Next step

[Screenshot 2017-03-15 14.27.27.png]

It is OK if an item called lambda_dynamo has been added under Role.

[Screenshot 2017-03-15 14.33.21.png]

Lambda

Lambda is a service that can run scripts without a server. Convenient.

Fee structure

For testing purposes you can basically use it as much as you like (see the official pricing page).

Lambda free tier

Requests per month    Compute time per month [GB-seconds]
1,000,000             400,000

Requests: the total number of requests across all functions.
Compute time: allocated memory multiplied by execution time.

Since the minimum memory size of Lambda is 128 MB, it takes 3,200,000 seconds of execution per month to reach 400,000 GB-seconds:

(128 [MB] / 1024 [MB/GB]) * 3,200,000 [sec] = 400,000 [GB*sec]

So it is fine to run scripts for about 888 hours a month. Let's make full use of it. (Please try calculating the time yourself!)
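The free tier arithmetic above can be checked with a few lines of Python. This is only a sketch: the 400,000 GB-seconds figure comes from the table above, the function names are made up for this example, and it is written for Python 3 rather than the 2.7 runtime used in the rest of the article.

```python
FREE_GB_SECONDS = 400000  # monthly free compute time, taken from the table above


def free_seconds(memory_mb):
    """Seconds of execution per month covered by the free tier at a given memory size."""
    return FREE_GB_SECONDS / (memory_mb / 1024.0)


def free_hours(memory_mb):
    """The same budget expressed in hours."""
    return free_seconds(memory_mb) / 3600.0


if __name__ == "__main__":
    for mb in (128, 256, 512):
        print("{} MB: {:.0f} s = {:.1f} h".format(mb, free_seconds(mb), free_hours(mb)))
```

Doubling the memory halves the free running time, so a function set to 256 MB gets about 444 hours instead of about 888.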

Try using test code (Python error)

Select Lambda from the AWS service list

Creating the first function

Create a Lambda function >> lambda-canary

[Screenshot 2017-03-15 14.52.52.png]

[Trigger settings] >> Delete >> Next
Here you can configure things such as running the script once every few minutes. It can be set later, so delete it this time.

[Function settings]
I changed the name to [sample] for the time being.

[Screenshot 2017-03-15 14.57.34.png]

For the role, choose the lambda_dynamo role created earlier, then create the function.

First test

from __future__ import print_function

import os
from datetime import datetime
from urllib2 import urlopen

SITE = os.environ['site']  # URL of the site to check, stored in the site environment variable
EXPECTED = os.environ['expected']  # String expected to be on the page, stored in the expected environment variable


def validate(res):
    return EXPECTED in res


def lambda_handler(event, context):
    print('Checking {} at {}...'.format(SITE, event['time']))
    try:
        if not validate(urlopen(SITE).read()):
            raise Exception('Validation failed')
    except:
        print('Check failed!')
        raise
    else:
        print('Check passed!')
        return event['time']
    finally:
        print('Check complete at {}'.format(str(datetime.now())))

[Screenshot 2017-03-15 15.06.42.png]

Yes, I got an error.

    print('Checking {} at {}...'.format(SITE, event['time']))
KeyError: 'time'

Let's take a look at the variables available by default before resolving this error.

Variables used by default

event >> AWS Lambda uses this parameter to pass event data to the handler. This parameter is usually a Python dict. It can also be a list, str, int, float, or NoneType.

context >> AWS Lambda uses this parameter to provide runtime information to the handler. This parameter is of the LambdaContext type.

That is how they are defined.

The environment variables set in the console can be read with os.environ.
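To make the roles of event, context, and os.environ concrete, here is a minimal sketch of a handler that can be run locally. It is not the blueprint code: FakeContext is a hypothetical stand-in for the real LambdaContext, the fallback strings are invented for this example, and the code is written for Python 3.

```python
import os


def lambda_handler(event, context):
    # event is normally a dict built from the test event JSON.
    # .get() returns a default instead of raising KeyError when a key is missing.
    site = os.environ.get("site", "(site not set)")
    when = event.get("time", "(no time in event)")
    return "Checking {} at {}".format(site, when)


class FakeContext(object):
    """Hypothetical stand-in for LambdaContext, used only for this local run."""
    function_name = "sample"


if __name__ == "__main__":
    print(lambda_handler({"key1": "value1"}, FakeContext()))
    print(lambda_handler({"time": "now...!"}, FakeContext()))
```

Using event.get('time') is one way to avoid the KeyError; the article instead fixes it by adding time to the test event.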

[Screenshot 2017-03-15 15.15.28.png]

time?

Now let's deal with the error.

[Screenshot 2017-03-15 15.11.53.png]

Add time to the test event settings.

{
  "key3": "value3",
  "key2": "value2",
  "key1": "value1",
  "time": "now...!"
}

The error about time should now be gone.

START RequestId: a8708105-0948-11e7-b83e-b71ae2e4dbbe Version: $LATEST
Checking https://www.amazon.com/ at now...!...
Check failed!
Check complete at 2017-03-15 06:28:53.016209
HTTP Error 503: Service Unavailable: HTTPError
Traceback (most recent call last):
  (abridgement)
  File "/usr/lib64/python2.7/urllib2.py", line 556, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 503: Service Unavailable

END RequestId: a8708105-0948-11e7-b83e-b71ae2e4dbbe
REPORT RequestId: a8708105-0948-11e7-b83e-b71ae2e4dbbe	Duration: 348.59 ms	Billed Duration: 400 ms 	Memory Size: 128 MB	Max Memory Used: 17 MB

I'm still getting an error, so I'll try various things.

The exception is handled in the except clause, but it is raised again there, which is why the error shows up!

First of all, I don't like the red screen, so I will make the error go away even if the URL request fails.


    try:
        if not validate(urlopen(SITE).read()):
            raise Exception('Validation failed')
    except:
        print('Check failed!')
        raise

In Python, a bare raise at the end of an except block re-raises the caught exception unchanged, so after being caught it still propagates up to Lambda. Rewrite this.


def lambda_handler(event, context):
    print('Checking {} at {}...'.format(SITE, event['time']))
    try:
        if not validate(urlopen(SITE).read()):
            raise Exception('Validation failed')
    except Exception as e:
        print('Check failed!')
        print(e)
    else:
        print('Check passed!')
        return event['time']
    finally:
        print('Check complete at {}'.format(str(datetime.now())))

Finally, the green success check appeared!

[Screenshot 2017-03-15 15.44.10.png]

I caught the error with except Exception as e:, printed it with print(e), and removed raise so that, for the time being, no error is reported.

I did a lot of research on Python's error handling around here, raise in particular...
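The difference between the two versions can be reproduced locally without Lambda. The sketch below is an illustration only: failing_fetch is a hypothetical function that stands in for urlopen failing with a 503, and the two check functions mirror the before and after code in simplified form.

```python
def check_with_reraise(fetch):
    """Original style: log the failure, then let the same exception continue upward."""
    try:
        fetch()
    except Exception:
        print("Check failed!")
        raise  # bare raise: re-raises the exception that was just caught


def check_swallowing(fetch):
    """Rewritten style: log the failure and carry on, so the caller sees no exception."""
    try:
        fetch()
    except Exception as e:
        print("Check failed!")
        print(e)
        return "failed"
    return "passed"


def failing_fetch():
    # Hypothetical stand-in for urlopen(SITE) failing.
    raise IOError("HTTP Error 503: Service Unavailable")
```

With the first version the invocation is reported as failed; with the second it is reported as successful even though the check itself failed, so the failure is visible only in the log.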

HTTP Error 503

The error is still there, so let's look at the HTTP Error 503: Service Unavailable: HTTPError part. Since the words Validation failed do not appear in the log, the error must come from the urlopen(SITE).read() call inside validate(urlopen(SITE).read()), before validate itself runs.

def validate(res):
    return EXPECTED in res

This accesses https://www.amazon.com/ and checks whether the returned HTML contains the words "Online Shopping".

For the time being, change the environment variables so that Google is accessed instead of Amazon.

Then


Checking https://www.google.co.jp/ at now...!...
Check passed!
Check complete at 2017-03-15 07:00:05.723916

Finally, Check passed! is printed.

503 Service Unavailable: the service is temporarily unavailable due to overload or maintenance. For example, it is returned when a flood of access makes requests impossible to process.

I was convinced from the bottom of my heart that https://www.amazon.com/ could not possibly be down, so it did not occur to me to go back and check the environment variables. When I simply switched to Google, it worked. Is Amazon refusing access from Lambda?

I got stuck on this.

indent

This one is a Python error: I also got errors several times because of the indentation.

Syntax error in module 'lambda_function': 
unexpected indent (lambda_function.py, line 30)

I looked into this, and it comes down to the war between tabs and (4) spaces. According to the article "Tab vs space war when writing programming code is finally settled", Python is generally written with spaces.

The file I edited locally was indented with tabs, while code edited in the AWS console is indented with 4 spaces. I never wanted to see this indent error again, so I joined the spaces camp.
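Mixed indentation is hard to spot by eye, so a small check before uploading can help. The following is a sketch, not an official tool: find_tab_indents is a name invented here, and it simply reports the lines whose leading whitespace contains a tab character.

```python
def find_tab_indents(source):
    """Return the 1-based line numbers whose leading whitespace contains a tab."""
    bad = []
    for number, line in enumerate(source.splitlines(), start=1):
        # Everything before the first non-whitespace character is the indent.
        indent = line[:len(line) - len(line.lstrip())]
        if "\t" in indent:
            bad.append(number)
    return bad


sample = "def f():\n    a = 1\n\tb = 2\n    return a + b\n"
print(find_tab_indents(sample))  # [3]
```

To check a real file, read it first, for example find_tab_indents(open("lambda_function.py").read()).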

Requests

urlopen works as it is, but I would like to introduce requests.

- python urllib2 module
- Requests: HTTP for Humans
- Code to do the same thing without Requests

It will make sending messages to Slack easier.

Code now.py


# coding: utf-8
from __future__ import print_function

import os
import json
import requests
from datetime import datetime
from urllib2 import urlopen

SITE = os.environ['site']
EXPECTED = os.environ['expected']

def validate(res):
    return EXPECTED in res

def lambda_handler(event, context):
    print('Checking {} at {}...'.format(SITE, event['time']))
    try:
        if not validate(urlopen(SITE).read()):
            raise Exception('Validation failed')
    except Exception as e:
        print('Check failed!')
        print(e)
    else:
        print('Check passed!')
    finally:
        print('Check complete at {}'.format(str(datetime.now())))
        return "Finish"

Unable to import module 'lambda_function': No module named requests

Since requests is an external module, this error occurs.

Use of external modules

Using non-standard Python modules such as requests has a bit of a quirk, so I will write it down.

lambda-uploader
The following articles will be helpful.

- Develop, run, and deploy AWS Lambda remotely using lambda-uploader
- Deploy AWS Lambda Python with lambda-uploader

Upload your code as a ZIP

This is the method I used. It is the quickest if you only need to do it once.
- [AWS] What to do when you want to pip with Lambda

Run pip install requests -t . in your working folder and zip up the folder contents.

If the upload succeeds, start again from the code shown earlier (the one that imports requests).
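The zipping step can also be done from Python. The sketch below assumes that pip install requests -t . has already been run in the working folder; build_package, source_dir, and zip_path are names chosen for this example. The important point is that the files are stored relative to the folder, so lambda_function.py ends up at the root of the archive rather than inside a subfolder.

```python
import os
import zipfile


def build_package(source_dir, zip_path):
    """Zip the contents of source_dir so that its files sit at the root of the archive."""
    zip_abs = os.path.abspath(zip_path)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(source_dir):
            for name in files:
                full = os.path.join(root, name)
                if os.path.abspath(full) == zip_abs:
                    continue  # do not put the archive inside itself
                # Store the path relative to source_dir, not the absolute path.
                archive.write(full, os.path.relpath(full, source_dir))
    return zip_path
```

For example, build_package(".", "function.zip") run in the working folder produces a file that can be uploaded from the Lambda console.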

Slack

import

api

Any combination works, but this time we will use the simple combination of requests and a webhook.

See here for how to get the Webhook URL: Slack Webhook URL acquisition procedure

def send_slack(text):
    url = "The Webhook URL obtained above"
    payload_dic = {
        "text":text,
        "icon_emoji":':grin:',
        "channel":'bot-test2',
    }

    r = requests.post(url, data=json.dumps(payload_dic))

This alone is enough to post a message to Slack!
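A slightly more defensive variant of send_slack is sketched below. It is an assumption-laden illustration, not the article's code: the post function is passed in as a parameter (in Lambda you would pass requests.post) so that the logic can be tried without network access, a timeout is added, and the return value reports whether the response status was 200. The function and parameter names are made up for this example.

```python
import json


def build_slack_payload(text, channel="bot-test2", icon_emoji=":grin:"):
    """Serialize the same fields the article sends to the webhook."""
    return json.dumps({"text": text, "icon_emoji": icon_emoji, "channel": channel})


def send_slack(text, webhook_url, post):
    """post is any callable shaped like requests.post(url, data=..., timeout=...)."""
    response = post(webhook_url, data=build_slack_payload(text), timeout=5)
    # Treat anything other than HTTP 200 as a failed delivery.
    return response.status_code == 200
```

In the Lambda function this would be called as send_slack("test", url, requests.post), where url is the Webhook URL obtained earlier.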

# coding: utf-8
from __future__ import print_function

import os
import json
import requests
from datetime import datetime
from urllib2 import urlopen

SITE = os.environ['site']
EXPECTED = os.environ['expected']

def validate(res):
    return EXPECTED in res

def web_check():
    try:
        if not validate(urlopen(SITE).read()):
            raise Exception('Validation failed')
    except Exception as e:
        print('Check failed!')
        print(e)
    else:
        print('Check passed!')
    finally:
        print('Check complete at {}'.format(str(datetime.now())))

def lambda_handler(event, context):
    print('Checking {} at {}...'.format(SITE, event['time']))
    # web_check()
    send_slack("test")
    return "success!"

def send_slack(text):
    url = "This is my URL"
    payload_dic = {
        "text":text,
        "icon_emoji":':grin:',
        "channel":'bot-test2',
    }

    r = requests.post(url, data=json.dumps(payload_dic))


I moved the check into def web_check():.

def lambda_handler(event, context):
    print('Checking {} at {}...'.format(SITE, event['time']))
    # web_check()
    send_slack("test")
    return "success!"

With this,

[Screenshot 2017-03-15 16.55.31.png]

the message arrived in Slack.

OK. Next, we will fetch data from DynamoDB.

DynamoDB

Fee structure

DynamoDB free tier (part of the AWS Free Tier)

Storage    Write capacity    Read capacity
25 GB      25                25

One capacity unit handles one request per second.
Write capacity of 1 unit: one write of up to 1 KB of data per second.
Read capacity of 1 unit: one read of up to 4 KB of data per second.
2.5 million read requests from DynamoDB Streams are also available for free.

For the time being, if you set both the write capacity and the read capacity to 1 when creating a table, I think it will stay within the free tier and cost nothing. This time about one access per minute is enough, so select the minimum of 1 unit (enough for one access per second).
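The capacity rules above can be turned into a small calculation. This sketch uses only the two rules stated in the text (1 KB per write unit, 4 KB per read unit, one request per second each); the function names are invented for this example, item sizes are rounded up to the next whole unit, and the code is written for Python 3.

```python
import math


def write_units(item_kb, writes_per_second):
    """Write capacity units needed: each write costs one unit per 1 KB, rounded up."""
    return math.ceil(math.ceil(item_kb / 1.0) * writes_per_second)


def read_units(item_kb, reads_per_second):
    """Read capacity units needed: each read costs one unit per 4 KB, rounded up."""
    return math.ceil(math.ceil(item_kb / 4.0) * reads_per_second)


if __name__ == "__main__":
    # One small item written roughly once a minute fits in the minimum of 1 unit.
    print(write_units(0.5, 1 / 60.0))
    print(read_units(0.5, 1 / 60.0))
```

For the once-a-minute access pattern in this article, both values come out as 1, which matches the choice of the minimum setting.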

Try to make

For the time being, I used the settings shown in this screenshot.

[Screenshot 2017-03-15 17.10.25.png]

DynamoDB items are accessed through a hash key (the same idea as a dictionary or associative array, depending on the language). This time we access items with id as the key.

[Screenshot 2017-03-15 17.15.28.png]


{
  "id": 1,
  "target": "Online Shopping",
  "url": "https://www.amazon.com/"
}

{
  "id": 2,
  "target": "Gmail",
  "url": "https://www.google.co.jp/"
}

This table is used to look up the target string for each url.

Fetch this data from the Python code

# Add these two imports
import boto3
from boto3.dynamodb.conditions import Key, Attr

def get_urls():
    table    = dynamodb.Table('sites')
    response = table.scan()
    sites = response["Items"]
    return sites

I added this function and, for each url fetched from DynamoDB with get_urls(), searched the page for the target string.
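One thing to be aware of with table.scan(): a single call returns at most one page of results, and when more remain the response contains LastEvaluatedKey, which is passed back as ExclusiveStartKey to get the next page. For a table with a handful of rows, as here, a single call is enough. The sketch below shows the loop; scan_all is a name made up for this example, and FakeTable is a hypothetical stand-in so the loop can be run without AWS.

```python
def scan_all(table):
    """Collect every item from a boto3 Table by following LastEvaluatedKey."""
    items = []
    kwargs = {}
    while True:
        response = table.scan(**kwargs)
        items.extend(response.get("Items", []))
        last_key = response.get("LastEvaluatedKey")
        if not last_key:
            return items
        kwargs["ExclusiveStartKey"] = last_key


class FakeTable(object):
    """Hypothetical stand-in for dynamodb.Table('sites'), returning canned pages."""

    def __init__(self, pages):
        self.pages = pages

    def scan(self, **kwargs):
        index = kwargs.get("ExclusiveStartKey", {}).get("page", 0)
        response = {"Items": self.pages[index]}
        if index + 1 < len(self.pages):
            response["LastEvaluatedKey"] = {"page": index + 1}
        return response
```

In the Lambda function, get_urls() could return scan_all(dynamodb.Table('sites')) instead of response["Items"].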

Current.py


# coding: utf-8
from __future__ import print_function

import os
import json
import requests
from datetime import datetime
from urllib2 import urlopen
import boto3
from boto3.dynamodb.conditions import Key, Attr

dynamodb = boto3.resource('dynamodb')

def validate(res, target):
    return target in res

def web_check(url, target):
    print("Serching ... " + target)
    try:
        if validate(urlopen(url).read(), target):
            print("Find!")
        else:
            print("Not Find!")
    except Exception as e:
        print('Error')
        print(e)

def get_urls():
    table    = dynamodb.Table('sites')
    response = table.scan()
    sites = response["Items"]
    return sites

def send_slack(text):
    url = "https://hooks.slack.com/services/"
    payload_dic = {
        "text":text,
        "icon_emoji":':grin:',
        "channel":'bot-test2',
    }
    r = requests.post(url, data=json.dumps(payload_dic))

def lambda_handler(event, context):
    print('Check start')

    sites = get_urls()
    for site in sites:
        web_check(str(site["url"]), str(site["target"]))
    
    return "success!"


Output result


Check start
Serching ...Technical information sharing service at https://qiita.com/
Find!
Serching ... Gmail at https://www.google.co.jp/
Find!
Serching ...Is it Gmail? at https://www.google.co.jp/
Not Find!
Serching ... Online Shopping at https://www.amazon.com/
Error
HTTP Error 503: Service Unavailable
END RequestId: 3992d81e-095e-11e7-b30a-1ddc7da9e992

I got an error here as well. I keep stumbling over Python!


'utf8' codec can't decode byte 0x90 in position 102: invalid start byte

I solved it by referring to the article "Knowing Python's UnicodeEncodeError". type(site["url"]) turned out to be <type 'unicode'>, so I converted it with str(site["url"]) to get <type 'str'>.
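The error message itself can be reproduced in isolation. The sketch below is written for Python 3, where bytes and str are strictly separate types, so it is not the same situation as the Python 2.7 unicode/str mix in this article; it only illustrates what "invalid start byte" means and one tolerant way to decode. safe_text is a name invented for this example.

```python
def safe_text(raw):
    """Decode bytes as UTF-8, replacing undecodable bytes instead of raising."""
    return raw.decode("utf-8", errors="replace")


raw = b"Gmail \x90 page"  # 0x90 cannot start a UTF-8 character

try:
    raw.decode("utf-8")
except UnicodeDecodeError as error:
    print(error)

print("Gmail" in safe_text(raw))  # True
```

Replacing bad bytes loses information, so this suits a simple "does the page contain this word" check rather than anything that needs the exact text.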

Write DynamoDB from Lambda

Add the sites_check1 table

[Screenshot 2017-03-15 18.19.12.png]

Code to add

from datetime import datetime, timedelta

def insert(results):
    date = datetime.now() + timedelta(hours=9)

    id = 0
    table = dynamodb.Table('sites_check')
    table.put_item(
        Item={
            "id": id,
            "date": date.strftime("%Y/%m/%d %H:%M"),
            "result": results
       }
    )
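The +9 hours in insert() above shifts the clock to Japan time by hand. An alternative, sketched below for Python 3 (datetime.timezone is not available in the 2.7 runtime the article uses), is to work with an explicit time zone so the offset is part of the value. JST and format_jst are names chosen for this example.

```python
from datetime import datetime, timedelta, timezone

JST = timezone(timedelta(hours=9), "JST")


def format_jst(moment_utc):
    """Format an aware UTC datetime in the article's date format, converted to JST."""
    return moment_utc.astimezone(JST).strftime("%Y/%m/%d %H:%M")


print(format_jst(datetime(2017, 3, 15, 9, 37, tzinfo=timezone.utc)))  # 2017/03/15 18:37
```

In a handler the current time would be passed as datetime.now(timezone.utc).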

I added timedelta to shift the time forward or backward (here by +9 hours). I also added a title attribute to the sites table in DynamoDB.

Current code


# coding: utf-8
from __future__ import print_function

import os
import json
import requests
from datetime import datetime, timedelta
from urllib2 import urlopen
import boto3
from boto3.dynamodb.conditions import Key, Attr

dynamodb = boto3.resource('dynamodb')

def validate(res, target):
    return target in res

def web_check(url, target):
    print("Serching ... " + target + " at " + url)
    try:
        if validate(urlopen(url).read(), target):
            return "Find!"
        else:
            return "Not Find!"
    except Exception as e:
        return str(e)

def get_urls():
    table    = dynamodb.Table('sites')
    response = table.scan()
    sites = response["Items"]
    return sites

def send_slack(text):
    url = "https://hooks.slack.com/"
    payload_dic = {
        "text":text,
        "icon_emoji":':grin:',
        "channel":'bot-test2',
    }
    r = requests.post(url, data=json.dumps(payload_dic))

def insert(results):
    date = datetime.now() + timedelta(hours=9)

    id = 0
    table = dynamodb.Table('sites_check')
    table.put_item(
        Item={
            "id": id,
            "date": date.strftime("%Y/%m/%d %H:%M"),
            "result": results
       }
    )

def lambda_handler(event, context):
    print('Check start')

    results = {}
    sites = get_urls()
    for site in sites:
        msg = web_check(str(site["url"]), str(site["target"]))
        results[str(site["title"])] = msg
    
    insert(results)
    return "success!"



Here is the result of inserting the data.


{
  "date": "2017/03/15 18:37",
  "id": 0,
  "result": {
    "Amazon": "Find!", # for some reason this time it was Find! and no error occurred
    "Google": "Find!",
    "Google-2": "Not Find!",
    "Qiita": "Find!"
  }
}

{
  "date": "2017/03/15 18:48",
  "id": 0,
  "result": {
    "Amazon": "HTTP Error 503: Service Unavailable", # when an error occurs, str(e) is stored here
    "Google": "Find!",
    "Google-2": "Not Find!",
    "Qiita": "Find!"
  }
}

If you do not call str(e), an error occurs because e is not a str. I am getting used to Python, so this took only about 10 minutes to solve. Note that JSON does not allow comments; the # notes above are only annotations.

def get_recent_codes():
    # The table must be looked up here; it is not a global
    table = dynamodb.Table('sites_check')

    date = datetime.now() + timedelta(hours=9)
    now = date.strftime("%Y/%m/%d %H:%M")
    last = (date + timedelta(minutes=-9)).strftime("%Y/%m/%d %H:%M")

    # Query items with id 0 whose date falls within the last 10 minutes
    response = table.query(
        KeyConditionExpression=Key('id').eq(0) & Key('date').between(last, now)
    )

    return response

response['Count'] holds the number of items fetched by the query, and response['Items'] holds the items themselves. If you need other data, print the response and pick out what you need.
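As an illustration of working with response['Items'], the sketch below picks out the sites that returned something other than Find! or Not Find!. The sample response is hand-built from the two records shown earlier, not real query output, and failed_sites is a name invented for this example.

```python
def failed_sites(response):
    """Map each date to the sites whose result is an error message."""
    failures = {}
    for item in response.get("Items", []):
        bad = sorted(name for name, msg in item["result"].items()
                     if msg not in ("Find!", "Not Find!"))
        if bad:
            failures[item["date"]] = bad
    return failures


# Hand-built sample shaped like a query response (assumption: same fields as above).
sample_response = {
    "Count": 2,
    "Items": [
        {"id": 0, "date": "2017/03/15 18:37",
         "result": {"Amazon": "Find!", "Google": "Find!"}},
        {"id": 0, "date": "2017/03/15 18:48",
         "result": {"Amazon": "HTTP Error 503: Service Unavailable", "Google": "Find!"}},
    ],
}

print(failed_sites(sample_response))
```

The resulting dictionary could then be formatted into a message and passed to send_slack.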

The result looks like this

# coding: utf-8
from __future__ import print_function

import os
import json
import requests
from datetime import datetime, timedelta
from urllib2 import urlopen
import boto3
from boto3.dynamodb.conditions import Key, Attr

dynamodb = boto3.resource('dynamodb')

def validate(res, target):
    return target in res

def web_check(url, target):
    print("Serching ... " + target + " at " + url)
    try:
        if validate(urlopen(url).read(), target):
            return "Find!"
        else:
            return "Not Find!"
    except Exception as e:
        return str(e)

def get_urls():
    table    = dynamodb.Table('sites')
    response = table.scan()
    sites = response["Items"]
    return sites

def send_slack(text):
    url = "https://hooks.slack.com/"
    payload_dic = {
        "text":text,
        "icon_emoji":':grin:',
        "channel":'bot-test2',
    }
    r = requests.post(url, data=json.dumps(payload_dic))

def insert(results):
    table = dynamodb.Table('sites_check')
    date = datetime.now() + timedelta(hours=9)
    id = 0
    table.put_item(
        Item={
            "id": id,
            "date": date.strftime("%Y/%m/%d %H:%M"),
            "result": results
       }
    )

def get_recent_codes():
    table = dynamodb.Table('sites_check')

    date = datetime.now() + timedelta(hours=9)
    now = date.strftime("%Y/%m/%d %H:%M")
    last = (date + timedelta(minutes=-9)).strftime("%Y/%m/%d %H:%M")
    print("Checking from " + last + " to " + now)

    response = table.query(
        KeyConditionExpression=Key('id').eq(0) & Key('date').between(last, now)
    )
    
    return response

def lambda_handler(event, context):
    print('Check start')

    results = {}
    sites = get_urls()
    for site in sites:
        msg = web_check(str(site["url"]), str(site["target"]))
        results[str(site["title"])] = msg
    
    insert(results)
    print(get_recent_codes())
    return "success!"



It looks like you can run this on a schedule and do all sorts of things with it.

In closing

I'm an amateur at Python and Lambda, so I ran into a lot of errors. However, I think architectures built with Python and Lambda will be quite useful, so I would like to keep using them.
