[PYTHON] Try running the Embulk command with your Lambda function

Introduction

This is the December 16 entry in the BeeX Advent Calendar 2020.

In the previous article, I tried running a Lambda function from a container image.

This time, building on that, I tried running the Embulk command, which I have been using a lot recently, inside Lambda.

Environment

PC: Windows 10
Docker: Docker version 19.03.13, build 4484c46d9d

Premise

Actual operation

Creating a Lambda function

Create a Lambda function by referring to @shiro01's article. Since the goal this time is only to confirm that Embulk runs, the function simply executes the help command.

lambda_function.py


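import subprocess

def lambda_handler(event, context):
    # With shell=True, pass the command as one string. In a list, only the
    # first element is run and 'help' would go to the shell itself.
    cmd = '/usr/bin/embulk help'
    out = subprocess.run(cmd, shell=True, stdout=subprocess.PIPE)
    # Print Embulk's output so that it appears in the Lambda log.
    print(out.stdout.decode())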

One caveat: shell=True is passed as an argument to subprocess.run. Without it, the call fails with an OSError.
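
As a side note on shell=True: on POSIX systems, when the command is given as a list, only the first element is treated as the command and the remaining elements are handed to the shell itself. The following is a minimal sketch of that behavior, assuming a POSIX shell and using echo as a stand-in for /usr/bin/embulk. The helper name run_in_shell is hypothetical.

```python
import subprocess

def run_in_shell(command: str) -> subprocess.CompletedProcess:
    """Run a command line through the shell and capture its output."""
    return subprocess.run(
        command,
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold stderr into stdout so errors reach the log
    )

# `echo` stands in for /usr/bin/embulk in this sketch.
as_string = run_in_shell("echo help")
print(as_string.returncode, as_string.stdout.decode())

# With a list and shell=True on POSIX, only "echo" is the command;
# "help" is passed to the shell itself and never reaches echo.
as_list = subprocess.run(["echo", "help"], shell=True, stdout=subprocess.PIPE)
print(repr(as_list.stdout.decode()))
```

Passing a single string keeps the argument attached to the command, and checking returncode makes a failed Embulk run easier to notice.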

reference

Creating a Dockerfile

Next, create a Dockerfile. Install Embulk in advance so that its executable file is available in the build directory. The COPY instructions in the Dockerfile copy Embulk and lambda_function.py into the container image.

FROM amazon/aws-lambda-python:3.7

COPY lambda_function.py ./
COPY embulk /usr/bin

RUN chmod +x /usr/bin/embulk

CMD [ "lambda_function.lambda_handler" ]

Build the image

Build the image with the following command.

$ cd [directory containing the Dockerfile]
$ docker build -t lambda_embulk .

The image is created.

$ docker images
REPOSITORY    TAG       IMAGE ID       CREATED              SIZE
lambda_embulk latest    472005da7cf7   About a minute ago   980MB

Local test

You can run Lambda locally with the following command.

$ docker run -p 9000:8080 lambda_embulk:latest
time="2020-12-15T06:11:57.95" level=info msg="exec '/var/runtime/bootstrap' (cwd=/var/task, handler=)"

Open another window and run the following command. The curl command returns null as the response, because the handler does not return a value.

$ curl -XPOST "http://localhost:9000/2015-03-31/functions/function/invocations" -d '{}'
null
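
The same local invocation can also be made from Python. This is a minimal sketch using only the standard library. The function name invoke_local and its timeout default are assumptions for illustration, and the URL is the one used in the curl command above.

```python
import json
import urllib.request

# Endpoint exposed by `docker run -p 9000:8080 ...` in the local test above.
LOCAL_ENDPOINT = "http://localhost:9000/2015-03-31/functions/function/invocations"

def invoke_local(event, url=LOCAL_ENDPOINT, timeout=60):
    """POST a JSON event to the local Lambda endpoint and return the decoded response."""
    body = json.dumps(event).encode("utf-8")
    request = urllib.request.Request(url, data=body, method="POST")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))

# Usage, with the container running in another window:
#   print(invoke_local({}))
# A JSON null response is decoded to None.
```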

If you check the window running the container, the execution result is shown on the console. The local test looks fine, since Embulk's help output appears.

$ docker run -p 9000:8080 lambda_embulk:latest
time="2020-12-15T06:13:53.604" level=info msg="exec '/var/runtime/bootstrap' (cwd=/var/task, handler=)"
START RequestId: 0ed577b3-6a07-447a-bccb-c03d255c0201 Version: $LATEST
time="2020-12-15T06:13:57.601" level=info msg="extensionsDisabledByLayer(/opt/disable-extensions-jwigqn8j) -> stat /opt/disable-extensions-jwigqn8j: no such file or directory"
time="2020-12-15T06:13:57.601" level=warning msg="Cannot list external agents" error="open /opt/extensions: no such file or directory"
Embulk v0.9.23
Usage: embulk [-vm-options] <command> [--options]
Commands:
   mkbundle   <directory>                             # create a new plugin bundle environment.
   bundle     [directory]                             # update a plugin bundle environment.
   run        <config.yml>                            # run a bulk load transaction.
   cleanup    <config.yml>                            # cleanup resume state.
   preview    <config.yml>                            # dry-run the bulk load without output and show preview.
   guess      <partial-config.yml> -o <output.yml>    # guess missing parameters to create a complete configuration file.
   gem        <install | list | help>                 # install a plugin or show installed plugins.
   new        <category> <name>                       # generates new plugin template
   migrate    <path>                                  # modify plugin code to use the latest Embulk plugin API
   example    [path]                                  # creates an example config file and csv file to try embulk.
   selfupdate [version]                               # upgrades embulk to the latest released version or to the specified version.

VM options:
   -E...                            Run an external script to configure environment variables in JVM
                                    (Operations not just setting envs are not recommended nor guaranteed.
                                     Expect side effects by running your external script at your own risk.)
   -J-O                             Disable JVM optimizations to speed up startup time (enabled by default if command is 'run')
   -J+O                             Enable JVM optimizations to speed up throughput
   -J...                            Set JVM options (use -J-help to see available options)
   -R--dev                          Set JRuby to be in development mode

Use `<command> --help` to see description of the commands.

END RequestId: 0ed577b3-6a07-447a-bccb-c03d255c0201
REPORT RequestId: 0ed577b3-6a07-447a-bccb-c03d255c0201  Init Duration: 1.03 ms  Duration: 591.61 ms     Billed Duration: 600 ms     Memory Size: 3008 MB     Max Memory Used: 3008 MB

Push to ECR repository

Create a repository from the AWS console.

Push the image to the ECR repository you created with the following commands. This part is almost the same as in the previous article. The Embulk image is fairly large, so the push takes longer than last time.

$ docker tag lambda_embulk:latest XXXXXXXXXXXXX.dkr.ecr.ap-northeast-1.amazonaws.com/lambda-embulk
$ docker images
REPOSITORY                                                                     TAG       IMAGE ID       CREATED          SIZE
XXXXXXXXXXXXX.dkr.ecr.ap-northeast-1.amazonaws.com/lambda-embulk               latest    472005da7cf7   11 minutes ago   980MB
$ aws ecr get-login-password --region ap-northeast-1 | docker login --username AWS --password-stdin XXXXXXXXXXXXX.dkr.ecr.ap-northeast-1.amazonaws.com
Login Succeeded
$ docker push XXXXXXXXXXXXX.dkr.ecr.ap-northeast-1.amazonaws.com/lambda-embulk:latest
The push refers to repository [XXXXXXXXXXXXX.dkr.ecr.ap-northeast-1.amazonaws.com/lambda-embulk]
5f70bf18a086: Pushed
65f9fe7cdd01: Pushed
1965e83122e7: Pushed
701bdcbf3b47: Pushed
6e660533f001: Pushed
069cd8bd11dd: Pushed
6e191121f7ea: Pushed
d6fa53d6caa6: Pushed
1fb474cee41c: Pushed
b1754cf6954d: Pushed
464c816a7003: Pushed
latest: digest: sha256:XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX size: 2624

The upload succeeded.

Lambda creation / execution

Create a Lambda function from the AWS console.

The Lambda function has been created.

Checking the execution result

The first run failed with a timeout error.

The timeout in Lambda's basic settings was 3 seconds, so I changed it to 3 minutes for the time being.
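
The timeout can also be changed from code instead of the console. The following is a rough sketch that assumes a boto3 Lambda client is passed in by the caller. The helper name set_lambda_timeout and the function name "lambda_embulk" are hypothetical. update_function_configuration with FunctionName and Timeout is the boto3 call being wrapped.

```python
def set_lambda_timeout(lambda_client, function_name, timeout_seconds=180):
    """Set the function timeout; 180 seconds corresponds to the 3 minutes chosen here."""
    # Lambda accepts timeouts from 1 second up to 900 seconds (15 minutes).
    if not 1 <= timeout_seconds <= 900:
        raise ValueError("Lambda timeouts must be between 1 and 900 seconds")
    return lambda_client.update_function_configuration(
        FunctionName=function_name,
        Timeout=timeout_seconds,
    )

# Usage (requires boto3 and AWS credentials; "lambda_embulk" is a hypothetical name):
#   import boto3
#   set_lambda_timeout(boto3.client("lambda"), "lambda_embulk")
```

Passing the client in as an argument keeps the helper free of a hard boto3 import, so it can be tried with a stub before touching a real function.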

It took a while, but the function finished normally. Judging from the output, Embulk's help command can be executed on Lambda as well.

Conclusion

Whether or not this is a good idea, I was able to run Embulk on Lambda. For real operation there are probably several things to consider, such as Lambda's limits and where to keep the diff file, but for the time being it might be possible to store it in S3 and run small-scale jobs from there. I am not sure.
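
One way the S3 idea might look is sketched below. It assumes that "diff file" means the config-diff file that Embulk reads and updates with the -c option, that a boto3 S3 client is passed in, and that the diff object already exists in the bucket. The function name, bucket, key, and config path are all hypothetical, and this has not been tried on Lambda.

```python
import os
import shlex
import subprocess

def run_embulk_with_s3_diff(s3_client, bucket, key, config_path,
                            workdir="/tmp", runner=subprocess.run):
    """Fetch the diff file from S3, run Embulk with it, and store the updated diff."""
    # /tmp is the writable directory inside Lambda.
    diff_path = os.path.join(workdir, os.path.basename(key))
    # Assumption: the diff object already exists in S3.
    s3_client.download_file(bucket, key, diff_path)
    command = "/usr/bin/embulk run {} -c {}".format(
        shlex.quote(config_path), shlex.quote(diff_path))
    result = runner(command, shell=True,
                    stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    if result.returncode == 0:
        # Persist the diff only when the load succeeded.
        s3_client.upload_file(diff_path, bucket, key)
    return result.returncode

# Usage (hypothetical names; requires boto3 and AWS credentials):
#   import boto3
#   run_embulk_with_s3_diff(boto3.client("s3"), "my-bucket", "embulk/diff.yml",
#                           "/var/task/config.yml")
```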

However, Embulk jobs often do not fit within Lambda's execution time limit, so that part may be difficult.
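
To at least fail cleanly when a job runs long, the handler could stop the command shortly before the invocation's deadline. The following is a minimal sketch under that assumption. get_remaining_time_in_millis() is the method provided by the Lambda context object, while run_with_deadline and the 5-second margin are hypothetical choices.

```python
import subprocess

SAFETY_MARGIN_SECONDS = 5  # hypothetical buffer left for logging and cleanup

def run_with_deadline(command, context, margin=SAFETY_MARGIN_SECONDS):
    """Run a shell command, but stop waiting before the Lambda invocation times out."""
    budget = context.get_remaining_time_in_millis() / 1000 - margin
    if budget <= 0:
        return {"status": "skipped", "reason": "not enough time left"}
    try:
        result = subprocess.run(
            command, shell=True, timeout=budget,
            stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
        )
    except subprocess.TimeoutExpired:
        # The shell process is killed here; anything it spawned may need extra cleanup.
        return {"status": "timed_out", "budget_seconds": budget}
    return {"status": "finished", "returncode": result.returncode,
            "output": result.stdout.decode()}

def lambda_handler(event, context):
    return run_with_deadline("/usr/bin/embulk help", context)
```

Returning a small status dictionary also gives the caller something more informative than null.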
