[Python] From white mask to monster mask: developing a serverless LINE photo processing application on AWS

0. Introduction

Nice to meet you, I'm Pong from China. I'm a new engineer at Nomura Research Institute. My Japanese is still not very good, so please forgive any strange phrasing. Thank you very much.

Travel photos from the corona period are full of white masks; you can't stand it anymore, right? This app started as an assignment in my new-employee development training: to solve the problem, I built an application that converts the white masks in photos into monster masks. Based on the concept of "reducing the amount of work as much as possible", it was implemented as a serverless LINE photo processing application using various AWS services. **For those who are tired of masked photos** and **for those who are interested in serverless**, please enjoy this development report.

1. Why develop this app?

During summer vacation, I traveled with my girlfriend to Choshi in Chiba. We played in the sea, climbed the lighthouse, and took lots of commemorative photos. Unfortunately, the main character of the photos was neither a person nor the scenery but the white masks. In photos from the corona era, white masks have the highest appearance rate and show up everywhere. When my girlfriend saw the photos, she complained, "I don't want to see white masks anymore." So, what if I converted the white masks in the photos into something else?

She and I both love superhero movies, and I especially love the villains' masks in them (e.g. Bane's mask in Batman). Wouldn't it be nice if the white masks became monster masks?

※ This work is a derivative of "[Bane](https://www.flickr.com/photos/istolethetv/30216006787/)" by [istolethetv](https://www.flickr.com/people/istolethetv/), used under [CC BY 2.0](https://creativecommons.org/licenses/by/2.0/).

That's how I came up with the idea and decided to develop this photo processing app. But three problems stood in my way.

First, there are various options for the form of the application. A web app delivered as a web page? An iOS or Android app for smartphones? The front-end interface has to be designed as well as the back-end processing. After weighing the options, I concluded that a LINE app was the most appropriate, for three reasons:

  1. Easy to use: Almost everyone has LINE, sending and receiving with a LINE bot is very easy, and anyone can use it.
  2. Less work: For web apps and iOS apps, interface design and the like are honestly a pain. With a LINE app you don't have to think about any of that, which makes things easier.
  3. Easy to share: Ease of sharing is what characterizes SNS; not only the converted photos but the app itself is easy to share.

Therefore, **I decided to make it a LINE app!**

The next challenge was where to put the server. Build it on a physical machine such as a Raspberry Pi? Use a cloud server such as AWS EC2? Moreover, a server needs not only construction but also maintenance afterwards. As a lazy person whose motto is "reduce the amount of work as much as possible", I didn't want to do that... Then why not develop without a server at all? When I investigated, I found that **with AWS API Gateway and Lambda I could go serverless, with no server construction or maintenance**! Alright, I choose you!

Finally, this app needs face recognition AI to process the photos. Questions immediately piled up: "what AI model architecture should I use?", "where do I get training data?", "how should the data be labeled?". I thought, "I wish there were a face recognition AI I could use right away", so I searched AWS, and there it was: Rekognition (not "recognition"), an AWS service that analyzes images and videos. **You don't need to build your own AI; just call Rekognition and it detects and analyzes the faces in a photo.** With this, "reduce the amount of work as much as possible" is achievable.

With this in mind, we decided to develop a serverless LINE photo processing app on AWS!

2. System overview

We have already decided on the form of the application, so let's build the system! The overall picture of the system is as follows:

Here, the user is assumed to communicate from a smartphone (the PC version of LINE also works). The front end is a LINE Bot, and the entire back end is processed on AWS. To achieve serverless processing, the work runs on three Lambda functions: "controller", "face recognition", and "new image generation". Following the processing flow, the system divides into five parts, as shown in the figure below:

Now I will explain these five parts, following the processing flow.

3. Explanation of each part

3-1 Image input part

Process flow

The first part is the input part. The function is literally to load the image that the user sent to the LINE Bot. The entities related to this part are "LINE Bot", "API Gateway" and "Controller Lambda". The processing flow is as follows:

First, the user sends the photo image to the LINE Bot. The LINE Bot then wraps the image in line_event and sends it to the API Gateway. API Gateway sends the event to the controller Lambda without any changes.

Create LINE Bot

To build this part, first create a LINE Bot (Messaging API channel) as the front door. See the official guide: LINE Official Document: Get Started with the Messaging API. After creating the channel, two more settings are required. The first is to issue a "channel access token" for authentication with Lambda. The second is to turn off the channel's auto-response messages and turn on the webhook. Don't enter the webhook URL yet; do that after configuring API Gateway.

Create an IAM role

Next, create an IAM role for running the Lambda functions. Open the IAM service from the console and create a new role with Lambda as the trusted service, named something like serverless-linebot. Attach the policies "AmazonS3FullAccess", "AmazonRekognitionFullAccess", and "CloudWatchLogsFullAccess". Also, since the controller Lambda calls the other two Lambdas, add the following policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "lambda:InvokeFunction",
                "lambda:InvokeAsync"
            ],
            "Resource": [
                "Face recognition Lambda arn",
                "New image generation Lambda arn"
            ]
        }
    ]
}

"Face recognition Lambda arn" and "New image generation Lambda arn" are not yet available, so don't forget to rewrite them after creating the Lambda function. All this processing is executed in this role.

Create controller Lambda function

Since API Gateway is the "connector", the LINE Bot and the controller Lambda at its two ends must exist before it is created, so next we create the controller Lambda function. Python is used this time, so select a Python 3.x runtime (3.6–3.8). The execution role is the IAM role just created.

After creating it, first set the memory to 512 MB and the timeout to 1 min in "Basic settings". Then set the following environment variables:

| Key | Value |
| --- | --- |
| LINE_CHANNEL_ACCESS_TOKEN | the LINE Bot channel access token |
| LINE_CHANNEL_SECRET | the LINE Bot channel secret |
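
For reference, these "Basic settings" and environment variables can also be applied with boto3 instead of the console. A minimal sketch; the function name "controller-lambda" is a hypothetical example:

import boto3

lambda_client = boto3.client("lambda")
lambda_client.update_function_configuration(
    FunctionName="controller-lambda",  # hypothetical function name
    MemorySize=512,                    # MB, as in "Basic settings"
    Timeout=60,                        # seconds (1 min)
    Environment={"Variables": {
        "LINE_CHANNEL_ACCESS_TOKEN": "<LINE Bot channel access token>",
        "LINE_CHANNEL_SECRET": "<LINE Bot channel secret>",
    }},
)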

As for the function's contents: the controller Lambda communicates with the LINE Bot, so the "line-bot-sdk" package is required. Since packages have to be bundled with the deployment, first install line-bot-sdk locally into a new folder with the following command:

python -m pip install line-bot-sdk -t <new_folder>

After that, create a file named lambda_function.py in the same folder (Lambda treats this file name as the entry point, so be sure to use it) and enter the following code:

lambda_function_for_controller.py


import os
import sys
import logging

import boto3
import json

from linebot import LineBotApi, WebhookHandler
from linebot.models import MessageEvent, TextMessage, TextSendMessage, ImageMessage, ImageSendMessage
from linebot.exceptions import LineBotApiError, InvalidSignatureError


logger = logging.getLogger()
logger.setLevel(logging.ERROR)

#Read the line bot channel access token and secret from the environment variables
channel_secret = os.getenv('LINE_CHANNEL_SECRET', None)
channel_access_token = os.getenv('LINE_CHANNEL_ACCESS_TOKEN', None)
if channel_secret is None:
    logger.error('Specify LINE_CHANNEL_SECRET as environment variable.')
    sys.exit(1)
if channel_access_token is None:
    logger.error('Specify LINE_CHANNEL_ACCESS_TOKEN as environment variable.')
    sys.exit(1)

# Generate the API client & webhook handler
line_bot_api = LineBotApi(channel_access_token)
handler = WebhookHandler(channel_secret)

#Connect with S3 bucket
s3 = boto3.client("s3")
bucket = "<S3 bucket name>"

#Lambda main function
def lambda_handler(event, context):

    # X-Line-Signature header for authentication
    signature = event["headers"]["X-Line-Signature"]

    body = event["body"]

    #Setting the return value
    ok_json = {"isBase64Encoded": False,
               "statusCode": 200,
               "headers": {},
               "body": ""}
    error_json = {"isBase64Encoded": False,
                  "statusCode": 403,
                  "headers": {},
                  "body": "Error"}

    @handler.add(MessageEvent, message=ImageMessage)
    def message(line_event):

        #User profile
        profile = line_bot_api.get_profile(line_event.source.user_id)

        # Extract the sender's user ID (needed for push_message, not for reply)
        # user_id = profile.user_id

        #Extract message ID
        message_id = line_event.message.id

        #Extract image file
        message_content = line_bot_api.get_message_content(message_id)
        content = bytes()
        for chunk in message_content.iter_content():
            content += chunk

        #Save image file
        key = "origin_photo/" + message_id
        new_key = message_id[-3:]
        s3.put_object(Bucket=bucket, Key=key, Body=content)

        #Call face recognition lambda
        lambdaRekognitionName = "<Here is arn of face recognition lambda>"
        params = {"Bucket": bucket, "Key": key}  #Image file path information
        payload = json.dumps(params)
        response = boto3.client("lambda").invoke(
            FunctionName=lambdaRekognitionName, InvocationType="RequestResponse", Payload=payload)
        response = json.load(response["Payload"])

        #Call new image generation lambda
        lambdaNewMaskName = "<Here is arn of new image generation lambda>"
        params = {"landmarks": str(response),
                  "bucket": bucket,
                  "photo_key": key,
                  "new_photo_key": new_key}
        payload = json.dumps(params)
        boto3.client("lambda").invoke(FunctionName=lambdaNewMaskName,
                                      InvocationType="RequestResponse", Payload=payload)

        #Signed URL generation
        presigned_url = s3.generate_presigned_url(ClientMethod="get_object", Params={
                                                  "Bucket": bucket, "Key": new_key}, ExpiresIn=600)

        #Replying to new image message
        line_bot_api.reply_message(line_event.reply_token, ImageSendMessage(
            original_content_url=presigned_url, preview_image_url=presigned_url))

    try:
        handler.handle(body, signature)
    except LineBotApiError as e:
        logger.error("Got exception from LINE Messaging API: %s\n" % e.message)
        for m in e.error.details:
            logger.error("  %s: %s" % (m.property, m.message))
        return error_json
    except InvalidSignatureError:
        return error_json

    return ok_json

The above is the entire controller Lambda function; it is involved in all five parts. The portion relevant to this first part is:

lambda_function_for_controller.py


#Read the line bot channel access token and secret from the environment variables
channel_secret = os.getenv('LINE_CHANNEL_SECRET', None)
channel_access_token = os.getenv('LINE_CHANNEL_ACCESS_TOKEN', None)
if channel_secret is None:
    logger.error('Specify LINE_CHANNEL_SECRET as environment variable.')
    sys.exit(1)
if channel_access_token is None:
    logger.error('Specify LINE_CHANNEL_ACCESS_TOKEN as environment variable.')
    sys.exit(1)

# Generate the API client & webhook handler
line_bot_api = LineBotApi(channel_access_token)
handler = WebhookHandler(channel_secret)

lambda_function_for_controller.py


    # X-Line-Signature header for authentication
    signature = event["headers"]["X-Line-Signature"]

    body = event["body"]

You have now verified the LINE Bot's signature and received the event body. After that, zip the contents of the folder and upload it via Lambda's "Function code" → "Actions" → "Upload a .zip file". For reference, the signature check that handler.handle performs internally is sketched below.
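
Per the LINE documentation, the X-Line-Signature header must equal the base64-encoded HMAC-SHA256 of the raw request body, keyed with the channel secret. A minimal sketch of the equivalent check (you don't need this in the app itself, since line-bot-sdk does it for you):

import base64
import hashlib
import hmac

def is_valid_signature(channel_secret: str, body: str, signature: str) -> bool:
    # X-Line-Signature must equal base64(HMAC-SHA256(channel_secret, body))
    digest = hmac.new(channel_secret.encode("utf-8"),
                      body.encode("utf-8"), hashlib.sha256).digest()
    return hmac.compare_digest(base64.b64encode(digest).decode("utf-8"), signature)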

Create API Gateway

Last comes API Gateway, the connector. The API type here is REST API. After creating the API, create a resource and a method: the method is POST, the integration type is Lambda Function, and enable "Use Lambda Proxy integration". Select the controller Lambda function as the Lambda function.

Next, the POST method request settings. Set the request validator to "Validate query string parameters and headers", and add the following HTTP request header:

| Name | Required | Caching |
| --- | --- | --- |
| X-Line-Signature | ✓ | |

Once set, deploy the API. After deployment completes, copy the stage's invoke URL and paste it into the LINE Bot's webhook URL. This completes the first part.

3-2 Image storage part

Process flow

The second part is image storage. It's very easy: just save the image loaded by the controller Lambda into an S3 bucket. The processing flow is as follows:

Create an S3 bucket

First, create an S3 bucket for the app. In this project, if the bucket name is too long, the "signed URL length problem" occurs (see [3-5](#signed url) for details), so make the bucket name as short as possible (4 characters in my case). Also, you don't want others to see your photos, right? To protect privacy, check "Block all public access" in the permission settings when creating the bucket. After creating it, make a folder called "origin_photo" for the photos uploaded by users and a folder called "masks" for the mask images. That completes the work on the S3 side.
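
For reference, the same bucket setup can be scripted with boto3 instead of the console. A minimal sketch, assuming the ap-northeast-1 region and a hypothetical short bucket name:

import boto3

s3 = boto3.client("s3")
bucket = "mpic"  # hypothetical short name (see the signed URL length problem in 3-5)

s3.create_bucket(Bucket=bucket,
                 CreateBucketConfiguration={"LocationConstraint": "ap-northeast-1"})
# Block all public access to keep the uploaded photos private
s3.put_public_access_block(
    Bucket=bucket,
    PublicAccessBlockConfiguration={"BlockPublicAcls": True, "IgnorePublicAcls": True,
                                    "BlockPublicPolicy": True, "RestrictPublicBuckets": True})
# S3 "folders" are just key prefixes; create the two used by this app
s3.put_object(Bucket=bucket, Key="origin_photo/")
s3.put_object(Bucket=bucket, Key="masks/")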

Controller Lambda function

The controller Lambda function was already completed in part 1, so there is nothing more to do here; this is just the code for this part:

lambda_function_for_controller.py


#Connect with S3 bucket
s3 = boto3.client("s3")
bucket = "<S3 bucket name>"

lambda_function_for_controller.py


        #Extract message ID
        message_id = line_event.message.id

        #Extract image file
        message_content = line_bot_api.get_message_content(message_id)
        content = bytes()
        for chunk in message_content.iter_content():
            content += chunk

        #Save image file
        key = "origin_photo/" + message_id
        new_key = message_id[-3:]
        s3.put_object(Bucket=bucket, Key=key, Body=content)

Here the image file is named after the LINE message ID, so images from multiple users can be told apart.

3-3 Face recognition part

The third part is recognition of the saved photo. Specifically, it detects the contour of the face and the positions of the eyes and nose, which are used later when compositing the mask image. Following the concept of "reducing the amount of work as much as possible", I didn't want to train a face recognition AI from scratch, so faces are recognized with the AWS service "Rekognition".

What is Rekognition?

Rekognition is a service that "automates image and video analysis with machine learning". Simply put, you use a trained AI as-is. Here is the official introduction: Amazon Rekognition.

Rekognition has various functions, such as object and scene detection and face comparison, and it can process not only images but also videos. This time we use the face detection function to obtain the positions of facial features. The position information we want is called "landmarks". The figure below illustrates the landmarks:

Analysis result of this figure:

Rekognition recognition result
{
    "FaceDetails": [
        {
            "AgeRange": {
                "High": 43,
                "Low": 26
            },
            "Beard": {
                "Confidence": 97.48941802978516,
                "Value": true
            },
            "BoundingBox": {
                "Height": 0.6968063116073608,
                "Left": 0.26937249302864075,
                "Top": 0.11424895375967026,
                "Width": 0.42325547337532043
            },
            "Confidence": 99.99995422363281,
            "Emotions": [
                {
                    "Confidence": 0.042965151369571686,
                    "Type": "DISGUSTED"
                },
                {
                    "Confidence": 0.002022328320890665,
                    "Type": "HAPPY"
                },
                {
                    "Confidence": 0.4482877850532532,
                    "Type": "SURPRISED"
                },
                {
                    "Confidence": 0.007082826923578978,
                    "Type": "ANGRY"
                },
                {
                    "Confidence": 0,
                    "Type": "CONFUSED"
                },
                {
                    "Confidence": 99.47616577148438,
                    "Type": "CALM"
                },
                {
                    "Confidence": 0.017732391133904457,
                    "Type": "SAD"
                }
            ],
            "Eyeglasses": {
                "Confidence": 99.42405700683594,
                "Value": false
            },
            "EyesOpen": {
                "Confidence": 99.99604797363281,
                "Value": true
            },
            "Gender": {
                "Confidence": 99.722412109375,
                "Value": "Male"
            },
            "Landmarks": [
                {
                    "Type": "eyeLeft",
                    "X": 0.38549351692199707,
                    "Y": 0.3959200084209442
                },
                {
                    "Type": "eyeRight",
                    "X": 0.5773905515670776,
                    "Y": 0.394561767578125
                },
                {
                    "Type": "mouthLeft",
                    "X": 0.40410104393959045,
                    "Y": 0.6479480862617493
                },
                {
                    "Type": "mouthRight",
                    "X": 0.5623446702957153,
                    "Y": 0.647117555141449
                },
                {
                    "Type": "nose",
                    "X": 0.47763553261756897,
                    "Y": 0.5337067246437073
                },
                {
                    "Type": "leftEyeBrowLeft",
                    "X": 0.3114689588546753,
                    "Y": 0.3376390337944031
                },
                {
                    "Type": "leftEyeBrowRight",
                    "X": 0.4224424660205841,
                    "Y": 0.3232649564743042
                },
                {
                    "Type": "leftEyeBrowUp",
                    "X": 0.36654090881347656,
                    "Y": 0.3104579746723175
                },
                {
                    "Type": "rightEyeBrowLeft",
                    "X": 0.5353175401687622,
                    "Y": 0.3223199248313904
                },
                {
                    "Type": "rightEyeBrowRight",
                    "X": 0.6546239852905273,
                    "Y": 0.3348073363304138
                },
                {
                    "Type": "rightEyeBrowUp",
                    "X": 0.5936762094497681,
                    "Y": 0.3080498278141022
                },
                {
                    "Type": "leftEyeLeft",
                    "X": 0.3524211347103119,
                    "Y": 0.3936865031719208
                },
                {
                    "Type": "leftEyeRight",
                    "X": 0.4229775369167328,
                    "Y": 0.3973258435726166
                },
                {
                    "Type": "leftEyeUp",
                    "X": 0.38467878103256226,
                    "Y": 0.3836822807788849
                },
                {
                    "Type": "leftEyeDown",
                    "X": 0.38629674911499023,
                    "Y": 0.40618783235549927
                },
                {
                    "Type": "rightEyeLeft",
                    "X": 0.5374732613563538,
                    "Y": 0.39637991786003113
                },
                {
                    "Type": "rightEyeRight",
                    "X": 0.609208345413208,
                    "Y": 0.391626238822937
                },
                {
                    "Type": "rightEyeUp",
                    "X": 0.5750962495803833,
                    "Y": 0.3821527063846588
                },
                {
                    "Type": "rightEyeDown",
                    "X": 0.5740782618522644,
                    "Y": 0.40471214056015015
                },
                {
                    "Type": "noseLeft",
                    "X": 0.4441811740398407,
                    "Y": 0.5608476400375366
                },
                {
                    "Type": "noseRight",
                    "X": 0.5155643820762634,
                    "Y": 0.5569332242012024
                },
                {
                    "Type": "mouthUp",
                    "X": 0.47968366742134094,
                    "Y": 0.6176465749740601
                },
                {
                    "Type": "mouthDown",
                    "X": 0.4807897210121155,
                    "Y": 0.690782368183136
                },
                {
                    "Type": "leftPupil",
                    "X": 0.38549351692199707,
                    "Y": 0.3959200084209442
                },
                {
                    "Type": "rightPupil",
                    "X": 0.5773905515670776,
                    "Y": 0.394561767578125
                },
                {
                    "Type": "upperJawlineLeft",
                    "X": 0.27245330810546875,
                    "Y": 0.3902156949043274
                },
                {
                    "Type": "midJawlineLeft",
                    "X": 0.31561678647994995,
                    "Y": 0.6596118807792664
                },
                {
                    "Type": "chinBottom",
                    "X": 0.48385748267173767,
                    "Y": 0.8160444498062134
                },
                {
                    "Type": "midJawlineRight",
                    "X": 0.6625112891197205,
                    "Y": 0.656606137752533
                },
                {
                    "Type": "upperJawlineRight",
                    "X": 0.7042999863624573,
                    "Y": 0.3863988518714905
                }
            ],
            "MouthOpen": {
                "Confidence": 99.83820343017578,
                "Value": false
            },
            "Mustache": {
                "Confidence": 72.20288848876953,
                "Value": false
            },
            "Pose": {
                "Pitch": -4.970901966094971,
                "Roll": -1.4911699295043945,
                "Yaw": -10.983647346496582
            },
            "Quality": {
                "Brightness": 73.81391906738281,
                "Sharpness": 86.86019134521484
            },
            "Smile": {
                "Confidence": 99.93638610839844,
                "Value": false
            },
            "Sunglasses": {
                "Confidence": 99.81478881835938,
                "Value": false
            }
        }
    ]
}

What I want this time is the "Landmarks" item. "Type" is the name of each point (see the image above). Note that X and Y are not pixel coordinates but ratios: X is relative to the image width and Y to the image height.
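
Converting a landmark to pixel coordinates is therefore just a multiplication. A minimal sketch, assuming a hypothetical 1000 x 800 image:

# Landmark X/Y are ratios, so multiply by the image width and height
width, height = 1000, 800  # hypothetical image size
landmark = {"Type": "eyeLeft", "X": 0.38549351692199707, "Y": 0.3959200084209442}
point = (int(width * landmark["X"]), int(height * landmark["Y"]))  # -> (385, 316)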

Process flow

The processing flow of the third part is as follows. Rekognition has two ways to read an image: the first is to reference an image stored in an S3 bucket, and the second is to send the image bytes directly. This time we use the first (S3) method, so what is passed from the controller Lambda to the face recognition Lambda is not the image itself but the file's storage location, and that same location is what the face recognition Lambda passes to Rekognition.
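
In boto3 the two input styles look like this. A minimal sketch; this app uses only the S3Object form, and the bucket name, key, and "photo.jpg" are placeholders:

import boto3

rekognition = boto3.client("rekognition")

# 1. Reference an object stored in S3 (the method used in this app)
response = rekognition.detect_faces(
    Image={"S3Object": {"Bucket": "<S3 bucket name>", "Name": "origin_photo/<message ID>"}},
    Attributes=["ALL"])

# 2. Send the image bytes directly
with open("photo.jpg", "rb") as f:
    response = rekognition.detect_faces(Image={"Bytes": f.read()}, Attributes=["ALL"])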

Here, the IAM role that executes the face recognition Lambda is the role created in the first part. It has permission to use S3 and Rekognition, so even though the S3 bucket is private, Rekognition can read the images in it.

The result returned from Rekognition looks like the example above. It contains various attributes such as age and gender, but this time I only want the landmarks, so the face recognition Lambda extracts them from the result.

Also, there are many landmarks: some points (the mouth, etc.) cannot be recognized well because of the mask, and others (the fine eye points, etc.) are more detailed than needed. So here we extract just the following five landmarks and return them to the controller Lambda:

| Landmark name | Position |
| --- | --- |
| eyeLeft | left eye |
| eyeRight | right eye |
| upperJawlineLeft | left temple |
| upperJawlineRight | right temple |
| chinBottom | chin |

Create face recognition Lambda function

To separate responsibilities, create a face recognition Lambda function apart from the controller Lambda. As with the controller, select Python 3.x and use the same execution role, and set the timeout to 1 min and the memory to 512 MB in "Basic settings".

No extra packages are needed here, so there is no zip to upload; just fill the auto-generated lambda_function.py with the code below.

lambda_function_for_rekognition.py


import json
import boto3

rekognition = boto3.client("rekognition")


def lambda_handler(event, context):

    #Get the image file path from the event
    bucket = event["Bucket"]
    key = event["Key"]

    #Call Rekognition for face recognition
    response = rekognition.detect_faces(
        Image={'S3Object': {'Bucket': bucket, 'Name': key}}, Attributes=['ALL'])

    #How many people are in the photo
    number_of_people = len(response["FaceDetails"])

    #Make a list of all the required landmarks
    all_needed_landmarks = []
    #Process by the number of people
    for i in range(number_of_people):
        #This is a list of dictionaries
        all_landmarks_of_one_person = response["FaceDetails"][i]["Landmarks"]
        # Use only eyeLeft, eyeRight, upperJawlineLeft, upperJawlineRight
        # and chinBottom this time; extract them into needed_landmarks
        needed_landmarks = []
        for landmark_type in ["eyeLeft", "eyeRight", "upperJawlineLeft", "upperJawlineRight", "chinBottom"]:
            landmark = next(
                item for item in all_landmarks_of_one_person if item["Type"] == landmark_type)
            needed_landmarks.append(landmark)
        all_needed_landmarks.append(needed_landmarks)

    return all_needed_landmarks
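
You can check this function by itself from the Lambda console by sending a test event of the same shape as the payload the controller Lambda builds. The bucket name and message ID below are placeholders:

{
    "Bucket": "<S3 bucket name>",
    "Key": "origin_photo/<message ID>"
}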

Controller Lambda function

The controller Lambda function is already complete, so this is just the code for the third part:

lambda_function_for_controller.py


        lambdaRekognitionName = "<Here is arn of face recognition lambda>"
        params = {"Bucket": bucket, "Key": key}  #Image file path information
        payload = json.dumps(params)
        response = boto3.client("lambda").invoke(
            FunctionName=lambdaRekognitionName, InvocationType="RequestResponse", Payload=payload)
        response = json.load(response["Payload"])

3-4 New image generation part

Process flow

The fourth part is new image generation: the part that composites the photo with one of the following mask images:

| Name | Bane | Joker | Immortan Joe |
| --- | --- | --- | --- |
| Mask image | ※1 | ※2 | ※3 |
| Source | The Dark Knight Rises | The Dark Knight | Mad Max: Fury Road |

※1: This work is a derivative of "Bane" by istolethetv, used under CC BY 2.0. ※2: This work is a derivative of this photo, used under CC0 1.0. ※3: This work, "joe's mask", is a derivative of "File:Fan Expo 2015 - Immortan Joe (21147179383).jpg" by GabboT, used under CC BY-SA 2.0. "joe's mask" is licensed CC BY-SA 2.0 by y2-peng.

The processing flow on AWS is as follows:

※ This work is a derivative of "[Bane](https://www.flickr.com/photos/istolethetv/30216006787/)" by [istolethetv](https://www.flickr.com/people/istolethetv/), used under [CC BY 2.0](https://creativecommons.org/licenses/by/2.0/).

That's all for processing.

Create new image generation Lambda

First, create a new Lambda function in AWS Lambda. The runtime and execution role are the same as before, and as before, set the memory and timeout in "Basic settings".

This time, image compositing requires two Python packages, "pillow" and "numpy". As before, create a new folder and install the packages with the following command.

python -m pip install pillow numpy -t <new_folder>

Then, create "lambda_function.py" in that folder and enter the following code.

lambda_function_for_new_image_generation.py


import ast
import boto3

import numpy as np

from PIL import Image
from operator import sub
from io import BytesIO
from random import choice

s3 = boto3.client("s3")


class NewPhotoMaker:
    def __init__(self, all_landmarks, bucket, photo_key, new_photo_key):
        self.all_landmarks = ast.literal_eval(all_landmarks)  # landmarks arrive as a string; literal_eval is a safer eval
        self.bucket = bucket
        self.photo_key = photo_key
        self.new_photo_key = new_photo_key

    #Load photographic image
    def load_photo_image(self):
        s3.download_file(self.bucket, self.photo_key, "/tmp/photo_file")
        self.photo_image = Image.open("/tmp/photo_file")

    #Load mask image
    def load_mask_image(self):
        # Randomly choose from bane (Batman), joker (Batman) and immortan joe (Mad Max)
        mask_key = "masks/" + choice(["bane", "joker", "joe"]) + ".png"
        s3.download_file(self.bucket, mask_key, "/tmp/mask_file")
        self.mask_image = Image.open("/tmp/mask_file")

    #Change from a landmark (ratio) to a specific point
    def landmarks_to_points(self):
        upperJawlineLeft_landmark = next(
            item for item in self.landmarks if item["Type"] == "upperJawlineLeft")
        upperJawlineRight_landmark = next(
            item for item in self.landmarks if item["Type"] == "upperJawlineRight")
        eyeLeft_landmark = next(
            item for item in self.landmarks if item["Type"] == "eyeLeft")
        eyeRight_landmark = next(
            item for item in self.landmarks if item["Type"] == "eyeRight")

        self.upperJawlineLeft_point = [int(self.photo_image.size[0] * upperJawlineLeft_landmark["X"]), 
                                       int(self.photo_image.size[1] * upperJawlineLeft_landmark["Y"])]
        self.upperJawlineRight_point = [int(self.photo_image.size[0] * upperJawlineRight_landmark["X"]), 
                                        int(self.photo_image.size[1] * upperJawlineRight_landmark["Y"])]
        self.eyeLeft_point = [int(self.photo_image.size[0] * eyeLeft_landmark["X"]),
                              int(self.photo_image.size[1] * eyeLeft_landmark["Y"])]
        self.eyeRight_point = [int(self.photo_image.size[0] * eyeRight_landmark["X"]),
                               int(self.photo_image.size[1] * eyeRight_landmark["Y"])]

    #Resize the mask image to fit the face width
    def resize_mask(self):
        face_width = int(np.linalg.norm(list(map(sub, self.upperJawlineLeft_point, self.upperJawlineRight_point))))
        new_height = int(self.mask_image.size[1]*face_width/self.mask_image.size[0])
        self.mask_image = self.mask_image.resize((face_width, new_height))

    # Rotate the mask image to match the in-plane tilt (roll) of the face (faces turned sideways are not handled)
    def rotate_mask(self):
        angle = np.arctan2(self.upperJawlineRight_point[1] - self.upperJawlineLeft_point[1],
                           self.upperJawlineRight_point[0] - self.upperJawlineLeft_point[0])
        angle = -np.degrees(angle)  # radians to degrees
        self.mask_image = self.mask_image.rotate(angle, expand=True)

    #Combine photographic image and mask image
    def match_mask_position(self):
        #Matching using eye position
        face_center = [int((self.eyeLeft_point[0] + self.eyeRight_point[0])/2),
                       int((self.eyeLeft_point[1] + self.eyeRight_point[1])/2)]
        mask_center = [int(self.mask_image.size[0]/2),
                       int(self.mask_image.size[1]/2)]
        x = face_center[0] - mask_center[0]
        y = face_center[1] - mask_center[1]
        self.photo_image.paste(self.mask_image, (x, y), self.mask_image)

    #Save new image file to S3
    def save_new_photo(self):
        new_photo_byte_arr = BytesIO()
        self.photo_image.save(new_photo_byte_arr, format="JPEG")
        new_photo_byte_arr = new_photo_byte_arr.getvalue()
        s3.put_object(Bucket=self.bucket, Key=self.new_photo_key,
                      Body=new_photo_byte_arr)

    #Run
    def run(self):

        self.load_photo_image()

        #Processing for the number of people
        for i in range(len(self.all_landmarks)):
            self.load_mask_image()  #Load one new mask each time
            self.landmarks = self.all_landmarks[i]
            self.landmarks_to_points()
            self.resize_mask()
            self.rotate_mask()
            self.match_mask_position()

        self.save_new_photo()

#lambda main function


def lambda_handler(event, context):
    landmarks = event["landmarks"]
    bucket = event["bucket"]
    photo_key = event["photo_key"]
    new_photo_key = event["new_photo_key"]

    photo_maker = NewPhotoMaker(landmarks, bucket, photo_key, new_photo_key)
    photo_maker.run()

Finally, zip all the contents of the folder and upload it to Lambda. This completes the new image generation part.

Controller Lambda function

The controller Lambda code for this part is below:

lambda_function_for_controller.py


        #Call new image generation lambda
        lambdaNewMaskName = "<Here is arn of new image generation lambda>"
        params = {"landmarks": str(response),
                  "bucket": bucket,
                  "photo_key": key,
                  "new_photo_key": new_key}
        payload = json.dumps(params)
        boto3.client("lambda").invoke(FunctionName=lambdaNewMaskName,
                                      InvocationType="RequestResponse", Payload=payload)

3-5 New image output part

Image output on LINE Bot

The last part is outputting the new image. This app inputs and outputs images through the LINE Bot; on input the image file itself is passed, but on output the image file cannot be sent directly.

The Image message section of the LINE Messaging API documentation stipulates how images are sent to users: the API receives the URL of an image, not the image file itself. According to the documentation, communication between the user and the LINE Bot goes through the LINE platform, so the transmission process is:

  1. "Send image URL from LINE Bot to LINE platform"
  2. "LINE platform loads images stored in S3 bucket"
  3. "LINE platform sends images to users"

But this process makes **S3 bucket permissions a problem**. If access is set to "private", the LINE platform cannot read the image, and what the user receives looks like this: If access is set to "public", anyone who knows the image's S3 object URL can access it. That means your photos can be seen by others, which is a privacy problem.

For a while I thought about using DynamoDB or the like to authenticate LINE users, but that increases the amount of work considerably and collides with the concept of "reducing the amount of work as much as possible". Honestly, I didn't want to do it.

After a lot of research, I finally found a good way: the "signed URL".

Signed URL

To protect privacy, keep access to the S3 bucket "private": the image cannot be accessed even by someone who knows its S3 object URL. But a signed URL issued with the authority of the IAM role grants access to a specific object in the bucket even while it stays private. It's like a Zoom meeting URL with the password embedded in it.

You can also set an expiration time on a signed URL; once it expires the URL can no longer be used, which makes it one step more secure. One thing to note, however, is the length of the signed URL. A signed URL issued with the privileges of an IAM role contains token information for temporary access, so the URL becomes quite long, while the LINE Bot Image message API accepts URLs of at most 1,000 characters. If the S3 bucket name, image file path, and image file name are too long, the URL exceeds 1,000 characters and cannot be sent. This is why, when creating the S3 bucket in the second part, I said "make the bucket name as short as possible". For the same reason, the new image file name is just the last 3 characters of the message ID (shortening the file name), and the new image is saved at the root of the S3 bucket instead of in a folder (shortening the file path). This solved the signed URL length issue.
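
If you want to confirm that your names are short enough, you can measure the generated URL directly. A minimal sketch using the same generate_presigned_url call as the controller Lambda; the bucket and key are placeholders:

import boto3

s3 = boto3.client("s3")
url = s3.generate_presigned_url(ClientMethod="get_object",
                                Params={"Bucket": "<S3 bucket name>", "Key": "<new image key>"},
                                ExpiresIn=600)
# LINE's Image message API only accepts URLs of up to 1,000 characters
assert len(url) <= 1000, f"URL too long for LINE: {len(url)} characters"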

Supplement: there is actually another solution to the signed URL length problem: issuing the URL with IAM user credentials instead of an IAM role. URLs issued by an IAM user need no session token and are shorter, but they require the IAM user's "access key ID" and "secret access key". For security reasons, I don't recommend issuing URLs as an IAM user.

Process flow

Now that we've solved the S3 bucket permissions issue, let's implement this part. The flow of this part is as follows:

First, the controller Lambda passes the signed URL of the new image to the LINE Bot. The LINE Bot then reads the image file from the S3 bucket (the actual reading is done by the LINE platform) and sends it to the user. That ends the whole process.

Controller Lambda function

As with the part above, we'll cover the controller Lambda function code for this part.

lambda_function_for_controller.py


        #Signed URL generation
        presigned_url = s3.generate_presigned_url(ClientMethod="get_object", Params={
                                                  "Bucket": bucket, "Key": new_key}, ExpiresIn=600)

lambda_function_for_controller.py


        #Replying to new image message
        line_bot_api.reply_message(line_event.reply_token, ImageSendMessage(
            original_content_url=presigned_url, preview_image_url=presigned_url))

4. Actual result

Let's try the app we made!

Interface

First, send and receive using the LINE interface. You can add the bot as a friend with the QR code in the LINE Bot's "Messaging API settings". Let's send a photo... ※ This work, "wearing joe's mask", is a derivative of "File:Fan Expo 2015 - Immortan Joe (21147179383).jpg" by GabboT, used under CC BY-SA 2.0. "wearing joe's mask" is licensed CC BY-SA 2.0 by y2-peng.

It worked! Now let's see which patterns work and which don't!

Successful pattern

| Description | Before | After |
| --- | --- | --- |
| 1 person, facing front | IMG_1593.jpg | IMG_1603.JPG ※1 |
| 1 person, facing front (with rotation) | IMG_1597.jpg | IMG_1604.JPG ※2 |
| Multiple people, facing front | 1.jpg | 2.jpg ※3 |
| Even a very large face | IMG_1606.jpg | IMG_1608.JPG ※4 |

※1: This work is a derivative of this photo, used under CC0 1.0. ※2: This work, "result 2", is a derivative of "File:Fan Expo 2015 - Immortan Joe (21147179383).jpg" by GabboT, used under CC BY-SA 2.0. "result 2" is licensed CC BY-SA 2.0 by y2-peng. ※3: This work, "masked 4", is a derivative of "File:Fan Expo 2015 - Immortan Joe (21147179383).jpg" by GabboT, used under CC BY-SA 2.0, "Bane" by istolethetv, used under CC BY 2.0, and this photo, used under CC0 1.0. "masked 4" is licensed CC BY-SA 2.0 by y2-peng. ※4: This work is a derivative of "Bane" by istolethetv, used under CC BY 2.0.

Patterns that don't work

| Description | Before | After |
| --- | --- | --- |
| Diagonal face | 3.jpg | 4.jpg ※1 |
| Face too small (the person at the back) | 5.jpg | 6.jpg ※2 |
| Blurred (the person behind) | 7.jpg | 8.jpg ※3 |

※1:This work, "standing 2" is a derivative of "File:Fan_Expo_2015_-Immortan_Joe(21147179383).jpg" by GabboT, used under CC BY-SA 2.0 and "Bane"byistolethetv,usedunderCCBY2.0. "standing 2" is licensed CC BY-SA 2.0 by y2-peng. ※2:This work, "standing 4" is a derivative of "File:Fan_Expo_2015_-Immortan_Joe(21147179383).jpg" by GabboT, used under CC BY-SA 2.0 and "Bane"byistolethetv,usedunderCCBY2.0. "standing 4" is licensed CC BY-SA 2.0 by y2-peng. ※3:This work is a derivative of "Bane"byistolethetv,usedunderCCBY2.0.

Analysis

Judging from the results: if the face is frontal and clear, processing generally succeeds. If the image is blurred, the face cannot be recognized and no processing happens. If the face is diagonal or too small, processing runs but the result is wrong.

5. Summary and impressions

Summary

This time, I developed a LINE application that changes the white masks in photos into monster masks. By making use of AWS services, it could be realized without a server, and the concept of "reducing the amount of work as much as possible" was carried through. If the photo is frontal and clear, the conversion generally succeeds; handling diagonal or blurred faces remains future work.

Future tasks

  1. Diagonal faces: Currently, diagonal faces are processed incorrectly. The reason is that the masks are front-facing only, with no versions for diagonal faces. As future solutions, I'm thinking of projecting the 2D mask into a 3D coordinate system before compositing, or preparing mask images for diagonal faces.
  2. Faces that are too small or blurred: Face recognition currently relies on AWS Rekognition, whose performance sets the upper limit of this app's performance. If I could build a more accurate face recognition system myself, this might be solved. (But that conflicts with "reducing the amount of work as much as possible" :()
  3. Mask selection: Currently a monster mask is chosen at random from the three, but I'd like to add more. I also want to let users choose instead of only random selection, tagging the masks so that requests like "I want to wear the XX mask" or "I want a cute mask" can all be met.

Other impressions

  1. Convenience of serverless: What I felt most this time was the appeal of serverless. With a server, not only environment construction but also maintenance is required, which takes considerable time. Serverless development skips these and saves time, which suits agile development. However, serverless processing on Lambda has performance limits, so for heavy processing consider running a server as well.
  2. The AWS one-year free tier is the best!: New AWS accounts get a one-year "free tier", and usage within certain limits is free. The Lambda, API Gateway, S3, Rekognition and CloudWatch usage for this development all came to 0 yen, which was a great deal. I'd like to try various things in the remaining months of my free period. If you are interested, please try it! It's free!

6. Full code

lambda_function_for_controller.py


import os
import sys
import logging

import boto3
import json

from linebot import LineBotApi, WebhookHandler
from linebot.models import MessageEvent, TextMessage, TextSendMessage, ImageMessage, ImageSendMessage
from linebot.exceptions import LineBotApiError, InvalidSignatureError


logger = logging.getLogger()
logger.setLevel(logging.ERROR)

#Read the line bot channel access token and secret from the environment variables
channel_secret = os.getenv('LINE_CHANNEL_SECRET', None)
channel_access_token = os.getenv('LINE_CHANNEL_ACCESS_TOKEN', None)
if channel_secret is None:
    logger.error('Specify LINE_CHANNEL_SECRET as environment variable.')
    sys.exit(1)
if channel_access_token is None:
    logger.error('Specify LINE_CHANNEL_ACCESS_TOKEN as environment variable.')
    sys.exit(1)

# Generate the API client & webhook handler
line_bot_api = LineBotApi(channel_access_token)
handler = WebhookHandler(channel_secret)

#Connect with S3 bucket
s3 = boto3.client("s3")
bucket = "<S3 bucket name>"

#Lambda main function
def lambda_handler(event, context):

    # X-Line-Signature header for authentication
    signature = event["headers"]["X-Line-Signature"]

    body = event["body"]

    #Setting the return value
    ok_json = {"isBase64Encoded": False,
               "statusCode": 200,
               "headers": {},
               "body": ""}
    error_json = {"isBase64Encoded": False,
                  "statusCode": 403,
                  "headers": {},
                  "body": "Error"}

    @handler.add(MessageEvent, message=ImageMessage)
    def message(line_event):

        #User profile
        profile = line_bot_api.get_profile(line_event.source.user_id)

        # Extract the sender's user ID (needed for push_message, not for reply)
        # user_id = profile.user_id

        #Extract message ID
        message_id = line_event.message.id

        #Extract image file
        message_content = line_bot_api.get_message_content(message_id)
        content = bytes()
        for chunk in message_content.iter_content():
            content += chunk

        #Save image file
        key = "origin_photo/" + message_id
        new_key = message_id[-3:]
        s3.put_object(Bucket=bucket, Key=key, Body=content)

        #Call face recognition lambda
        lambdaRekognitionName = "<Here is arn of face recognition lambda>"
        params = {"Bucket": bucket, "Key": key}  #Image file path information
        payload = json.dumps(params)
        response = boto3.client("lambda").invoke(
            FunctionName=lambdaRekognitionName, InvocationType="RequestResponse", Payload=payload)
        response = json.load(response["Payload"])

        #Call new image generation lambda
        lambdaNewMaskName = "<Here is arn of new image generation lambda>"
        params = {"landmarks": str(response),
                  "bucket": bucket,
                  "photo_key": key,
                  "new_photo_key": new_key}
        payload = json.dumps(params)
        boto3.client("lambda").invoke(FunctionName=lambdaNewMaskName,
                                      InvocationType="RequestResponse", Payload=payload)

        #Signed URL generation
        presigned_url = s3.generate_presigned_url(ClientMethod="get_object", Params={
                                                  "Bucket": bucket, "Key": new_key}, ExpiresIn=600)

        #Replying to new image message
        line_bot_api.reply_message(line_event.reply_token, ImageSendMessage(
            original_content_url=presigned_url, preview_image_url=presigned_url))

    try:
        handler.handle(body, signature)
    except LineBotApiError as e:
        logger.error("Got exception from LINE Messaging API: %s\n" % e.message)
        for m in e.error.details:
            logger.error("  %s: %s" % (m.property, m.message))
        return error_json
    except InvalidSignatureError:
        return error_json

    return ok_json

lambda_function_for_rekognition.py


import json
import boto3

rekognition = boto3.client("rekognition")


def lambda_handler(event, context):

    #Get the image file path from the event
    bucket = event["Bucket"]
    key = event["Key"]

    #Call Rekognition for face recognition
    response = rekognition.detect_faces(
        Image={'S3Object': {'Bucket': bucket, 'Name': key}}, Attributes=['ALL'])

    #How many people are in the photo
    number_of_people = len(response["FaceDetails"])

    #Make a list of all the required landmarks
    all_needed_landmarks = []
    #Process by the number of people
    for i in range(number_of_people):
        #This is a list of dictionaries
        all_landmarks_of_one_person = response["FaceDetails"][i]["Landmarks"]
        # Use only eyeLeft, eyeRight, upperJawlineLeft, upperJawlineRight
        # and chinBottom this time; extract them into needed_landmarks
        needed_landmarks = []
        for landmark_type in ["eyeLeft", "eyeRight", "upperJawlineLeft", "upperJawlineRight", "chinBottom"]:
            landmark = next(
                item for item in all_landmarks_of_one_person if item["Type"] == landmark_type)
            needed_landmarks.append(landmark)
        all_needed_landmarks.append(needed_landmarks)

    return all_needed_landmarks

lambda_function_for_new_image_generation.py


import ast
import boto3

import numpy as np

from PIL import Image
from operator import sub
from io import BytesIO
from random import choice

s3 = boto3.client("s3")


class NewPhotoMaker:
    def __init__(self, all_landmarks, bucket, photo_key, new_photo_key):
        self.all_landmarks = ast.literal_eval(all_landmarks)  # landmarks arrive as a string; literal_eval is a safer eval
        self.bucket = bucket
        self.photo_key = photo_key
        self.new_photo_key = new_photo_key

    #Load photographic image
    def load_photo_image(self):
        s3.download_file(self.bucket, self.photo_key, "/tmp/photo_file")
        self.photo_image = Image.open("/tmp/photo_file")

    #Load mask image
    def load_mask_image(self):
        # Randomly choose from bane (Batman), joker (Batman) and immortan joe (Mad Max)
        mask_key = "masks/" + choice(["bane", "joker", "joe"]) + ".png"
        s3.download_file(self.bucket, mask_key, "/tmp/mask_file")
        self.mask_image = Image.open("/tmp/mask_file")

    #Change from a landmark (ratio) to a specific point
    def landmarks_to_points(self):
        upperJawlineLeft_landmark = next(
            item for item in self.landmarks if item["Type"] == "upperJawlineLeft")
        upperJawlineRight_landmark = next(
            item for item in self.landmarks if item["Type"] == "upperJawlineRight")
        eyeLeft_landmark = next(
            item for item in self.landmarks if item["Type"] == "eyeLeft")
        eyeRight_landmark = next(
            item for item in self.landmarks if item["Type"] == "eyeRight")

        self.upperJawlineLeft_point = [int(self.photo_image.size[0] * upperJawlineLeft_landmark["X"]), 
                                       int(self.photo_image.size[1] * upperJawlineLeft_landmark["Y"])]
        self.upperJawlineRight_point = [int(self.photo_image.size[0] * upperJawlineRight_landmark["X"]), 
                                        int(self.photo_image.size[1] * upperJawlineRight_landmark["Y"])]
        self.eyeLeft_point = [int(self.photo_image.size[0] * eyeLeft_landmark["X"]),
                              int(self.photo_image.size[1] * eyeLeft_landmark["Y"])]
        self.eyeRight_point = [int(self.photo_image.size[0] * eyeRight_landmark["X"]),
                               int(self.photo_image.size[1] * eyeRight_landmark["Y"])]

    #Resize the mask image to fit the face width
    def resize_mask(self):
        face_width = int(np.linalg.norm(list(map(sub, self.upperJawlineLeft_point, self.upperJawlineRight_point))))
        new_height = int(self.mask_image.size[1]*face_width/self.mask_image.size[0])
        self.mask_image = self.mask_image.resize((face_width, new_height))

    # Rotate the mask image to match the in-plane tilt (roll) of the face (faces turned sideways are not handled)
    def rotate_mask(self):
        angle = np.arctan2(self.upperJawlineRight_point[1] - self.upperJawlineLeft_point[1],
                           self.upperJawlineRight_point[0] - self.upperJawlineLeft_point[0])
        angle = -np.degrees(angle)  # radians to degrees
        self.mask_image = self.mask_image.rotate(angle, expand=True)

    #Combine photographic image and mask image
    def match_mask_position(self):
        #Matching using eye position
        face_center = [int((self.eyeLeft_point[0] + self.eyeRight_point[0])/2),
                       int((self.eyeLeft_point[1] + self.eyeRight_point[1])/2)]
        mask_center = [int(self.mask_image.size[0]/2),
                       int(self.mask_image.size[1]/2)]
        x = face_center[0] - mask_center[0]
        y = face_center[1] - mask_center[1]
        self.photo_image.paste(self.mask_image, (x, y), self.mask_image)

    #Save new image file to S3
    def save_new_photo(self):
        new_photo_byte_arr = BytesIO()
        self.photo_image.save(new_photo_byte_arr, format="JPEG")
        new_photo_byte_arr = new_photo_byte_arr.getvalue()
        s3.put_object(Bucket=self.bucket, Key=self.new_photo_key,
                      Body=new_photo_byte_arr)

    #Run
    def run(self):

        self.load_photo_image()

        #Processing for the number of people
        for i in range(len(self.all_landmarks)):
            self.load_mask_image()  #Load one new mask each time
            self.landmarks = self.all_landmarks[i]
            self.landmarks_to_points()
            self.resize_mask()
            self.rotate_mask()
            self.match_mask_position()

        self.save_new_photo()

#lambda main function


def lambda_handler(event, context):
    landmarks = event["landmarks"]
    bucket = event["bucket"]
    photo_key = event["photo_key"]
    new_photo_key = event["new_photo_key"]

    photo_maker = NewPhotoMaker(landmarks, bucket, photo_key, new_photo_key)
    photo_maker.run()
