What I did when I hit the time limit with Python on AWS Lambda

Introduction

Lambda used to have a 5-minute time limit, and our batch processing exceeded it and terminated abnormally. We were also asked whether a Lambda invoked from API Gateway "couldn't be made faster". This article describes what I did in response.

As an engineer I ought to argue with numbers, but I will omit them for several reasons: this was work at a previous job, I have no environment at hand, and building one just for this article would be a hassle.

A little detour

When response time is a problem, the first thing I do is take a profile. Just in case, here is how I usually do it.

Libraries used

import cProfile
import pstats

cProfile alone is enough to get a profile, but I also use pstats to sort the results.

Start measurement

Create a cProfile.Profile instance and enable measurement.

pr = cProfile.Profile()
pr.enable()

End measurement and check the results

Disable the profiler, create a Stats object for sorting, then sort and print the results.

pr.disable()
stats = pstats.Stats(pr)
stats.sort_stats('tottime')
stats.print_stats()

See the pstats documentation for the keys you can sort by.

Sometimes the profile shows no obvious bottleneck. In those cases, improving the parts that perform I/O often led to good results.
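As a minimal sketch, the steps above can be bundled into a decorator so that a function is profiled on every call. The names `profiled` and `handler` are hypothetical and do not come from the original code; the handler body is only a stand-in for real work.

```python
import cProfile
import functools
import io
import pstats


def profiled(sort_key="tottime", limit=10):
    """Decorator that profiles each call and prints the top `limit` entries."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            pr = cProfile.Profile()
            pr.enable()
            try:
                return func(*args, **kwargs)
            finally:
                # Always stop measuring, even if the function raises.
                pr.disable()
                buf = io.StringIO()
                stats = pstats.Stats(pr, stream=buf)
                stats.sort_stats(sort_key).print_stats(limit)
                print(buf.getvalue())
        return wrapper
    return decorator


@profiled(limit=5)
def handler(event, context):
    # Hypothetical stand-in for a Lambda handler.
    return sum(i * i for i in range(event["n"]))
```

Calling `handler({"n": 100000}, None)` would run the function and print the sorted profile, which in Lambda ends up in CloudWatch Logs. Removing the decorator once the investigation is over avoids the profiling overhead in production.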

Main subject

DictCursor of psycopg2

Being able to refer to query results as a dict instead of a list is very convenient, but it turned out to be a big bottleneck because mapping each row to a dict is expensive. The fact that the target table had a large number of columns probably made it worse. I gave up on DictCursor and instead used constants to abstract the list indexes.
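The following is a minimal sketch of the index-constant idea, assuming a hypothetical `users` table and column order. The rows are hard-coded tuples standing in for what a default (non-dict) cursor returns from `fetchall()`, so no database is needed to run it.

```python
# Column positions, in the same order as the SELECT list below.
# Table and column names are hypothetical.
COL_ID = 0
COL_NAME = 1
COL_EMAIL = 2

# The SELECT list and the constants above must be kept in sync.
SELECT_USERS = "SELECT id, name, email FROM users"


def emails_by_id(rows):
    """Build {id: email} from plain tuple rows."""
    return {row[COL_ID]: row[COL_EMAIL] for row in rows}


# Stand-in for cursor.fetchall() with the default tuple cursor.
rows = [
    (1, "alice", "alice@example.com"),
    (2, "bob", "bob@example.com"),
]
print(emails_by_id(rows))
```

The trade-off is that the constants silently break if the column order of the query changes, so it helps to define them right next to the SQL they describe.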

Reference Url: Get the result in dict format with Python psycopg2

logger

In Lambda, anything you print shows up in CloudWatch Logs, but the Lambda in question used logger. Logger configuration seems to be a [deep, dark area](https://qiita.com/amedama/items/b856b2f30c2f38665701#%E4%BD%95%E6%95%85%E3%83%80%E3%83%A1%E3%81%AA%E3%81%AE%E3%81%8B-logging), and the only purpose here is to be able to see the logs.

Simply replacing logger with print improved processing speed. I still wanted to control the output level as a logger does, so I wrote a simple function that wraps the log output and keeps the logger features I needed.

In addition, there were cases where processing speed improved simply by reducing the number of prints. Logs are necessary in production, but I suspect that logs needed only during development are sometimes left in. Output only the logs you really need.
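A wrapper of this kind could look like the minimal sketch below. It is not the original function: the level names, the `LOG_LEVEL` environment variable, and the function names are all assumptions made for illustration.

```python
import os

# Numeric levels, loosely mirroring the standard logging module.
LEVELS = {"DEBUG": 10, "INFO": 20, "WARNING": 30, "ERROR": 40}

# LOG_LEVEL is an assumed environment variable name; unknown values fall back to INFO.
_threshold = LEVELS.get(os.environ.get("LOG_LEVEL", "INFO").upper(), LEVELS["INFO"])


def set_level(name):
    """Change the minimum level that will be printed."""
    global _threshold
    _threshold = LEVELS[name]


def log(level, message):
    """Print the message only if its level meets the threshold."""
    if LEVELS[level] >= _threshold:
        print(f"[{level}] {message}")
```

With the threshold at INFO, `log("DEBUG", "row fetched")` prints nothing while `log("ERROR", "update failed")` prints one line. Messages below the threshold cost only a dict lookup and a comparison.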

Cache 1

boto3 stood out when I took a profile. That said, not using the SDK is not an option.

So I cached each boto3 object after it was first created and reused it from the second call onward. This was a fairly large Lambda, consisting of multiple files and more than 10 classes, so I changed it to hold the created boto3 objects in class properties.

# Create the Table object only once; later calls reuse it.
if self.__XXXX_table_obj is None:
    self.__XXXX_table_obj = boto3.resource("dynamodb").Table(self.get_XXXX_table())

The results of master data lookups are cached in a property as well.

# Return the cached master record if this ID has already been looked up.
if XXXX_id in self.__XXXX_cache:
    return self.__XXXX_cache[XXXX_id]

Since this application is not accessed frequently, fresh master data also gets picked up whenever the Lambda container is recreated.
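Putting the two fragments together, a minimal sketch of both caches might look like this. The class name, the table name, the `id` key attribute, and the `resource_factory` parameter are all hypothetical; the factory exists only so that the class can be exercised with a fake object instead of real AWS access.

```python
class MasterRepository:
    """Caches the DynamoDB Table object and master lookups per container."""

    def __init__(self, table_name, resource_factory=None):
        self._table_name = table_name
        # Optional hook for injecting a fake resource (assumption, for testing).
        self._resource_factory = resource_factory
        self._table_obj = None
        self._cache = {}

    def _table(self):
        # Create the boto3 Table object once and reuse it afterwards.
        if self._table_obj is None:
            if self._resource_factory is None:
                import boto3  # imported lazily, only when real access is needed
                resource = boto3.resource("dynamodb")
            else:
                resource = self._resource_factory()
            self._table_obj = resource.Table(self._table_name)
        return self._table_obj

    def get_master(self, master_id):
        # Return the cached record if this ID was already looked up.
        if master_id in self._cache:
            return self._cache[master_id]
        # "id" is an assumed key attribute name.
        response = self._table().get_item(Key={"id": master_id})
        item = response.get("Item")
        self._cache[master_id] = item
        return item


# Module-level instance: it lives as long as the container stays warm
# and is rebuilt, with an empty cache, when the container is recreated.
repository = MasterRepository("master-table")
```

Because the instance is created at module level, it survives across invocations in a warm container, which is what makes the cache effective. The same fact means stale master data stays cached until the container is recycled.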

Cache 2

I think there are times when you can't reach your target time no matter what you do. The system in question was one in which data was updated in batches at regular intervals. So, as long as the data is not updated, you can return the same content.

So I registered the processing result (JSON) in DynamoDB and returned its contents. The key is the API name plus the parameters. The API design was simple, so this caused no problems.

When the data is updated, a notification arrives via SNS, and the contents of DynamoDB are re-registered at that point.

One thing I had not accounted for was the maximum item size (400 KB) that DynamoDB can store. Some of the JSON exceeded it. Because the JSON contained many repetitive items, compressing it and registering it as binary solved the problem.
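One way to build such a key is sketched below, assuming the parameters are a flat, JSON-serializable dict. The `#` separator, the function names, and the plain dict standing in for the DynamoDB table are assumptions for illustration only.

```python
import json


def cache_key(api_name, params):
    """Build a deterministic key from the API name and its parameters."""
    # sort_keys makes the key independent of parameter order.
    encoded = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return f"{api_name}#{encoded}"


def cached_response(store, api_name, params, compute):
    """Return the stored JSON if present; otherwise compute and store it."""
    key = cache_key(api_name, params)
    if key not in store:
        store[key] = json.dumps(compute(params))
    return store[key]


# A plain dict stands in for the DynamoDB table in this sketch.
store = {}
body = cached_response(store, "getItems", {"page": 1, "size": 20},
                       lambda p: {"items": [], "page": p["page"]})
print(body)
```

Sorting the parameters matters because the same request can arrive with its parameters in a different order, and both forms should hit the same cache entry.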
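The compression step can be sketched as follows. The article does not say which algorithm was used, so `zlib` here is an assumption, and the payload is a made-up repetitive structure rather than the real data.

```python
import json
import zlib

# DynamoDB's maximum item size. Attribute names count toward it too,
# so real code should leave some headroom.
MAX_ITEM_BYTES = 400 * 1024


def pack(payload):
    """Serialize to JSON and compress; the result is raw bytes."""
    return zlib.compress(json.dumps(payload).encode("utf-8"))


def unpack(blob):
    """Reverse of pack()."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


# Hypothetical payload with many repeated keys and values.
payload = [
    {"status": "active", "category": "default", "value": i}
    for i in range(20000)
]
raw = json.dumps(payload).encode("utf-8")
blob = pack(payload)

# Compare the sizes before and after compression against the limit.
print(len(raw), len(blob), len(blob) < MAX_ITEM_BYTES)
```

The compressed bytes would then be stored in a binary attribute and decompressed on read. How well this works depends entirely on how repetitive the JSON is, so checking the packed size before writing is a reasonable safeguard.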

In closing

These are all small tips, but I hope this article helps those who are struggling with the processing speed of Python on Lambda.
