What I did when I hit the time limit with Python on Lambda

Introduction

Lambda used to have a time limit of 5 minutes, and our batch processing exceeded it and terminated abnormally. A Lambda invoked from API Gateway was also being described as one that "cannot be made faster". This article introduces how I dealt with those problems at the time.

As an engineer I ought to back this up with numbers, but I will omit them for several reasons: the work was done at a previous job, I have no environment at hand, and building an environment just for this article would be a hassle.

A little detour

When there is a problem with response time, I start by taking a profile. Just in case, here is how I always do it.

Libraries used

import cProfile
import pstats

You can get a profile with cProfile alone, but I also use pstats to sort the results.

Start measurement

Instantiate cProfile and enable measurement

pr = cProfile.Profile()
pr.enable()

End measurement and check the results

Disable the profiler, create a pstats.Stats object from it, then sort and print the results.

pr.disable()
stats = pstats.Stats(pr)
stats.sort_stats('tottime')
stats.print_stats()

See the Python manual for the keys that can be used for sorting.
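Putting the steps above together, a minimal sketch might look like the following. The `profiled` context manager and the `busy_work` function are hypothetical names used only for illustration. Everything else comes from the standard library.

```python
import cProfile
import io
import pstats
from contextlib import contextmanager


@contextmanager
def profiled(sort_key="tottime", limit=20, stream=None):
    """Profile the enclosed block and print the top entries (hypothetical helper)."""
    pr = cProfile.Profile()
    pr.enable()
    try:
        yield pr
    finally:
        # Stop measuring, then sort and print as in the steps above.
        pr.disable()
        stats = pstats.Stats(pr, stream=stream)  # stream=None falls back to stdout
        stats.sort_stats(sort_key)
        stats.print_stats(limit)


def busy_work(n):
    """Stand-in for the code you actually want to measure."""
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    with profiled():
        busy_work(100_000)
```

Wrapping the measurement in a context manager keeps the enable/disable pair together, so the body of a handler could be profiled by indenting it under a single `with` line.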

Sometimes the profile does not reveal a noticeable bottleneck. In those cases, improving the places where I/O was likely occurring often led to good results.

Main subject

DictCursor of psycopg2

Being able to reference query results as a dict instead of a list is very convenient, but the cost of mapping each row to a dict was high and became a big bottleneck. The large number of columns in the target table probably contributed as well. I gave up on DictCursor and instead worked around it by abstracting the list subscripts behind constants.
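A minimal sketch of the constant-based subscript approach follows. The table, the column names, and the sample rows are hypothetical. With a default psycopg2 cursor each fetched row is a tuple, which the list of tuples below stands in for.

```python
from enum import IntEnum


class UserCol(IntEnum):
    """Column positions; must match the column order of the SELECT (hypothetical columns)."""
    ID = 0
    NAME = 1
    EMAIL = 2


# Build the column list from the enum so the query and the indexes cannot drift apart.
SELECT_USERS = "SELECT {} FROM users".format(", ".join(c.name.lower() for c in UserCol))


def to_names(rows):
    """Read a column by named constant instead of a bare integer index."""
    return [row[UserCol.NAME] for row in rows]


# Stand-in for cursor.fetchall() on a default (tuple-returning) cursor.
rows = [(1, "alice", "alice@example.com"), (2, "bob", "bob@example.com")]
print(to_names(rows))
```

Plain module-level integer constants would work just as well. An `IntEnum` is used here only because it lets the SELECT column list be generated from the same definition.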

Reference Url: Get the result in dict format with Python psycopg2

logger

If you call print in a Lambda, the output appears in CloudWatch Logs, but the problematic Lambda used logger. The [darkness](https://qiita.com/amedama/items/b856b2f30c2f38665701#%E4%BD%95%E6%95%85%E3%83%80%E3%83%A1%E3%81%AA%E3%81%AE%E3%81%8B-logging) of logger configuration seems to run deep, so here I assume the only goal is to be able to see the logs.

Simply changing the logger calls to print improved the processing speed. I still wanted to control the output level as with logger, so I wrote a simple function that wraps log output so that the features I needed from logger were not lost.

In addition, there were cases where the processing speed improved simply by reducing the number of prints. Logs are necessary in production, but logs that are only needed during development are sometimes left in. I want to output only the logs that are really needed.
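A print-based wrapper with level control could be sketched as follows. This is not the original function, which is not shown in this article. The names `log`, `set_level`, and the `LOG_LEVEL` environment variable are assumptions made for illustration.

```python
import os

# Numeric levels, lowest to highest (the same ordering idea as the logging module).
LEVELS = {"DEBUG": 10, "INFO": 20, "WARNING": 30, "ERROR": 40}

# LOG_LEVEL is a hypothetical environment variable; unknown or unset values mean INFO.
_threshold = LEVELS.get(os.environ.get("LOG_LEVEL", "INFO").upper(), LEVELS["INFO"])


def set_level(name):
    """Change the minimum level that will be printed."""
    global _threshold
    _threshold = LEVELS[name]


def log(level, message, *args):
    """Print the message only if its level is at or above the threshold.

    Returns True when a line was written, which makes the function easy to test.
    """
    if LEVELS[level] < _threshold:
        return False
    # Format only after the level check so suppressed messages cost almost nothing.
    text = message % args if args else message
    print(f"[{level}] {text}")
    return True
```

With the threshold at INFO, `log("DEBUG", "row=%s", row)` returns immediately without formatting or printing, which is how development-only logs can stay in the code without costing time in production.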

Cache 1

boto3 stood out when I took a profile. That said, not using the SDK is not an option.

So I cached the boto3 object once it had been created and used the cached one from the second call onward. This was a fairly large Lambda, consisting of multiple files and 10 or more classes, so I changed it to hold the created boto3 object in a class property.

if self.__XXXX_table_obj is None:
    self.__XXXX_table_obj = boto3.resource("dynamodb").Table(self.get_XXXX_table())

The results of master data lookups are also cached in a property.

if XXXX_id in self.__XXXX_cache:
    return self.__XXXX_cache[XXXX_id]

Since this application is not accessed frequently, it is enough for the master data to be fetched anew whenever the Lambda container is recreated.
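The two caches can be sketched together in one class, as below. The class name `MasterRepository`, the injectable `table_factory`, and the key attribute name `"id"` are assumptions made for illustration. Only `boto3.resource("dynamodb").Table(...)` and `Table.get_item` are taken from the SDK.

```python
class MasterRepository:
    """Caches the DynamoDB table object and master lookups per instance (hypothetical class)."""

    def __init__(self, table_name, table_factory=None):
        self._table_name = table_name
        # Injectable so the class can be exercised without AWS access.
        self._table_factory = table_factory or self._default_table_factory
        self._table_obj = None
        self._master_cache = {}

    @staticmethod
    def _default_table_factory(table_name):
        import boto3  # imported lazily; only needed when no factory is injected
        return boto3.resource("dynamodb").Table(table_name)

    @property
    def table(self):
        # Create the table object on first use and reuse it afterwards.
        if self._table_obj is None:
            self._table_obj = self._table_factory(self._table_name)
        return self._table_obj

    def get_master(self, master_id):
        # Return the cached record when this id has been looked up before.
        if master_id in self._master_cache:
            return self._master_cache[master_id]
        # "id" is an assumed key attribute name.
        item = self.table.get_item(Key={"id": master_id}).get("Item")
        self._master_cache[master_id] = item
        return item
```

For the caches to survive between invocations in a warm container, the instance would need to be created at module level rather than inside the handler function.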

Cache 2

I think there are times when you cannot reach your target time no matter what you do. The system in question had its data updated by a batch at regular intervals, so as long as the data had not been updated, the same content could be returned.

Therefore, I registered the processing result (JSON) in DynamoDB and returned its contents. The key is the API name plus the parameters. The API design was simple, so this caused no problems.

When the data is updated, a notification arrives via SNS, and the contents of DynamoDB are re-registered at that point.
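The flow can be sketched roughly as follows, under several assumptions. A plain dict stands in for the DynamoDB table, the function names are hypothetical, and the `#` separator in the key is an arbitrary choice. The sketch also simplifies the update step: instead of re-registering each result when the notification arrives, it clears the store and lets the next request repopulate it.

```python
import json


def make_cache_key(api_name, params):
    """Build a deterministic key from the API name and its parameters."""
    # sort_keys makes the key independent of the order of the parameters.
    return api_name + "#" + json.dumps(params, sort_keys=True, separators=(",", ":"))


def cached_response(store, api_name, params, compute):
    """Return the stored JSON if present; otherwise compute, store, and return it."""
    key = make_cache_key(api_name, params)
    body = store.get(key)
    if body is None:
        body = json.dumps(compute(params))
        store[key] = body
    return body


def invalidate(store):
    """Called when the update notification arrives; drops every cached response."""
    store.clear()
```

Sorting the parameters matters because two requests that differ only in parameter order should hit the same cached entry.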

One miscalculation was the maximum item size (400 KB) that DynamoDB can store: some of the JSON exceeded it. Because the JSON contained many repeated items, I got around the problem by compressing it and registering it as binary.
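A minimal sketch of the compression step, assuming zlib as the compressor (the article does not say which one was used). The `pack` and `unpack` names and the sample payload are hypothetical.

```python
import json
import zlib

# DynamoDB's maximum item size is 400 KB, counting attribute names and values together.
MAX_ITEM_BYTES = 400 * 1024


def pack(obj):
    """Serialize to JSON and compress; the resulting bytes can be stored as a binary attribute."""
    return zlib.compress(json.dumps(obj, separators=(",", ":")).encode("utf-8"))


def unpack(blob):
    """Inverse of pack()."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


# Hypothetical payload with many repeated keys and values.
payload = [{"status": "active", "category": "standard", "index": i} for i in range(20000)]
raw = json.dumps(payload, separators=(",", ":")).encode("utf-8")
packed = pack(payload)
print(len(raw), len(packed), len(packed) < MAX_ITEM_BYTES)
```

Because the limit covers the whole item, the key and any other attributes need some headroom too, so it is safer to check the packed size against a value somewhat below 400 KB before writing.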

In closing

These are all small tips, but I hope this article helps anyone who is struggling with the processing speed of Python on Lambda.