[LINUX] Move CloudWatch logs to S3 on a regular basis with Lambda

Introduction

CloudWatch is a convenient way to collect AWS system logs. At work, I install the CloudWatch Agent on my EC2 instances and send access and error logs from various applications (Nginx, Tomcat, etc.) to CloudWatch Logs.

However, storing logs in CloudWatch Logs is quite expensive, so I want to move old logs to S3 on a regular basis. In this article, I'll summarize how to move CloudWatch logs to S3 on a regular basis using Lambda.


Various settings

Lambda settings

First, give the Lambda function the permissions it needs to run the export code. Attach an IAM role with the following policy to the function.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "cloudwatch:*",
            "Resource": "*"
        },
        {
            "Action": [
                "s3:Get*",
                "s3:Put*",
                "s3:List*",
            ],
            "Resource": [
                "*"
            ],
            "Effect": "Allow"
        },
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents",
                "logs:CreateExportTask",
                "logs:DescribeLogGroups",
                "logs:Get*"
            ],
            "Resource": "*"
        }
    ]
}

This grants read and write permissions for S3, read permissions for CloudWatch Logs, and permission to create export tasks.

Code to deploy to Lambda (Python)

Written in Python, the code looks like the following. Besides the standard library, it uses boto3. The Lambda Python runtime includes boto3 by default, but if you depend on a specific version of it, upload it in the deployment zip together with the code below.

import boto3
import datetime
import time

# Prefix used when searching for log groups
PREFIX = 'test-'
# S3 bucket to store the logs
S3_BUCKET = 'test_bucket'
# Directory (key prefix) in S3 to store the logs
S3_DIR = 'logs'

def main(event, context):
    '''
    Entry point called by Lambda
    '''
    # boto3 client for CloudWatch Logs
    client = boto3.client('logs')
    # Get the list of log groups
    log_groups = get_log_group_list(client)
    # Store the log contents in S3
    create_export_task(client, log_groups)

def get_log_group_list(client):
    '''
    Get the list of log group information
    '''
    should_continue = True
    next_token = None
    log_groups = []
    # Everything may not be retrievable at once, so fetch repeatedly.
    while should_continue:
        if next_token is None:
            # First request
            response = client.describe_log_groups(
                logGroupNamePrefix=PREFIX,
                limit=50
            )
        else:
            # Second and subsequent requests
            response = client.describe_log_groups(
                logGroupNamePrefix=PREFIX,
                limit=50,
                nextToken=next_token
            )
        # Add the results to the list
        for log in response['logGroups']:
            log_groups.append(log)
        # Decide whether another request is needed
        if 'nextToken' in response:
            next_token = response['nextToken']
        else:
            should_continue = False
    return log_groups

def create_export_task(client, log_groups):
    '''
    Move the log contents to S3
    '''
    # Get the current time as UNIX time in milliseconds
    # (create_export_task expects millisecond timestamps)
    time_now = datetime.datetime.now()
    unix_time_now = int(time_now.timestamp() * 1000)
    # Repeat for each log group
    for log in log_groups:
        for x in range(20):
            try:
                client.create_export_task(
                    fromTime=0,
                    to=unix_time_now,
                    logGroupName=log['logGroupName'],
                    destination=S3_BUCKET,
                    destinationPrefix=S3_DIR
                )
                # Success: move on to the next log group
                break
            except client.exceptions.LimitExceededException:
                # Another export task is still running,
                # so wait a moment and try again
                time.sleep(20)

In the PREFIX variable defined at the top, specify the prefix of the log groups whose logs you want to migrate. Here it is set to move the logs of log groups starting with "test-" to S3.

The main() function is called at run time. This function calls the following two functions in sequence.

- The **get_log_group_list()** function gets the information of the log groups whose contents you want to transfer to S3 and stores it in a list. Note that, due to the boto3 API, you can only fetch up to 50 log groups per request. Therefore, if all the log group information cannot be obtained at once, the request is sent again using the returned nextToken (a paginator-based alternative is sketched after this list).
- The **create_export_task()** function actually sends the requests that create the log export tasks. Note that you cannot run more than one export task at the same time. If you send requests back to back, boto3 raises an exception, so the code catches it and retries the request after a few tens of seconds.
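By the way, boto3 also provides a paginator that handles the nextToken loop for you. A minimal sketch of get_log_group_list() using it (assuming the same PREFIX as above) looks like this:

import boto3

PREFIX = 'test-'

def get_log_group_list(client):
    '''
    Get the list of log group information via boto3's paginator
    '''
    log_groups = []
    paginator = client.get_paginator('describe_log_groups')
    # The paginator follows nextToken internally
    for page in paginator.paginate(logGroupNamePrefix=PREFIX):
        log_groups.extend(page['logGroups'])
    return log_groups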

S3 settings

Finally, configure the S3 bucket to which you want to export your logs. Set the following JSON as the bucket policy (replace ap-northeast-1 with the region of your log groups and test_bucket with your bucket name).

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "logs.ap-northeast-1.amazonaws.com"
            },
            "Action": "s3:PutObject",
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "logs.ap-northeast-1.amazonaws.com"
            },
            "Action": "s3:GetBucketAcl",
            "Resource": "*"
        }
    ]
}

If you forget to set this policy, executing the Lambda function fails with the error **"An error occurred (InvalidParameterException) when calling the CreateExportTask operation: GetBucketAcl call on the given bucket failed. Please check if CloudWatch Logs has been granted permission to perform this operation."** I was confused because the message is misleading: it reads as if the problem were in the CloudWatch Logs settings, when it is actually the bucket policy.
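If you prefer to set the policy from code instead of the console, a minimal sketch using boto3 (assuming the bucket name test_bucket from the example above) could look like this:

import json
import boto3

BUCKET = 'test_bucket'  # assumed bucket name from the example above
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "logs.ap-northeast-1.amazonaws.com"},
            "Action": "s3:PutObject",
            "Resource": f"arn:aws:s3:::{BUCKET}/*"
        },
        {
            "Effect": "Allow",
            "Principal": {"Service": "logs.ap-northeast-1.amazonaws.com"},
            "Action": "s3:GetBucketAcl",
            "Resource": f"arn:aws:s3:::{BUCKET}"
        }
    ]
}

s3 = boto3.client('s3')
# Attach the policy to the bucket
s3.put_bucket_policy(Bucket=BUCKET, Policy=json.dumps(policy))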

Finally

If you use a CloudWatch Events (EventBridge) scheduled rule with a cron expression to run the created Lambda function regularly, you can move the CloudWatch logs to S3 every day, for example.
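For reference, here is a minimal sketch of such a schedule set up with boto3. The rule name, function name, and ARN below are placeholders; replace them with your own.

import boto3

# Placeholder names/ARNs -- replace with your own
RULE_NAME = 'daily-log-export'
FUNCTION_NAME = 'export-logs'
FUNCTION_ARN = 'arn:aws:lambda:ap-northeast-1:123456789012:function:export-logs'

events = boto3.client('events')
# Run every day at 01:00 UTC
rule_arn = events.put_rule(
    Name=RULE_NAME,
    ScheduleExpression='cron(0 1 * * ? *)',
    State='ENABLED'
)['RuleArn']
# Point the rule at the Lambda function
events.put_targets(
    Rule=RULE_NAME,
    Targets=[{'Id': '1', 'Arn': FUNCTION_ARN}]
)
# The function must also allow invocation from EventBridge
boto3.client('lambda').add_permission(
    FunctionName=FUNCTION_NAME,
    StatementId='allow-eventbridge',
    Action='lambda:InvokeFunction',
    Principal='events.amazonaws.com',
    SourceArn=rule_arn
)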

However, there is one caveat. Export tasks take time, and only one can run at a time, so if you try to export 10 or more log groups at once it can take more than 15 minutes, which is Lambda's maximum execution time. It's inconvenient.
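As a partial mitigation, instead of sleeping blindly you can poll the status of each task with describe_export_tasks and start the next one as soon as the previous one finishes. A sketch (assuming task_id is the taskId returned by create_export_task) might look like this:

import time

def wait_for_export_task(client, task_id, poll_seconds=10):
    '''
    Poll an export task until it leaves the PENDING/RUNNING states
    '''
    while True:
        response = client.describe_export_tasks(taskId=task_id)
        status = response['exportTasks'][0]['status']['code']
        if status not in ('PENDING', 'RUNNING'):
            # e.g. COMPLETED, FAILED or CANCELLED
            return status
        time.sleep(poll_seconds)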

If you want to move a large number of logs to S3 at once, it may be better to run the job as a cron job on an EC2 instance instead of using Lambda.
