CloudWatch is a convenient way to collect AWS system logs. At work, I install the CloudWatch Agent on my EC2 instances and send the access logs and error logs of various applications (Nginx, Tomcat, etc.) to CloudWatch Logs.
However, storing logs in CloudWatch Logs is quite expensive, so I would like to move old logs to S3 on a regular basis. In this article, I'll summarize how to move CloudWatch logs to S3 periodically using Lambda.
First, give Lambda the permissions it needs to run the export code. Attach an IAM role with the following policy to the Lambda function.
```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "cloudwatch:*",
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:Get*",
                "s3:Put*",
                "s3:List*"
            ],
            "Resource": [
                "*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents",
                "logs:CreateExportTask",
                "logs:DescribeLogGroups",
                "logs:Get*"
            ],
            "Resource": "*"
        }
    ]
}
```
This grants S3 read and write permissions, CloudWatch Logs read permissions, and permission to create export tasks.
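If you prefer to create this role from code rather than in the console, a minimal sketch with boto3 might look like the following. The role name, policy name, and `export_policy.json` file are my own placeholders; the permission policy itself is the JSON document shown above.

```python
import json

import boto3

iam = boto3.client('iam')

# Trust policy that lets the Lambda service assume this role
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "lambda.amazonaws.com"},
        "Action": "sts:AssumeRole"
    }]
}

# The permission policy is the JSON shown above, loaded as a dict
with open('export_policy.json') as f:  # hypothetical file holding that JSON
    export_policy = json.load(f)

# Create the role and attach the permissions as an inline policy
iam.create_role(
    RoleName='cloudwatch-logs-export-role',            # placeholder name
    AssumeRolePolicyDocument=json.dumps(trust_policy)
)
iam.put_role_policy(
    RoleName='cloudwatch-logs-export-role',
    PolicyName='cloudwatch-logs-export-policy',         # placeholder name
    PolicyDocument=json.dumps(export_policy)
)
```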
The Lambda function itself, written in Python, looks like this. Besides the standard library it uses boto3, so when deploying, bundle boto3 into a zip file together with the following code and upload it.
```python
import boto3
import datetime
import time

# Prefix used when searching for log groups
PREFIX = 'test-'
# S3 bucket to store the logs
S3_BUCKET = 'test_bucket'
# S3 prefix (directory) to store the logs
S3_DIR = 'logs'


def main(event, context):
    '''
    Handler function called by Lambda
    '''
    # boto3 client for CloudWatch Logs
    client = boto3.client('logs')
    # Get the list of log groups
    log_groups = get_log_group_list(client)
    # Export the log contents to S3
    create_export_task(client, log_groups)


def get_log_group_list(client):
    '''
    Get the list of log group information
    '''
    should_continue = True
    next_token = None
    log_groups = []
    # Everything cannot be retrieved in one call, so request repeatedly
    while should_continue:
        if next_token is None:
            # First request
            response = client.describe_log_groups(
                logGroupNamePrefix=PREFIX,
                limit=50
            )
        else:
            # Second and subsequent requests
            response = client.describe_log_groups(
                logGroupNamePrefix=PREFIX,
                limit=50,
                nextToken=next_token
            )
        # Add the results to the list
        for log in response['logGroups']:
            log_groups.append(log)
        # Decide whether another request is needed
        if 'nextToken' in response.keys():
            next_token = response['nextToken']
        else:
            should_continue = False
    return log_groups


def create_export_task(client, log_groups):
    '''
    Export the log contents to S3
    '''
    # Get the current time as UNIX time in milliseconds
    # (create_export_task expects fromTime/to in milliseconds)
    time_now = datetime.datetime.now()
    unix_time_now = int(time_now.timestamp() * 1000)
    # Repeat for each log group
    for log in log_groups:
        for x in range(20):
            try:
                client.create_export_task(
                    fromTime=0,
                    to=unix_time_now,
                    logGroupName=log['logGroupName'],
                    destination=S3_BUCKET,
                    destinationPrefix=S3_DIR
                )
            except Exception:
                # If another export task is still running, wait a moment and retry
                time.sleep(20)
                continue
            # Task created successfully, move on to the next log group
            break
```
In the PREFIX variable defined at the top, specify the prefix of the log groups whose logs you want to migrate. Here, it is set to move the logs of log groups whose names start with "test-" to S3.
The main() function is called at run time. It calls the following two functions in sequence.

- **get_log_group_list()** gets the information of the log groups whose contents should be transferred to S3 and stores it in a list. Note that, due to boto3's specification, up to 50 log groups can be returned per request, so if all the log group information cannot be obtained at once, the request is sent again with the returned nextToken.
- **create_export_task()** actually sends the requests that create the log export tasks. Note that more than one export task cannot run at the same time, so if requests are sent back to back, boto3 raises an exception; the code catches it and retries the request after a few tens of seconds (a polling-based alternative is sketched after this list).
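Instead of catching the exception and sleeping blindly, the completion of each export task can also be polled explicitly with describe_export_tasks. The following is only a sketch of that variation; the wait_for_export() helper, its timing values, and create_export_task_polling() are my own additions, not part of the original code.

```python
def wait_for_export(client, task_id, interval=10, max_checks=30):
    '''Poll describe_export_tasks until the given task finishes (hypothetical helper).'''
    for _ in range(max_checks):
        response = client.describe_export_tasks(taskId=task_id)
        status = response['exportTasks'][0]['status']['code']
        if status in ('COMPLETED', 'CANCELLED', 'FAILED'):
            return status
        time.sleep(interval)
    return 'TIMED_OUT'


def create_export_task_polling(client, log_groups):
    '''Variant of create_export_task() that waits for each task to finish.'''
    unix_time_now = int(datetime.datetime.now().timestamp() * 1000)
    for log in log_groups:
        response = client.create_export_task(
            fromTime=0,
            to=unix_time_now,
            logGroupName=log['logGroupName'],
            destination=S3_BUCKET,
            destinationPrefix=S3_DIR
        )
        # Block until this task is done so the next one does not hit
        # the one-task-at-a-time limit
        wait_for_export(client, response['taskId'])
```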
Finally, configure the S3 bucket you want to export the logs to. Set the following JSON as the bucket policy; the Resource entries point at the export bucket itself (for GetBucketAcl) and the objects inside it (for PutObject).
```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "logs.ap-northeast-1.amazonaws.com"
            },
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::test_bucket/*"
        },
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "logs.ap-northeast-1.amazonaws.com"
            },
            "Action": "s3:GetBucketAcl",
            "Resource": "arn:aws:s3:::test_bucket"
        }
    ]
}
```
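If you prefer to set this up from code rather than in the console, the same policy can be applied with boto3's put_bucket_policy. A minimal sketch, assuming the bucket name and region used in this article:

```python
import json

import boto3

S3_BUCKET = 'test_bucket'

# Bucket policy that lets CloudWatch Logs write export objects
bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "logs.ap-northeast-1.amazonaws.com"},
            "Action": "s3:PutObject",
            "Resource": f"arn:aws:s3:::{S3_BUCKET}/*"
        },
        {
            "Effect": "Allow",
            "Principal": {"Service": "logs.ap-northeast-1.amazonaws.com"},
            "Action": "s3:GetBucketAcl",
            "Resource": f"arn:aws:s3:::{S3_BUCKET}"
        }
    ]
}

s3 = boto3.client('s3')
# Attach the policy to the export destination bucket
s3.put_bucket_policy(Bucket=S3_BUCKET, Policy=json.dumps(bucket_policy))
```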
If you forget to set this policy, the Lambda function fails with the error **"An error occurred (InvalidParameterException) when calling the CreateExportTask operation: GetBucketAcl call on the given bucket failed. Please check if CloudWatch Logs has been granted permission to perform this operation."** I was confused at first because the error message reads as if the problem were in the CloudWatch Logs settings.
If you use a CloudWatch Events scheduled rule (a cron expression) to run the created Lambda function regularly, you can move the CloudWatch logs to S3 every day, for example.
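For reference, the daily schedule can also be created with boto3 instead of the console. A minimal sketch, assuming a hypothetical function ARN and rule name (replace these with your own values):

```python
import boto3

# Hypothetical identifiers -- replace with your own values
LAMBDA_ARN = 'arn:aws:lambda:ap-northeast-1:123456789012:function:export-logs-to-s3'
RULE_NAME = 'export-logs-to-s3-daily'

events = boto3.client('events')
lambda_client = boto3.client('lambda')

# Run every day at 01:00 UTC
rule = events.put_rule(
    Name=RULE_NAME,
    ScheduleExpression='cron(0 1 * * ? *)',
    State='ENABLED'
)

# Allow CloudWatch Events to invoke the Lambda function
lambda_client.add_permission(
    FunctionName=LAMBDA_ARN,
    StatementId='allow-cloudwatch-events',
    Action='lambda:InvokeFunction',
    Principal='events.amazonaws.com',
    SourceArn=rule['RuleArn']
)

# Point the rule at the Lambda function
events.put_targets(
    Rule=RULE_NAME,
    Targets=[{'Id': 'export-logs-lambda', 'Arn': LAMBDA_ARN}]
)
```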
However, there is one caveat. Log export tasks take time, so if you try to export ten or more log groups at once, the run can exceed 15 minutes, Lambda's maximum execution time, which is inconvenient.
If you want to move a lot of logs to S3 at once, it may be better to run the script as a cron job on EC2 instead of using Lambda.