[PYTHON] A script that downloads AWS RDS log files at high speed

A Google search turns up many ways to download logs stored on an RDS instance using download_db_log_file_portion. However, that API has problems, so I wrote a script that uses the downloadCompleteLogFile API instead.

Problems with download_db_log_file_portion:

- With the AWS CLI, the download is cut off partway through (pagination breaks off)
- Even when calling the API directly, the log has to be fetched in small chunks, which makes downloading slow
- Mojibake (non-ASCII characters such as Japanese are replaced with "?")

downloadCompleteLogFile solves all of these problems.

Script to access downloadCompleteLogFile

To access downloadCompleteLogFile, you must sign the request with SigV4 yourself, using the credentials of an IAM user or IAM role.

The Python script below signs each download URL with SigV4 and emits a curl command for it. The script itself does not download anything; the idea is to run the generated curl commands afterwards.

It is assumed that ~/.aws/config and ~/.aws/credentials are set up with appropriate credentials and permissions.
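For reference, a minimal ~/.aws/credentials might look like this (placeholder values; the profile name must match the profile variable in the script):

```ini
# ~/.aws/credentials (placeholder values)
[default]
aws_access_key_id = AKIAXXXXXXXXXXXXXXXX
aws_secret_access_key = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```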

import boto3
from botocore.awsrequest import AWSRequest
import botocore.auth as auth
import urllib.request

import pprint

profile     = "default"
instance_id = "database-1"
region = "ap-northeast-1"

session = boto3.session.Session(profile_name = profile)
credentials = session.get_credentials()
sigv4auth = auth.SigV4Auth(credentials, "rds", region)

rds_client = session.client('rds')
files = rds_client.describe_db_log_files(DBInstanceIdentifier = instance_id)

for file in files["DescribeDBLogFiles"]:
    file_name = file["LogFileName"]

    # Select files by name: download only error logs, except error/postgres.log
    if not file_name.startswith("error/"):
        continue
    if file_name == "error/postgres.log":
        continue

    # URL of the downloadCompleteLogFile API
    remote_host = "rds." + region + ".amazonaws.com"
    url = "https://" + remote_host + "/v13/downloadCompleteLogFile/" + instance_id + "/" + file_name

    # SigV4 signing
    awsreq = AWSRequest(method = "GET", url = url)
    sigv4auth.add_auth(awsreq)

    req = urllib.request.Request(url, headers = {
        "Authorization": awsreq.headers['Authorization'],
        "Host": remote_host,
        "X-Amz-Date": awsreq.context['timestamp'],
       })

    # echo command to report download progress
    echo_cmd = "echo '" + file_name + "' >&2"
    print(echo_cmd)

    # curl command
    header = " ".join(["-H '" + k + ": " + v + "'" for (k, v) in req.headers.items()])
    cmd = "curl " + header + " '" + url + "'"
    print(cmd)

This Python script produces output like the following:

echo 'error/postgresql.log.2020-11-05-23' >&2
curl -H 'Authorization: AWS4-HMAC-SHA256 Credential=AKIAXXXXXXXXXXXXXXXX/20201105/ap-northeast-1/rds/aws4_request, SignedHeaders=host;x-amz-date, Signature=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' -H 'Host: rds.ap-northeast-1.amazonaws.com' -H 'X-amz-date: 20201105T231307Z' 'https://rds.ap-northeast-1.amazonaws.com/v13/downloadCompleteLogFile/database-1/error/postgresql.log.2020-11-05-23'

Pipe the output into Bash:

$ python download-rds-log.py | bash > log.txt

Since the script only prints curl commands, it is easy to extend to parallel execution. Even as-is, it is much faster than plain download_db_log_file_portion.
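One way to parallelize is a small Python driver. This is a sketch: run_commands_parallel and the file names commands.txt / log.txt are my own illustrative choices, not part of the original script, and the commands are run through the shell, so they must come from a trusted source (here, our own generator).

```python
# A sketch of running the generated curl commands in parallel.
# The commands are executed through the shell, so only feed it
# command strings you generated yourself.
import subprocess
from concurrent.futures import ThreadPoolExecutor

def run_commands_parallel(commands, workers=4):
    """Run shell command strings concurrently; return stdout bytes in order."""
    def run(cmd):
        return subprocess.run(cmd, shell=True, check=True,
                              capture_output=True).stdout
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so the results can be
        # concatenated without interleaving
        return list(pool.map(run, commands))

# Usage sketch: collect only the curl lines from the generated output,
# download several at a time, and write the results in order.
# commands = [line for line in open("commands.txt") if line.startswith("curl")]
# with open("log.txt", "wb") as f:
#     for body in run_commands_parallel(commands):
#         f.write(body)
```

Generate the commands once with python download-rds-log.py > commands.txt, then feed the curl lines to the function above.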

Links

I also wrote about SigV4 signing in the following articles:

- Accessing AWS API Gateway with IAM authentication from Python using a SigV4 signature
- Accessing AWS API Gateway with IAM authentication from C# using a SigV4 signature as an IAM user
- Accessing AWS API Gateway with IAM authentication from C# using a SigV4 signature with an IAM role
