[PYTHON] Format AWS ALB access logs as JSON

For serious analysis of access logs, Amazon Athena is the right tool.

But sometimes you just want to look at the access log in front of you. In that case, the raw space-separated format is hard on the eyes, so this script converts each entry to JSON, which is much easier to read.

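A space-delimited format is just a CSV variant: the delimiter is a space instead of a comma, and fields that contain spaces are wrapped in double quotes. That means Python's csv module can parse it.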
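To see why the csv module fits, here is a minimal sketch that parses a single line. The `sample_line` value is a shortened, made-up entry written for illustration, not a real ALB log line; it only imitates the quoting style.

```python
import csv

# A shortened, made-up line in the style of an ALB log entry (assumption):
# space-separated fields, with double quotes around fields that contain spaces.
sample_line = 'h2 302 "GET https://example.com:443/ HTTP/2.0" "Mozilla/5.0 (Windows NT 10.0)" -'

# csv.reader accepts any iterable of strings, so a one-element list works.
fields = next(csv.reader([sample_line], delimiter=' ', quotechar='"'))

# The quoted request and user agent each come back as a single field.
for index, field in enumerate(fields):
    print(index, field)
```

A plain `sample_line.split()` would break the request and the user agent into several pieces, which is the problem the quote-aware csv parser avoids.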

#!/usr/bin/env python3
# alb_access_log_to_json.py

import fileinput
import json
import csv

# https://docs.aws.amazon.com/ja_jp/elasticloadbalancing/latest/application/load-balancer-access-logs.html#access-log-entry-format
FIELD_KEYS = """
type
timestamp
elb
client:port
target:port
request_processing_time
target_processing_time
response_processing_time
elb_status_code
target_status_code
received_bytes
sent_bytes
request
user_agent
ssl_cipher
ssl_protocol
target_group_arn
trace_id
domain_name
chosen_cert_arn
matched_rule_priority
request_creation_time
actions_executed
redirect_url
error_reason
target:port_list
target_status_code_list
""".split()

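# ALB log entries are space-delimited; fields containing spaces are wrapped in double quotes.
reader = csv.reader(fileinput.input(), delimiter=' ', quotechar='"', escapechar='\\')
for fields in reader:
    # Pair each field with its key; zip() stops at the shorter of the two sequences.
    record = dict(zip(FIELD_KEYS, fields))
    print(json.dumps(record))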

Execution example:

$ head -1 access_log.txt
h2 2020-03-08T23:50:58.701251Z app/xxxxxx-prod-alb/xxxxxxxxx 222.222.222.222:64202 - -1 -1 -1 302 - 1254 224 "GET https://example.com:443/action_store HTTP/2.0" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.18362" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2 - "Root=1-xxxxxxx-xxxxxxxxxxxx" "at.m3.com" "arn:aws:acm:ap-northeast-1:xxxxxxxxxx:certificate/xxxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx" 300 2020-03-08T23:50:58.701000Z "redirect" "https://example.com:443/at/action_store" "-" "-" "-"

$ cat access_log.txt | python3 alb_access_log_to_json.py | jq .
{
  "type": "h2",
  "timestamp": "2020-03-08T23:50:58.701251Z",
  "elb": "app/xxxxxx-prod-alb/xxxxxxxxx",
  "client:port": "222.222.222.222:64202",
  "target:port": "-",
  "request_processing_time": "-1",
  "target_processing_time": "-1",
  "response_processing_time": "-1",
  "elb_status_code": "302",
  "target_status_code": "-",
  "received_bytes": "1254",
  "sent_bytes": "224",
  "request": "GET https://example.com:443/action_store HTTP/2.0",
  "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.18362",
  "ssl_cipher": "ECDHE-RSA-AES128-GCM-SHA256",
  "ssl_protocol": "TLSv1.2",
  "target_group_arn": "-",
  "trace_id": "Root=1-xxxxxxx-xxxxxxxxxxxx",
  "domain_name": "at.m3.com",
  "chosen_cert_arn": "arn:aws:acm:ap-northeast-1:xxxxxxxxxx:certificate/xxxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx",
  "matched_rule_priority": "300",
  "request_creation_time": "2020-03-08T23:50:58.701000Z",
  "actions_executed": "redirect",
  "redirect_url": "https://example.com:443/at/action_store",
  "error_reason": "-",
  "target:port_list": "-",
  "target_status_code_list": "-"
}
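Every value in the output above is a string, including the numeric ones. If you want numbers in the JSON, one possible approach is a small conversion step before `json.dumps`. The following is only a sketch: the `convert_types` helper and the `INT_FIELDS` / `FLOAT_FIELDS` groupings are hypothetical names introduced here, not part of the original script, and mapping "-" to null is an assumption about what you want.

```python
import json

# Hypothetical groupings (assumption); adjust to the fields you care about.
INT_FIELDS = {"elb_status_code", "target_status_code", "received_bytes",
              "sent_bytes", "matched_rule_priority"}
FLOAT_FIELDS = {"request_processing_time", "target_processing_time",
                "response_processing_time"}


def convert_types(record):
    """Return a copy of record with numeric fields converted and "-" mapped to None."""
    converted = {}
    for key, value in record.items():
        if value == "-":
            # ALB writes "-" for a missing value; represent it as JSON null.
            converted[key] = None
            continue
        try:
            if key in INT_FIELDS:
                converted[key] = int(value)
            elif key in FLOAT_FIELDS:
                converted[key] = float(value)
            else:
                converted[key] = value
        except ValueError:
            # Keep the original string if it is not a plain number.
            converted[key] = value
    return converted


# A few fields taken from the execution example above.
example = {
    "elb_status_code": "302",
    "target_status_code": "-",
    "request_processing_time": "-1",
    "received_bytes": "1254",
}
print(json.dumps(convert_types(example)))
```

In the script, this would mean printing `json.dumps(convert_types(...))` on the dictionary built from each line instead of dumping it directly.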
