For serious analysis of ALB access logs, you should use Athena.
Sometimes, though, you just want to look at the access log in front of you. For that, the space-delimited format is hard on the eyes, so convert it to easy-to-read JSON.
A space-delimited log is essentially a kind of CSV, so Python's csv module can parse it, including the double-quoted fields that contain spaces.
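To illustrate why this works, here is a minimal sketch that feeds a shortened, made-up log line (not real ALB output) to `csv.reader` with a space delimiter. The double-quoted request field stays a single field even though it contains spaces.

```python
import csv
import io

# A shortened, hypothetical log line: three plain fields and one quoted field.
line = 'h2 2020-03-08T23:50:58.701251Z 302 "GET https://example.com:443/ HTTP/2.0"\n'

# csv.reader accepts any iterable of strings; io.StringIO stands in for a file.
reader = csv.reader(io.StringIO(line), delimiter=' ', quotechar='"')
fields = next(reader)
print(fields)
# → ['h2', '2020-03-08T23:50:58.701251Z', '302', 'GET https://example.com:443/ HTTP/2.0']
```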
#!/usr/bin/env python3
# alb_access_log_to_json.py
import fileinput
import json
import csv
# Field order follows the ALB access log entry format documented at:
# https://docs.aws.amazon.com/ja_jp/elasticloadbalancing/latest/application/load-balancer-access-logs.html#access-log-entry-format
FIELD_KEYS = """
type
timestamp
elb
client:port
target:port
request_processing_time
target_processing_time
response_processing_time
elb_status_code
target_status_code
received_bytes
sent_bytes
request
user_agent
ssl_cipher
ssl_protocol
target_group_arn
trace_id
domain_name
chosen_cert_arn
matched_rule_priority
request_creation_time
actions_executed
redirect_url
error_reason
target:port_list
target_status_code_list
""".split()
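# Read lines from the files named on the command line (or stdin if none).
# Quoted fields may contain spaces; a backslash escapes the next character.
reader = csv.reader(fileinput.input(), delimiter=' ', quotechar='"', escapechar='\\')
for fields in reader:
    # Pair each value with its field name; zip() drops any surplus values.
    j = dict(zip(FIELD_KEYS, fields))
    print(json.dumps(j))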
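If you want to check the key-to-value mapping without piping in a file, the same logic can be wrapped in a function. This is a sketch: `line_to_dict` is a hypothetical helper name, and the key list is shortened for brevity. Note that `zip()` stops at the shorter of the two sequences, so values beyond the known keys (for example fields AWS appends in newer log formats) are silently dropped.

```python
import csv
import json

# Shortened, hypothetical key list; the full script uses all documented fields.
FIELD_KEYS = ["type", "timestamp", "elb", "client:port"]

def line_to_dict(line):
    """Parse one space-delimited ALB log line into a dict keyed by FIELD_KEYS."""
    # csv.reader takes any iterable of lines; a one-element list is enough here.
    fields = next(csv.reader([line], delimiter=' ', quotechar='"', escapechar='\\'))
    # zip() truncates to the shorter sequence, so surplus fields are ignored.
    return dict(zip(FIELD_KEYS, fields))

record = line_to_dict('h2 2020-03-08T23:50:58.701251Z app/xxxxxx-prod-alb/xxxxxxxxx 222.222.222.222:64202 - -1')
print(json.dumps(record))
```

Here the trailing `-` and `-1` have no matching key, so `record` contains only the four named fields.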
Example run:
$ head -1 access_log.txt
h2 2020-03-08T23:50:58.701251Z app/xxxxxx-prod-alb/xxxxxxxxx 222.222.222.222:64202 - -1 -1 -1 302 - 1254 224 "GET https://example.com:443/action_store HTTP/2.0" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.18362" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2 - "Root=1-xxxxxxx-xxxxxxxxxxxx" "at.m3.com" "arn:aws:acm:ap-northeast-1:xxxxxxxxxx:certificate/xxxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx" 300 2020-03-08T23:50:58.701000Z "redirect" "https://example.com:443/at/action_store" "-" "-" "-"
$ cat access_log.txt | python3 alb_access_log_to_json.py | jq .
{
  "type": "h2",
  "timestamp": "2020-03-08T23:50:58.701251Z",
  "elb": "app/xxxxxx-prod-alb/xxxxxxxxx",
  "client:port": "222.222.222.222:64202",
  "target:port": "-",
  "request_processing_time": "-1",
  "target_processing_time": "-1",
  "response_processing_time": "-1",
  "elb_status_code": "302",
  "target_status_code": "-",
  "received_bytes": "1254",
  "sent_bytes": "224",
  "request": "GET https://example.com:443/action_store HTTP/2.0",
  "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.18362",
  "ssl_cipher": "ECDHE-RSA-AES128-GCM-SHA256",
  "ssl_protocol": "TLSv1.2",
  "target_group_arn": "-",
  "trace_id": "Root=1-xxxxxxx-xxxxxxxxxxxx",
  "domain_name": "at.m3.com",
  "chosen_cert_arn": "arn:aws:acm:ap-northeast-1:xxxxxxxxxx:certificate/xxxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx",
  "matched_rule_priority": "300",
  "request_creation_time": "2020-03-08T23:50:58.701000Z",
  "actions_executed": "redirect",
  "redirect_url": "https://example.com:443/at/action_store",
  "error_reason": "-",
  "target:port_list": "-",
  "target_status_code_list": "-"
}
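Every value in this output is a string, because `csv.reader` never converts types, and absent values show up as `"-"`. If you want to compare numbers (for example with jq), a post-processing step can help. The following is a minimal sketch under assumptions: the helper name `normalize` and the choice of which keys count as integers or floats are hypothetical, and `"-"` is simply mapped to `null`.

```python
import json

# Hypothetical choice of numeric keys; adjust to the fields you care about.
INT_KEYS = {"elb_status_code", "target_status_code", "received_bytes", "sent_bytes"}
FLOAT_KEYS = {"request_processing_time", "target_processing_time", "response_processing_time"}

def normalize(record):
    """Return a copy of record with '-' as None and selected fields as numbers."""
    out = {}
    for key, value in record.items():
        if value == "-":
            out[key] = None  # '-' marks a value that is not present
        elif key in INT_KEYS:
            out[key] = int(value)
        elif key in FLOAT_KEYS:
            out[key] = float(value)
        else:
            out[key] = value
    return out

sample = {"elb_status_code": "302", "target_status_code": "-",
          "request_processing_time": "-1", "sent_bytes": "224", "type": "h2"}
print(json.dumps(normalize(sample)))
```

In the script, it could be applied inside the loop as `print(json.dumps(normalize(j)))`, after which a filter such as `jq 'select(.elb_status_code >= 500)'` compares numbers rather than strings.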