Introduction

Amazon Timestream, a fully managed time series database, became generally available on 9/30, so let's try it out. I've known about time series databases for a long time but have never actually used one, so I'm looking forward to it.

It's so new that neither CloudFormation nor Terraform supports it yet, so this time I'll try it from the console. Also, it isn't available in the Tokyo region yet, so switch the console to a region where it is available.

Try to create a database

Honestly, I wasn't sure how to proceed on my own, and in cases like this the standard approach is to follow along with the [tutorial](https://aws.amazon.com/jp/blogs/news/store-and-access-time-series-data-at-any-scale-with-amazon-timestream-now-generally-available/).

On the Timestream screen below, press the "Create database" button.

キャプチャ1.png

On the database creation screen that opens, enter a database name.

There is also a Sample database option, but this time select Standard database.

キャプチャ2.png

If you leave the KMS settings blank, a key is set up for you automatically. Set tags as you like and press the "Create database" button.

キャプチャ3.png

Creation completed!

キャプチャ4.png
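CloudFormation and Terraform can't do this yet, but the boto3 `timestream-write` client does have a `create_database` operation. That means the console step above can probably be scripted as well. The following is a minimal sketch that I did not run for this article. It assumes you pass in an already-created client, and the function name `create_database` here is my own wrapper. Omitting `KmsKeyId` corresponds to leaving the KMS field blank in the console.

```python
def create_database(client, database_name, tags=None):
    """Create a Timestream database via a timestream-write client.

    KmsKeyId is deliberately omitted, mirroring a blank KMS field in the
    console. `tags` is an optional plain dict of tag key -> value.
    """
    params = {"DatabaseName": database_name}
    if tags:
        # AWS tags use the [{"Key": ..., "Value": ...}] list shape
        params["Tags"] = [{"Key": k, "Value": v} for k, v in tags.items()]
    response = client.create_database(**params)
    # The response describes the new database, including its ARN
    return response["Database"]["Arn"]
```

For example, you might call `create_database(boto3.client("timestream-write"), "xxxxx-test-timestream", {"env": "test"})`. The `env` tag is made up for illustration.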

Try to make a table

Now, click the link for the database created above. The database details screen has a "Create table" button, so press it.

On the table creation screen that opens, enter a table name.

キャプチャ5.png

Since this is just a trial, I set the data storage (retention) settings exactly as in the tutorial.

キャプチャ6.png

Set tags as you like and press the "Create table" button.

キャプチャ7.png

Table creation is complete!

キャプチャ8.png
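Table creation can likewise be scripted with the `create_table` operation of the `timestream-write` client, where the retention settings go into `RetentionProperties`. The following is a minimal sketch, not something I ran here, and the retention values in the usage note are hypothetical. Keep in mind that, as far as I understand, WriteRecords rejects records whose timestamps are older than the memory store retention period. If you plan to load old Locust logs, the memory store retention needs to cover them.

```python
def create_table(client, database_name, table_name, memory_hours, magnetic_days):
    """Create a Timestream table with explicit retention settings.

    memory_hours:  how long data stays in the memory store (hours)
    magnetic_days: how long data stays in the magnetic store (days)
    """
    response = client.create_table(
        DatabaseName=database_name,
        TableName=table_name,
        RetentionProperties={
            "MemoryStoreRetentionPeriodInHours": memory_hours,
            "MagneticStoreRetentionPeriodInDays": magnetic_days,
        },
    )
    # Return the ARN of the created table
    return response["Table"]["Arn"]
```

For example, `create_table(boto3.client("timestream-write"), "xxxxx-test-timestream", "xxxxx-test-table", 24, 7)` would keep data for 24 hours in memory and 7 days on magnetic storage. Those two numbers are placeholders, not recommendations.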

Register data

Just copying the tutorial isn't much fun, so let's register data output by Locust, the load-testing tool.

The data to be registered is as follows.

Timestamp,User Count,Type,Name,Requests/s,Failures/s,50%,66%,75%,80%,90%,95%,98%,99%,99.9%,99.99%,100%,Total Request Count,Total Failure Count,Total Median Response Time,Total Average Response Time,Total Min Response Time,Total Max Response Time,Total Average Content Size
1603535373,20,GET,/xxxxx/,1.000000,0.000000,5,6,6,6,8,9,9,9,9,9,9,16,0,4.11685699998543,5.413748562499876,4.11685699998543,9.385663000045952,14265.0
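This header has 24 columns, so counting them by hand to find an index is easy to get wrong. Before hard-coding indices, it can help to let Python number them. The sketch below is a small helper of my own (`column_indices` is a made-up name) that uses only the standard `csv` module, with the header copied from the sample above.

```python
import csv
import io

# Header line copied from the Locust CSV sample above
HEADER = ("Timestamp,User Count,Type,Name,Requests/s,Failures/s,50%,66%,75%,80%,"
          "90%,95%,98%,99%,99.9%,99.99%,100%,Total Request Count,"
          "Total Failure Count,Total Median Response Time,"
          "Total Average Response Time,Total Min Response Time,"
          "Total Max Response Time,Total Average Content Size")


def column_indices(header_line):
    """Map each column name in a CSV header line to its 0-based index."""
    names = next(csv.reader(io.StringIO(header_line)))
    return {name: i for i, name in enumerate(names)}


idx = column_indices(HEADER)
for col in ("Type", "Name", "Requests/s", "95%",
            "Total Median Response Time", "Total Average Response Time"):
    print(idx[col], col)
```

This prints `2 Type`, `3 Name`, `4 Requests/s`, `11 95%`, `19 Total Median Response Time` and `20 Total Average Response Time`, one per line.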

Write a Python script like the one below and load the data with it. Dimensions may be unfamiliar, but in short you can think of them as attributes used to classify data points. This time I defined the HTTP resource (path) and the HTTP method as dimensions.

import sys
import csv

import boto3
import psutil  # not used in this script

from botocore.config import Config

FILENAME = sys.argv[1]

DATABASE_NAME = "xxxxx-test-timestream"
TABLE_NAME = "xxxxx-test-table"

# 0-based column indices in the Locust CSV (see the header line above)
COL_TIMESTAMP = 0
COL_TYPE = 2                 # HTTP method, e.g. GET
COL_NAME = 3                 # resource path, or "Aggregated" for summary rows
COL_REQUESTS_PER_SEC = 4     # "Requests/s"
COL_P95 = 11                 # "95%"
COL_MEDIAN = 19              # "Total Median Response Time"
COL_AVERAGE = 20             # "Total Average Response Time"


def write_records(write_client, records):
    """Write one batch of records to Timestream and print the HTTP status."""
    try:
        result = write_client.write_records(DatabaseName=DATABASE_NAME,
                                            TableName=TABLE_NAME,
                                            Records=records,
                                            CommonAttributes={})
        status = result['ResponseMetadata']['HTTPStatusCode']
        print("Processed %d records. WriteRecords Status: %s" %
              (len(records), status))
    except Exception as err:
        print("Error:", err)


if __name__ == '__main__':

    session = boto3.Session()
    write_client = session.client('timestream-write', config=Config(
        read_timeout=20, max_pool_connections=5000, retries={'max_attempts': 10}))

    with open(FILENAME, newline='') as f:
        reader = csv.reader(f, quoting=csv.QUOTE_NONE)

        for csv_record in reader:
            # Skip the header row and Locust's "Aggregated" summary rows
            if csv_record[COL_TIMESTAMP] == 'Timestamp' or csv_record[COL_NAME] == 'Aggregated':
                continue

            ts_columns = [
                {'MeasureName': 'Requests/s',                  'MeasureValue': csv_record[COL_REQUESTS_PER_SEC]},
                {'MeasureName': '95Percentile Response Time',  'MeasureValue': csv_record[COL_P95]},
                {'MeasureName': 'Total Median Response Time',  'MeasureValue': csv_record[COL_MEDIAN]},
                {'MeasureName': 'Total Average Response Time', 'MeasureValue': csv_record[COL_AVERAGE]},
            ]

            # Dimensions: attributes used to classify each data point
            dimensions = [{'Name': 'resource', 'Value': csv_record[COL_NAME]},
                          {'Name': 'method', 'Value': csv_record[COL_TYPE]}]

            ts_records = []
            for ts_column in ts_columns:
                ts_records.append({
                    # Locust timestamps are in seconds; Timestream's default
                    # TimeUnit is milliseconds, hence * 1000
                    'Time': str(int(csv_record[COL_TIMESTAMP]) * 1000),
                    'Dimensions': dimensions,
                    'MeasureName': ts_column['MeasureName'],
                    'MeasureValue': ts_column['MeasureValue'],
                    'MeasureValueType': 'DOUBLE'
                })

            write_records(write_client, ts_records)
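The script above makes one `WriteRecords` call per CSV row, which means 4 records per call. As far as I know, `WriteRecords` accepts up to 100 records per request, so batching may be worthwhile for longer Locust runs. The sketch below assumes that 100-record limit. `chunked` is a helper name I made up, and the records are dummy data.

```python
from typing import Iterable, Iterator, List


def chunked(records: Iterable[dict], size: int = 100) -> Iterator[List[dict]]:
    """Yield lists of at most `size` records (100 = assumed WriteRecords limit)."""
    batch: List[dict] = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        # Flush the remainder, which is smaller than `size`
        yield batch


# Dummy records: 60 CSV rows x 4 measures = 240 records
dummy = [{"MeasureName": f"m{i % 4}", "MeasureValue": "1.0"} for i in range(240)]
print([len(b) for b in chunked(dummy)])  # [100, 100, 40]
```

To use it, you would collect the records from every row into one list first. You would then call the script's `write_records` once per batch rather than once per row.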

However, the service has only just become generally available, so your installed boto3 may be too old to support it.

So, let's check whether boto3 is the latest version:

$ pip list -o

Package               Version  Latest     Type
--------------------- -------- ---------- -----
boto3                 1.13.26  1.16.4     wheel

Update it with pip's `-U` option.

$ pip install -U boto3

Also, use `aws configure` to set the default region to the region where you created the database above.

If psutil is not installed, install it as follows.

$ yum install python3-devel
$ pip3 install psutil

I expect it will be fixed soon, but as of October 25, 2020, the official blog mentioned above has the wrong command name. If you trust the blog and run pip3 exactly as written there, the installation fails and you'll be disappointed.

By the way, did the data load successfully?

Issue a query

If you select "Query editor" from the left-hand menu, the following screen appears. Let's run SQL that narrows the data down by its attributes (dimensions). I want to know the average response time of GET requests to /xxxxx/!
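The exact SQL I ran is only visible in the screenshot, so what follows is an assumed equivalent rather than a copy of it. It is built as a Python string so that it can be reused from boto3 later. It assumes the dimension names `resource` and `method` and the measure name `Total Average Response Time` from the loading script. In Timestream SQL, `measure_value::double` reads a DOUBLE measure value, and the database and table names are double-quoted because they contain hyphens. `build_response_time_query` is a hypothetical helper name.

```python
def build_response_time_query(database, table, resource, method,
                              measure="Total Average Response Time"):
    """Build a Timestream SQL query for one measure of one resource/method.

    Values are inlined for readability only; do not pass untrusted input.
    """
    return (
        "SELECT time, measure_value::double AS response_time "
        f'FROM "{database}"."{table}" '
        f"WHERE measure_name = '{measure}' "
        f"AND resource = '{resource}' "
        f"AND method = '{method}' "
        "ORDER BY time"
    )


print(build_response_time_query("xxxxx-test-timestream", "xxxxx-test-table",
                                "/xxxxx/", "GET"))
```

The printed string can be pasted into the Query editor as-is.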

When I executed it, only the information I wanted was extracted!

キャプチャ10.png

To get this as raw data, you have to fetch it again with the CLI or boto3, which is quite tedious because you need a paginator. For small amounts of data, pandas makes the processing easy. In real use, though, the point is to quickly retrieve data collected from thousands of servers, which shouldn't be an amount you can format locally with pandas anyway. Timestream's true value is real-time monitoring in combination with Grafana.
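For the boto3 route, the pagination can be hidden behind a small function. The sketch below assumes the `query` paginator that boto3 provides for the `timestream-query` client, along with its response shape: `ColumnInfo` with column `Name`s, and `Rows` whose `Data` entries hold a `ScalarValue` or `NullValue`. `query_to_rows` is a hypothetical helper. It only handles scalar columns and returns every value as a string, or `None` for NULL.

```python
def query_to_rows(query_client, query_string):
    """Run a Timestream query and return all rows as a list of dicts.

    The boto3 paginator follows NextToken for us, so results spanning
    several pages are collected in one list. Values stay strings
    (ScalarValue); NULLs become None. Non-scalar columns are not handled.
    """
    rows = []
    names = []
    paginator = query_client.get_paginator("query")
    for page in paginator.paginate(QueryString=query_string):
        # Column names come from ColumnInfo; keep the previous ones if absent
        names = [col["Name"] for col in page.get("ColumnInfo", [])] or names
        for row in page.get("Rows", []):
            values = [datum.get("ScalarValue") for datum in row["Data"]]
            rows.append(dict(zip(names, values)))
    return rows
```

You would call it as `query_to_rows(boto3.client("timestream-query"), sql)`. For a small result, `pandas.DataFrame(rows)` then gives you a table to work with.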

[PYTHON] Try running Amazon Timestream
