[PYTHON] Try running Amazon Timestream

Introduction

Amazon Timestream, a fully managed time series database, became generally available on September 30, so let's give it a try. I've known about time series databases for a long time, but I've never actually used one, so I'm looking forward to this.

It's so new that neither CloudFormation nor Terraform supports it yet, so this time I'll work from the console. Note that it isn't available in the Tokyo region yet either, so point the console at a region where it is available.

Try creating a database

Well, I wasn't sure what to do when I first tried it on my own, and in such cases the standard practice is to follow along with the [tutorial](https://aws.amazon.com/jp/blogs/news/store-and-access-time-series-data-at-any-scale-with-amazon-timestream-now-generally-available/).

Press the "Create database" button on the following screen of Timestream.

キャプチャ1.png

Then set the database name on the database creation screen that opens.

キャプチャ2.png

If you leave the KMS settings blank, a key will be created automatically. Set tags as you like and press the "Create database" button.

キャプチャ3.png

Creation completed!

キャプチャ4.png

Try creating a table

Now, click the link for the database name created above. There is a "Create table" button on the database details screen, so press it.

Then set the table name on the table creation screen that opens.

キャプチャ5.png

Since this is just a trial, I configured the data retention settings by the book.

キャプチャ6.png

Set the tag as you like and press the "Create table" button.

キャプチャ7.png

Table creation is complete!

キャプチャ8.png
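
For reference, the console operations up to this point correspond to the CreateDatabase and CreateTable APIs, so once tooling catches up you could also script them with boto3 along these lines (a sketch; the names and retention values are placeholders):

import boto3

client = boto3.client('timestream-write')

# Equivalent of the "Create database" console step.
client.create_database(DatabaseName='xxxxx-test-timestream')

# Equivalent of the "Create table" console step, with retention settings.
client.create_table(
    DatabaseName='xxxxx-test-timestream',
    TableName='xxxxx-test-table',
    RetentionProperties={
        'MemoryStoreRetentionPeriodInHours': 12,
        'MagneticStoreRetentionPeriodInDays': 7,
    },
)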

Register data

Copying the tutorial exactly wouldn't be much fun, so let's register data output by Locust instead.

The data to be registered is as follows.

Timestamp,User Count,Type,Name,Requests/s,Failures/s,50%,66%,75%,80%,90%,95%,98%,99%,99.9%,99.99%,100%,Total Request Count,Total Failure Count,Total Median Response Time,Total Average Response Time,Total Min Response Time,Total Max Response Time,Total Average Content Size
1603535373,20,GET,/xxxxx/,1.000000,0.000000,5,6,6,6,8,9,9,9,9,9,9,16,0,4.11685699998543,5.413748562499876,4.11685699998543,9.385663000045952,14265.0

I wrote the Python script below to load it. You may not be familiar with dimensions, but in short, you can think of them as attribute information used for classification. This time, I defined the HTTP resource and method as attributes.

import sys
import csv
import boto3
import psutil  # carried over from the AWS sample script; not actually used below

from botocore.config import Config

FILENAME = sys.argv[1]

DATABASE_NAME = "xxxxx-test-timestream"
TABLE_NAME = "xxxxx-test-table"

def write_records(records):
    # Write one batch of records and report the HTTP status code.
    try:
        result = write_client.write_records(DatabaseName=DATABASE_NAME,
                                            TableName=TABLE_NAME,
                                            Records=records,
                                            CommonAttributes={})
        status = result['ResponseMetadata']['HTTPStatusCode']
        print("Processed %d records. WriteRecords Status: %s" %
              (len(records), status))
    except Exception as err:
        print("Error:", err)

if __name__ == '__main__':

    session = boto3.Session()
    write_client = session.client('timestream-write', config=Config(
        read_timeout=20, max_pool_connections=5000, retries={'max_attempts': 10}))

    with open(FILENAME) as f:
        reader = csv.reader(f, quoting=csv.QUOTE_NONE)

        for csv_record in reader:
            # Skip the header row and Locust's "Aggregated" summary rows.
            if csv_record[0] == 'Timestamp' or csv_record[3] == 'Aggregated':
                continue

            ts_records = []

            # The measures to store; the indices follow the CSV header above
            # (4 = Requests/s, 11 = 95%, 19 = Total Median, 20 = Total Average).
            ts_columns = [
                { 'MeasureName': 'Requests/s',                  'MeasureValue': csv_record[4] },
                { 'MeasureName': '95Percentile Response Time',  'MeasureValue': csv_record[11] },
                { 'MeasureName': 'Total Median Response Time',  'MeasureValue': csv_record[19] },
                { 'MeasureName': 'Total Average Response Time', 'MeasureValue': csv_record[20] },
            ]

            for ts_column in ts_columns:
                ts_records.append({
                    # Timestream interprets 'Time' as milliseconds by default.
                    'Time': str(int(csv_record[0]) * 1000),
                    'Dimensions': [ {'Name': 'resource', 'Value': csv_record[3]}, {'Name': 'method', 'Value': csv_record[2]} ],
                    'MeasureName': ts_column['MeasureName'],
                    'MeasureValue': ts_column['MeasureValue'],
                    'MeasureValueType': 'DOUBLE'
                })

            write_records(ts_records)
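
Assuming the script above is saved as load_locust.py (a name I made up for this example) and pointed at the stats history CSV that Locust writes out, you would run it like this:

$ python3 load_locust.py example_stats_history.csv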

However, this feature has only just been released, so some people may be on a boto3 version that is too old to know about it.

So, let's check whether boto3 is up to date.

$ pip list -o

Package               Version  Latest     Type
--------------------- -------- ---------- -----
boto3                 1.13.26  1.16.4     wheel

Update it with pip's `-U` option.

$ pip install -U boto3

Also, use `aws configure` to point the default region at the region where you created the database above.
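
For example, assuming the database was created in us-east-1 (one of the regions where Timestream is available), you can set just the default region like this:

$ aws configure set region us-east-1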

If psutil is not installed, install it as follows.

$ yum install python3-devel
$ pip3 install psutil

I expect it will be fixed soon, but as of October 25, 2020, the command name in the official blog linked above is wrong, so if you trust the blog and use pip3 as written, the install will fail and leave you feeling sad.

Now then, did the data load successfully?

Issue a query

If you select "Query Editor" from the menu on the left, the following screen will be displayed, so let's execute SQL while narrowing down the attributes to the text. I want to know the average response time of GET requests in / xxxxx /!

image.png
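
A query along these lines does the trick (a sketch using the database, table, measure, and dimension names from the loading script above):

SELECT bin(time, 1m) AS binned_time,
       avg(measure_value::double) AS avg_response_time
FROM "xxxxx-test-timestream"."xxxxx-test-table"
WHERE measure_name = 'Total Average Response Time'
  AND method = 'GET'
  AND resource = '/xxxxx/'
GROUP BY bin(time, 1m)
ORDER BY binned_time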

When I executed it, only the information I wanted was extracted!

キャプチャ10.png

To get this as raw data, fetch it again with the CLI or boto3. This is a bit of a hassle because a paginator is required. For small amounts of data, pandas would be the easy choice in the first place, but in real-world usage Timestream is for quickly retrieving information collected from thousands of servers, so the data usually won't be small enough to format locally with pandas. Its true value is that it can be combined with Grafana for real-time monitoring.
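
As a minimal sketch of that retrieval (assuming the same region configuration as before and a boto3 new enough to include the timestream-query client), paging through the results looks like this:

import boto3

# The same query as in the previous section.
QUERY = '''
SELECT bin(time, 1m) AS binned_time,
       avg(measure_value::double) AS avg_response_time
FROM "xxxxx-test-timestream"."xxxxx-test-table"
WHERE measure_name = 'Total Average Response Time'
  AND method = 'GET'
  AND resource = '/xxxxx/'
GROUP BY bin(time, 1m)
ORDER BY binned_time
'''

query_client = boto3.client('timestream-query')

# Results come back in pages, so walk them with a paginator.
paginator = query_client.get_paginator('query')
for page in paginator.paginate(QueryString=QUERY):
    for row in page['Rows']:
        # Each row's Data entries line up with page['ColumnInfo'].
        print([datum.get('ScalarValue') for datum in row['Data']])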
