[blackbird-dynamodb] Monitoring AWS DynamoDB

This plugin collects various DynamoDB metrics.

What metrics does it get?

You can get metrics for each table and response times for each API.

Table Metrics

First, the per-table metrics. The Statistic column in the table below shows which CloudWatch statistic is fetched (Sum is the total value per unit time, Average is the average value per unit time).

| Metric Name | Statistic | Detail |
| --- | --- | --- |
| UserErrors | Sum | Client-side errors |
| SystemErrors | Sum | AWS-side errors (one hopes these are rare) |
| ThrottledRequests | Sum | Number of requests throttled for reaching the provisioned capacity units limit |
| ReadThrottleEvents | Sum | Read requests that reached the upper limit of provisioned capacity units |
| WriteThrottleEvents | Sum | Write requests that reached the upper limit of provisioned capacity units |
| ProvisionedReadCapacityUnits | Maximum | Number of read capacity units provisioned for the table |
| ProvisionedWriteCapacityUnits | Maximum | Number of write capacity units provisioned for the table |
| ConsumedReadCapacityUnits | Maximum | Maximum number of read capacity units consumed per unit time |
| ConsumedReadCapacityUnits | Average | Average number of read capacity units consumed per unit time |
| ConsumedWriteCapacityUnits | Maximum | Maximum number of write capacity units consumed per unit time |
| ConsumedWriteCapacityUnits | Average | Average number of write capacity units consumed per unit time |
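For reference, here is a minimal sketch of pulling one of these per-table metrics from CloudWatch with boto3 (an illustration, not the plugin's actual code; the region and table name are placeholders):

```python
import datetime

import boto3

cloudwatch = boto3.client("cloudwatch", region_name="ap-northeast-1")
now = datetime.datetime.now(datetime.timezone.utc)

resp = cloudwatch.get_metric_statistics(
    Namespace="AWS/DynamoDB",
    MetricName="ConsumedReadCapacityUnits",
    Dimensions=[{"Name": "TableName", "Value": "YOUR_DYNAMODB_TABLE_NAME"}],
    StartTime=now - datetime.timedelta(minutes=5),
    EndTime=now,
    Period=300,                         # one 5-minute window
    Statistics=["Maximum", "Average"],  # statistics from the table above
)

for point in resp["Datapoints"]:
    print(point["Timestamp"], point["Maximum"], point["Average"])
```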

About UserErrors

As the name suggests, UserErrors are errors raised on the SDK (client) side when calling DynamoDB; specifically, requests for which a 4XX status code is returned.
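To make that concrete, here is a small hypothetical boto3 example of a call that would be counted in UserErrors (the table name is deliberately made up):

```python
import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.client("dynamodb", region_name="ap-northeast-1")

try:
    # Nonexistent table -> ResourceNotFoundException (HTTP 400)
    dynamodb.get_item(
        TableName="no-such-table",
        Key={"id": {"S": "1"}},
    )
except ClientError as e:
    status = e.response["ResponseMetadata"]["HTTPStatusCode"]
    if 400 <= status < 500:
        print("client-side (4XX) error -> counted as UserErrors:", status)
```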

About the CapacityUnits Metrics

DynamoDB is a distributed database with multiple shards on the back end, but the capacity units you set are assigned to the table as a whole. Because shards split and scale out in proportion to the provisioned capacity units, it is easy to assume that sharding is not your problem, and it really is great that you normally don't have to worry about scale-out or load.

However, since the capacity units are assigned to the table, each shard effectively gets that value divided by the number of shards. So it can happen that you get a ProvisionedThroughputExceededException even though ConsumedReadCapacityUnits is nowhere near the limit!

This happens because (as the documentation actually explains quite well) a hot shard is created when the hash keys are not widely distributed and requests concentrate on particular values. Once it happens it is quite hard to fix, so if possible you want to get this right at the design stage.
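As a back-of-the-envelope illustration (the shard count and numbers here are made up; DynamoDB does not expose them):

```python
# Hot-shard arithmetic: capacity is provisioned per table but served per shard.
provisioned_rcu = 400  # ProvisionedReadCapacityUnits for the whole table
shards = 4             # hypothetical shard count (not visible to you)
per_shard_rcu = provisioned_rcu / shards  # each shard serves ~100 RCU

# If a hot hash key pushes 150 RCU of reads onto a single shard, that shard
# throttles even though table-wide consumption (150 of 400) looks healthy.
hot_shard_load = 150
print("throttled:", hot_shard_load > per_shard_rcu)  # -> throttled: True
```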

Per-Operation Metrics

You can get the response time of successful API calls and the number of items returned by range queries. Be careful if, for example, the Scan or Query API is returning too many items. (In practice, you will probably notice the degraded ELB latency first.)

For the response time there is SuccessfulRequestLatency, and for the number of retrieved items there is ReturnedItemCount. Both are collected with the Maximum and Average statistics.
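As a sketch of what fetching these looks like (again an illustration, not the plugin's code), the per-operation metrics live under the same namespace with an extra Operation dimension:

```python
import datetime

import boto3

cloudwatch = boto3.client("cloudwatch", region_name="ap-northeast-1")
now = datetime.datetime.now(datetime.timezone.utc)

resp = cloudwatch.get_metric_statistics(
    Namespace="AWS/DynamoDB",
    MetricName="SuccessfulRequestLatency",  # or "ReturnedItemCount"
    Dimensions=[
        {"Name": "TableName", "Value": "YOUR_DYNAMODB_TABLE_NAME"},
        {"Name": "Operation", "Value": "Query"},  # Scan, GetItem, ...
    ],
    StartTime=now - datetime.timedelta(minutes=5),
    EndTime=now,
    Period=300,
    Statistics=["Maximum", "Average"],
)
print(resp["Datapoints"])
```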

How to Install

Sorry, I haven't prepared pip or RPM packages yet; I'll take care of that as soon as possible.

Configuration

The options are as follows.

| Key Name | Default | Required | Detail |
| --- | --- | --- | --- |
| region_name | us-east-1 | No | AWS region name |
| aws_access_key_id | - | Yes | AWS Access Key ID |
| aws_secret_access_key | - | Yes | AWS Secret Access Key |
| table_name | - | Yes | Name of the DynamoDB table |
| hostname | - | Yes | Hostname on Zabbix (it's a good idea to create it first) |
| module | - | Yes | Which plugin to use; fixed to `dynamodb` here |
| ignore_metrics | - | No | Comma-separated list of metrics you don't want to collect |
| ignore_operations | - | No | Comma-separated list of operation metrics you don't want to collect |

About the ignore_XXXXX Parameters

If there are metrics you don't want to collect (CloudWatch also costs money once API calls exceed the one-month free tier), separate them with commas like `ignore_metrics = UserErrors, ConsumedWriteCapacityUnits`.

The same goes for ignore_operations: if you don't use the BatchWriteItem or Scan APIs, write something like `ignore_operations = BatchWriteItem, Scan`!
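Presumably the plugin just splits these values on commas and skips the matching CloudWatch calls; a minimal sketch of that idea (not blackbird's actual parsing code):

```python
raw = "BatchWriteItem, Scan"  # value of ignore_operations in the config

# Split on commas and strip whitespace around each name.
ignore_operations = {name.strip() for name in raw.split(",") if name.strip()}

for operation in ["Query", "Scan", "GetItem", "BatchWriteItem"]:
    if operation in ignore_operations:
        continue  # skip CloudWatch calls for ignored operations
    print("would fetch metrics for", operation)
```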

Example

Sample config file

```ini
# The section name can be anything, but since it becomes the name of the
# thread created internally, it's better not to duplicate it. Because the
# debug log includes the thread name, I use the DynamoDB table name.
[ANYTHING_OK]

# AWS information
region_name = ap-northeast-1
aws_access_key_id = XXXXXXXXXX
aws_secret_access_key = YYYYYYYYYY

# Name of the DynamoDB table to monitor
table_name = YOUR_DYNAMODB_TABLE_NAME

# Hostname on Zabbix. Zabbix requires items to be tied to a host even when
# there is no actual server behind them, so I create a host with the same
# name as the table and use that.
hostname = HOSTNAME_ON_ZBX_SERVER

# Fixed to dynamodb.
module = dynamodb
```

It looks like this!
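For illustration, the same file could be read with Python's standard configparser (blackbird has its own config loader; this sketch only shows the INI structure and why the section name matters):

```python
import configparser

config = configparser.ConfigParser()
config.read("dynamodb.cfg")  # hypothetical file name

for section in config.sections():  # e.g. ["ANYTHING_OK"]
    options = dict(config[section])
    # The section name doubles as the internal thread name, which is why
    # it should stay unique (e.g. the DynamoDB table name).
    print(section, options["table_name"], options["module"])
```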
