[PYTHON] [blackbird-kinesis-stream] Monitoring AWS Kinesis Stream

This plugin (https://github.com/Vagrants/blackbird-Kinesis-Stream) collects CloudWatch metrics for a Kinesis Stream.

What Metrics does this plugin get?

The Metric Name column follows the MetricName.Statistics form: the CloudWatch metric name plus how it is aggregated (Sum or Average). Unit is the unit of the value (bytes, milliseconds, etc.).

| Metric Name | Unit | Detail |
|---|---|---|
| PutRecord.Bytes | Bytes | Number of bytes put into the stream |
| PutRecord.Latency | Milliseconds | Latency of putting records into the stream (PutRecord API response time) |
| PutRecord.Success | Count | Number of successful PutRecord API calls |
| GetRecords.Bytes | Bytes | Number of bytes retrieved from the stream |
| GetRecords.IteratorAgeMilliseconds | Milliseconds | - |
| GetRecords.Latency | Milliseconds | Latency of retrieving records from the stream (GetRecords API response time) |
| GetRecords.Success | Count | Number of successful GetRecords API calls |
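For reference, here is a minimal sketch of fetching one of these metrics directly with boto3's CloudWatch `get_metric_statistics` call. It is not the plugin's code: the stream name, the 300-second period, and the three-period lookback are assumptions, and `build_request` / `fetch_latest` are hypothetical helpers. Stream-level Kinesis metrics live in the `AWS/Kinesis` namespace with a `StreamName` dimension.

```python
from datetime import datetime, timedelta, timezone


def build_request(stream_name, metric_name, statistic="Sum", period=300,
                  lookback_periods=3, now=None):
    """Build keyword arguments for CloudWatch GetMetricStatistics.

    Looks back several periods because data for the most recent
    period may not be available in CloudWatch yet.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/Kinesis",
        "MetricName": metric_name,  # e.g. "GetRecords.Bytes"
        "Dimensions": [{"Name": "StreamName", "Value": stream_name}],
        "StartTime": now - timedelta(seconds=period * lookback_periods),
        "EndTime": now,
        "Period": period,
        "Statistics": [statistic],  # "Sum" or "Average"
    }


def fetch_latest(stream_name, metric_name, statistic="Sum", period=300):
    """Return the newest datapoint value for the metric, or None if there is none."""
    import boto3  # imported here so build_request works without boto3 installed

    client = boto3.client("cloudwatch")
    resp = client.get_metric_statistics(
        **build_request(stream_name, metric_name, statistic, period)
    )
    # Datapoints are not guaranteed to be ordered, so sort by timestamp.
    points = sorted(resp["Datapoints"], key=lambda p: p["Timestamp"])
    return points[-1][statistic] if points else None
```

Calling `fetch_latest("my-stream", "PutRecord.Bytes")` needs AWS credentials; "my-stream" is a placeholder stream name.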

Zabbix Template

Items

The items are the CloudWatch metrics above, converted to per-second values.
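The post does not say exactly how the per-second values are derived. A minimal sketch, assuming a Sum-type metric (bytes or success counts) is divided by the length of the CloudWatch period it covers; `per_second` is a hypothetical helper, and Zabbix could instead compute a rate itself with its "Delta (speed per second)" store-value option.

```python
def per_second(period_sum: float, period_seconds: int) -> float:
    """Turn a CloudWatch Sum covering one period into an average per-second rate.

    Hypothetical helper: this only makes sense for Sum-type metrics such as
    GetRecords.Bytes, not for averaged values such as latency.
    """
    if period_seconds <= 0:
        raise ValueError("period_seconds must be positive")
    return period_sum / period_seconds


# 1,500,000 bytes fetched during a 300-second period -> 5000.0 bytes/s
print(per_second(1_500_000, 300))
```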

Triggers

(Just between us, creating the triggers was a real struggle.)

Because of how a Kinesis Stream works (unlike a queue, data does not disappear when it is read), the total size of the stream is not very meaningful. So I first tried raising a trigger when the difference between PutRecord and GetRecords was much smaller (or much larger) than the previous value. However, because consumers fetch records in batches with GetRecords, the value flapped quite a bit. When I first introduced it, it honestly was not usable as monitoring (in the experimental stage the alerts went only to my mobile phone, and they drove me crazy).

So, instead of monitoring the simple difference, I changed the trigger to subtract the average of the last 3 values (short span) from the average of the last 10 values (long span), and to fire an alert when that difference exceeds 25% (the numbers of values averaged are only examples).
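The averaged-difference idea can be sketched in Python as follows. This is an illustration, not the template's actual trigger expression: it assumes samples are ordered oldest-first and that "exceeds 25%" means the difference relative to the long-span average. `averaged_diff_ratio`, `should_alert`, and `naive_alert` are hypothetical names.

```python
from statistics import mean


def averaged_diff_ratio(values, long_span=10, short_span=3):
    """Return (long_avg - short_avg) / long_avg over the most recent samples.

    values are ordered oldest-first. Returns None when there are fewer than
    long_span samples or the long-span average is zero.
    """
    if short_span > long_span:
        raise ValueError("short_span must not exceed long_span")
    if len(values) < long_span:
        return None
    long_avg = mean(values[-long_span:])
    short_avg = mean(values[-short_span:])
    if long_avg == 0:
        return None
    return (long_avg - short_avg) / long_avg


def should_alert(values, long_span=10, short_span=3, threshold=0.25):
    """Alert when the averaged difference moves more than threshold either way."""
    ratio = averaged_diff_ratio(values, long_span, short_span)
    return ratio is not None and abs(ratio) > threshold


def naive_alert(values, threshold=0.25):
    """For comparison: alert when the latest sample differs from the previous one."""
    prev, last = values[-2], values[-1]
    return prev != 0 and abs(last - prev) / prev > threshold
```

With an alternating series such as `[80, 120] * 5`, `naive_alert` fires (80 to 120 is a 50% jump), while the averaged difference is only about 7% and `should_alert` stays quiet. A sustained drop such as `[100] * 7 + [50] * 3` still gives roughly 41% and alerts.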

This evens out the flapping so you see the averaged difference instead. An alert now fires only when the inflow to the stream (or the outflow from it) rises or falls sharply, which tells you that something has happened.

Of course, the appropriate long and short spans depend on the characteristics of the application, so the spans and the threshold are each defined as template macros.

Those are the triggers. Separate thresholds can be set for the Information, Average, and High severities, so you can send each severity to a different notification integration (chat, email, and so on).
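Because each severity has its own threshold macro, the result is effectively a mapping from the size of the averaged difference to a severity. A minimal sketch of that mapping with made-up threshold values; in Zabbix each severity would be a separate trigger, and `classify_severity` is a hypothetical helper.

```python
from typing import Dict, Optional

# Checked from most to least severe, so the highest exceeded level wins.
SEVERITY_ORDER = ("high", "average", "information")


def classify_severity(ratio: float, thresholds: Dict[str, float]) -> Optional[str]:
    """Return the most severe level whose threshold abs(ratio) exceeds, or None."""
    magnitude = abs(ratio)
    for level in SEVERITY_ORDER:
        if magnitude > thresholds[level]:
            return level
    return None


# Hypothetical macro values: 25% / 50% / 75%
THRESHOLDS = {"information": 0.25, "average": 0.50, "high": 0.75}
print(classify_severity(-0.6, THRESHOLDS))  # average
```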

Graphs

The averaged-difference monitoring above is hard to explain in words, but it is easy to understand from the graphs.

Graph of raw values of GetRecords.Bytes

[Screenshot of the graph]

It is quite jagged. Monitoring this with a simple difference from the previous value would produce a flood of alerts.

Graph of the averaged difference of GetRecords.Bytes

[Screenshot of the graph]

This is the smoothed graph, and it looks much calmer. Viewed over a somewhat longer span and compared with the application's peak hours, it shows how the flow rate actually changes.

There are other graphs as well.

How to Install

I have uploaded the RPM to the usual repository, so create a repo file by following this procedure (//qiita.com/JumpeiArashi/items/849281083b6c7888f25d#case-of-using-yum) and then run:

yum install blackbird-kinesis-stream --enablerepo=blackbird

The pip package will be updated soon, so please wait a little while > <

Is there no per-shard monitoring?

CloudWatch can provide metrics for each shard, so I really want to implement this (I know that which shard a record goes to depends on its partition key). Still, I think that unless the application logic itself is broken, the stream-level flow-rate difference is enough to notice that something is happening.

That said, I plan to add per-shard monitoring in the future.

Finally

Put throughput has probably improved a lot since the PutRecords API was released the other day, but the plugin does not support it yet. I would like to collect its metrics as soon as possible and add them to the Zabbix template.
