Continuing from last time, this is another post about Diamond. This time I will write my own Diamond collector that measures the flow rate of tweets and visualizes it with Graphite + Grafana.
If you are not familiar with Diamond, Graphite, or Grafana, please refer to my earlier post: "I played with Diamond, a metrics collection tool".
First, set the directory that collectors will be loaded from in diamond.conf.
diamond.conf
# Directory to load collector modules from
collectors_path = /usr/share/diamond/collectors/
This is what it looks like after a normal installation on Ubuntu: collectors placed in /usr/share/diamond/collectors/ are picked up. For the example in this post, I moved the collectors under my home directory and set the path as follows.
diamond.conf
# Directory to load collector modules from
collectors_path = /home/risuo/local/projects/risuo-diamond-collecter/collectors
Once the collector directory is decided, create a directory called "twitter" containing a file called "twitter.py". The tree looks like this:
-[9432]% pwd
/home/risuo/local/projects/risuo-diamond-collecter/collectors
-[9433]% tree twitter
twitter
├── twitter.py
└── twitter.pyc
This time I created a collector that counts the flow rate of tweets on Twitter lists.
Since Graphite is used as before, the data flows TwitterCollector -> GraphiteHandler -> Graphite.
The implementation looks like this: https://gist.github.com/risuoku/324ba24a8f5ed8de1ff8
Basically, you implement a collect method that gathers metrics and publishes them (i.e., passes them to the handler). The implementation above also uses a self-made tool, stapi, which wraps tweepy to effectively raise the Twitter API rate-limit ceiling. stapi: https://github.com/risuoku/stapi
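The structure described above can be sketched as follows. This is a minimal sketch, not the actual gist: the base class and publish() are the standard Diamond collector API, but the list name and the tweet-counting helper are hypothetical stand-ins for the real stapi/tweepy code.

```python
try:
    # Inside a real Diamond deployment this import succeeds.
    import diamond.collector
    Base = diamond.collector.Collector
except ImportError:
    Base = object  # fallback so the sketch can be read standalone


class TwitterCollector(Base):
    """Publishes the number of new tweets on each watched list."""

    LISTS = ('total6',)  # hypothetical metric name; the post watches 7 lists

    def get_new_tweet_count(self, list_name):
        # Hypothetical helper: the real collector calls the Twitter API
        # (via stapi, a tweepy wrapper) and returns how many tweets
        # appeared on the list since the previous collection cycle.
        return 0

    def collect(self):
        # Diamond calls collect() on every interval; publish() hands each
        # metric to the configured handler (GraphiteHandler here), which
        # forwards it to Graphite.
        for list_name in self.LISTS:
            self.publish(list_name, self.get_new_tweet_count(list_name))
```

With a 60-second collection interval, each published value becomes one data point per minute in Graphite, which matches the whisper dump shown later.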
Tweets are collected from 7 public lists.
Since the full view is hard to read, here is the lower-left graph enlarged, for example.
Over the span of one day, it looks like this:
Let's take a look at Graphite's whisper file directly.
-[21466]% whisper-fetch.py total6.wsp --until=1406654580 | tail -20
1406653440 2.000000
1406653500 3.000000
1406653560 6.000000
1406653620 3.000000
1406653680 2.000000
1406653740 5.000000
1406653800 4.000000
1406653860 3.000000
1406653920 3.000000
1406653980 4.000000
1406654040 0.000000
1406654100 5.000000
1406654160 2.000000
1406654220 2.000000
1406654280 3.000000
1406654340 3.000000
1406654400 4.000000
1406654460 5.000000
1406654520 3.000000
1406654580 4.000000
The left column is the Unix time and the right column is the number of tweets. You can see that a sample is taken every 60 seconds.
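As a standalone sanity check, the whisper-fetch output can be parsed to confirm the 60-second sampling step; the data points below are copied from the tail of the dump above.

```python
# Each whisper-fetch line is "<unixtime> <value>"; parse a few of them.
sample = """\
1406654400 4.000000
1406654460 5.000000
1406654520 3.000000
1406654580 4.000000
"""

points = [(int(t), float(v)) for t, v in
          (line.split() for line in sample.strip().splitlines())]

step = points[1][0] - points[0][0]   # seconds between consecutive samples
avg = sum(v for _, v in points) / len(points)  # mean tweets per minute

print(step)  # 60
print(avg)   # 4.0
```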
This is the same as the daily graph shown earlier, but I'll paste it again.
As you can see, quite ordinary results were obtained.
Next, let's compare the amount of tweets on weekdays versus Saturdays and Sundays.
Looking at the graph below, do you notice anything?
If you look closely, there is a sudden burst around 4 a.m., when the flow rate should normally be quite small. This list is in fact a collection of people interested in programming contests, and when I looked into it, a contest called MemSQL Start[c]up was held from 2:00 to 4:30 (JST) on the 28th; the contest schedule makes this easy to see. After the contest ended, people apparently wanted to post their impressions, so the flow of tweets increased even though it was early in the morning.
However, note that when the number of samples is small, as in this example, the values can also fluctuate sharply for no particular reason.
A related approach is "Real-time aggregation and visualization of twitter data with InfluxDB, Grafana and fluentd", which combines fluent-plugin-twitter, a fluentd plugin for sampling from the Twitter streaming API, with InfluxDB and Grafana.
I created a Diamond collector that measures the flow rate of tweets on list timelines, and collected and visualized the data. This time I introduced a simple prototype, but basically anything can be collected as long as it is time-series data, so I will keep trying various things and playing with it.