Continuing from last time, this is another post about Diamond. This time I will write my own Diamond collector that measures the flow rate of tweets and visualizes it with Graphite + Grafana.
If you are not familiar with Diamond, Graphite, or Grafana, please refer to my earlier post: "I played with Diamond, a metrics collection tool".
First, set the directory that collectors will be loaded from in diamond.conf.
diamond.conf
# Directory to load collector modules from
collectors_path = /usr/share/diamond/collectors/
This is what it looks like after a normal installation on Ubuntu: collectors placed in /usr/share/diamond/collectors/ are picked up. For the example in this post, I moved the collectors under my home directory and set the path as follows.
diamond.conf
# Directory to load collector modules from
collectors_path = /home/risuo/local/projects/risuo-diamond-collecter/collectors
Once the collector directory is decided, create a directory called "twitter" containing a file called "twitter.py". The tree looks like this:
-[9432]% pwd
/home/risuo/local/projects/risuo-diamond-collecter/collectors
-[9433]% tree twitter
twitter
├── twitter.py
└── twitter.pyc
This time I created a collector that counts the flow rate of tweets on Twitter lists.
Since Graphite is used as before, the data flows TwitterCollector -> GraphiteHandler -> Graphite.
The implementation looks like this: https://gist.github.com/risuoku/324ba24a8f5ed8de1ff8
Basically, you implement a collect method that gathers metrics and publishes them (i.e., passes them to the handler). The implementation above also uses a self-made tool, stapi, which wraps tweepy to effectively raise the Twitter API rate-limit ceiling. stapi: https://github.com/risuoku/stapi
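The structure described above can be sketched as follows. This is a minimal sketch, not the actual gist: the base class and publish() are the standard Diamond collector API, but the list name and the tweet-counting helper are hypothetical stand-ins for the real stapi/tweepy code.

```python
try:
    # Inside a real Diamond deployment this import succeeds.
    import diamond.collector
    Base = diamond.collector.Collector
except ImportError:
    Base = object  # fallback so the sketch can be read standalone


class TwitterCollector(Base):
    """Publishes the number of new tweets on each watched list."""

    LISTS = ('total6',)  # hypothetical metric name; the post watches 7 lists

    def get_new_tweet_count(self, list_name):
        # Hypothetical helper: the real collector calls the Twitter API
        # (via stapi, a tweepy wrapper) and returns how many tweets
        # appeared on the list since the previous collection cycle.
        return 0

    def collect(self):
        # Diamond calls collect() on every interval; publish() hands each
        # metric to the configured handler (GraphiteHandler here), which
        # forwards it to Graphite.
        for list_name in self.LISTS:
            self.publish(list_name, self.get_new_tweet_count(list_name))
```

With a 60-second collection interval, each published value becomes one data point per minute in Graphite, which matches the whisper dump shown later.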
Tweets are collected from 7 public lists.
Since the full view is hard to read, here is the lower-left graph enlarged, for example.
Over the span of one day, it looks like this:
Let's take a look at Graphite's whisper file directly.
-[21466]% whisper-fetch.py total6.wsp --until=1406654580 | tail -20
1406653440 2.000000
1406653500 3.000000
1406653560 6.000000
1406653620 3.000000
1406653680 2.000000
1406653740 5.000000
1406653800 4.000000
1406653860 3.000000
1406653920 3.000000
1406653980 4.000000
1406654040 0.000000
1406654100 5.000000
1406654160 2.000000
1406654220 2.000000
1406654280 3.000000
1406654340 3.000000
1406654400 4.000000
1406654460 5.000000
1406654520 3.000000
1406654580 4.000000
The left column is the Unix time and the right column is the number of tweets. You can see that a sample is taken every 60 seconds.
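As a standalone sanity check, the whisper-fetch output can be parsed to confirm the 60-second sampling step; the data points below are copied from the tail of the dump above.

```python
# Each whisper-fetch line is "<unixtime> <value>"; parse a few of them.
sample = """\
1406654400 4.000000
1406654460 5.000000
1406654520 3.000000
1406654580 4.000000
"""

points = [(int(t), float(v)) for t, v in
          (line.split() for line in sample.strip().splitlines())]

step = points[1][0] - points[0][0]   # seconds between consecutive samples
avg = sum(v for _, v in points) / len(points)  # mean tweets per minute

print(step)  # 60
print(avg)   # 4.0
```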
This is the same as the daily graph shown earlier, but I'll paste it again.
As you can see, quite ordinary results were obtained.
Next, let's compare the amount of tweets on weekdays versus Saturdays and Sundays.
Looking at the graph below, do you notice anything?
If you look closely, there is a sudden burst around 4 a.m., when the flow rate should normally be quite small. This list is in fact a collection of people interested in programming contests, and when I looked into it, a contest called MemSQL Start[c]up was held from 2:00 to 4:30 (JST) on the 28th; the contest schedule makes this easy to see. After the contest ended, people apparently wanted to post their impressions, so the flow of tweets increased even though it was early in the morning.
However, note that when the number of samples is small, as in this example, the values can also fluctuate sharply for no particular reason.
A related approach is "Real-time aggregation and visualization of twitter data with InfluxDB, Grafana and fluentd", which combines fluent-plugin-twitter, a fluentd plugin for sampling from the Twitter streaming API, with InfluxDB and Grafana.
I created a Diamond collector that measures the flow rate of tweets on list timelines, and collected and visualized the data. This time I introduced a simple prototype, but basically anything can be collected as long as it is time-series data, so I will keep trying various things and playing with it.