[PYTHON] Visualize the flow rate of tweets with Diamond + Graphite + Grafana

Introduction

Continuing from last time, this is another post about Diamond. This time, I write my own Diamond collector that measures the flow rate of tweets, and visualize the result with Graphite + Grafana.

If you are not familiar with Diamond, Graphite, or Grafana, please refer to the following post: I played with Diamond, a metrics collection tool

Make your own collector

Preparation

First, set the directory that holds your collectors in diamond.conf.

diamond.conf


# Directory to load collector modules from
collectors_path = /usr/share/diamond/collectors/

For example, a standard install on Ubuntu looks like the above. In this case, collectors are loaded from /usr/share/diamond/collectors. For the example in this post, I moved the collectors under my home directory and set the path as follows.

diamond.conf


# Directory to load collector modules from
collectors_path = /home/risuo/local/projects/risuo-diamond-collecter/collectors

Once you have decided on the collector directory, create a directory called "twitter" inside it, containing a file called "twitter.py". The tree is as follows.

-[9432]% pwd
/home/risuo/local/projects/risuo-diamond-collecter/collectors
-[9433]% tree twitter
twitter
├── twitter.py
└── twitter.pyc

What I want to do

This time, I created a collector with the following functions

Since Graphite is used as before, the data flow is TwitterCollector -> GraphiteHandler -> Graphite.
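To make the last hop of that flow concrete: Graphite's Carbon daemon accepts data points over a plaintext protocol, one line per point in the form "<path> <value> <timestamp>", by default on TCP port 2003. The following is a minimal sketch of that line format only, not Diamond's GraphiteHandler itself. The helper names, the host, and the metric path are assumptions made for illustration.

```python
import socket
import time


def format_graphite_line(path, value, timestamp=None):
    """Build one line of Graphite's plaintext protocol: '<path> <value> <timestamp>\\n'."""
    if timestamp is None:
        timestamp = int(time.time())
    return "%s %s %d\n" % (path, value, int(timestamp))


def send_to_graphite(line, host="localhost", port=2003):
    """Send a single plaintext line to Carbon (host and port are assumptions)."""
    sock = socket.create_connection((host, port), timeout=5)
    try:
        sock.sendall(line.encode("ascii"))
    finally:
        sock.close()


# Hypothetical metric path; the real one depends on Diamond's path prefix and hostname.
line = format_graphite_line("servers.myhost.twitter.total6", 4, 1406654580)
```

In normal use you never call anything like send_to_graphite yourself; the handler does the equivalent each time the collector publishes a value.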

Implementation

I wrote it like this. https://gist.github.com/risuoku/324ba24a8f5ed8de1ff8

Basically, you implement a collect method that gathers metrics and calls publish (the method that passes metrics to the handler). In the above implementation, I also use a self-made tool (stapi) that wraps tweepy to artificially raise the effective API rate limit. stapi: https://github.com/risuoku/stapi
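As a rough illustration of that shape, here is a minimal sketch of a collector with a collect method that calls publish. It is not the code from the gist: the counting logic, the list names, and fetch_recent_tweet_times are hypothetical placeholders, and the stand-in base class exists only so the sketch can run where Diamond is not installed.

```python
import time

try:
    import diamond.collector
    BaseCollector = diamond.collector.Collector
except ImportError:
    # Stand-in so the sketch runs without Diamond; it only records published values.
    class BaseCollector(object):
        def __init__(self, *args, **kwargs):
            self.published = {}

        def publish(self, name, value):
            self.published[name] = value


def fetch_recent_tweet_times(list_name):
    """Hypothetical placeholder: creation times (unix seconds) of recent tweets in a list.

    A real collector would call the Twitter API here (the post uses stapi/tweepy).
    """
    return []


def count_in_window(tweet_times, now, window=60):
    """Count tweets created in the half-open interval [now - window, now)."""
    return sum(1 for t in tweet_times if now - window <= t < now)


class TwitterCollector(BaseCollector):
    # Hypothetical metric names, one per Twitter list being watched.
    LIST_NAMES = ["total0", "total1"]

    def collect(self):
        # Diamond calls collect() on each interval; publish() hands values to the handlers.
        now = int(time.time())
        for name in self.LIST_NAMES:
            times = fetch_recent_tweet_times(name)
            self.publish(name, count_in_window(times, now))
```

Keeping the counting in a plain function such as count_in_window makes it easy to check without running Diamond or touching the Twitter API.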

What I made

The collector gathers tweets from 7 public lists.

View in Grafana

grafana_3.png

The full dashboard is hard to read, so here is the lower-left panel enlarged: grafana_4.png

Over a span of one day, it looks like this: grafana_5.png

View whisper file

Let's take a look at Graphite's whisper file directly.

-[21466]% whisper-fetch.py total6.wsp --until=1406654580 | tail -20
1406653440      2.000000
1406653500      3.000000
1406653560      6.000000
1406653620      3.000000
1406653680      2.000000
1406653740      5.000000
1406653800      4.000000
1406653860      3.000000
1406653920      3.000000
1406653980      4.000000
1406654040      0.000000
1406654100      5.000000
1406654160      2.000000
1406654220      2.000000
1406654280      3.000000
1406654340      3.000000
1406654400      4.000000
1406654460      5.000000
1406654520      3.000000
1406654580      4.000000

The left column is the unix time and the right column is the number of tweets. You can see that one data point is recorded every 60 seconds.
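If you want to check the interval programmatically rather than by eye, output like the above can be parsed in a few lines. This is a minimal sketch; the function names are hypothetical, and it assumes missing values are printed as the literal text None.

```python
def parse_whisper_fetch(text):
    """Parse 'timestamp<whitespace>value' lines into (int, float or None) pairs."""
    points = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank lines and anything that is not a data row
        timestamp, raw = parts
        value = None if raw == "None" else float(raw)
        points.append((int(timestamp), value))
    return points


def sampling_steps(points):
    """Return the set of gaps in seconds between consecutive timestamps."""
    return {b[0] - a[0] for a, b in zip(points, points[1:])}


# The last three rows of the output shown above.
sample = """\
1406654460      5.000000
1406654520      3.000000
1406654580      4.000000
"""
points = parse_whisper_fetch(sample)
steps = sampling_steps(points)
total = sum(v for _, v in points if v is not None)
```

For evenly sampled data, sampling_steps returns a set with a single element, the step in seconds.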

Trying some simple analysis

Examine the difference in the amount of tweets depending on the time of day

grafana_5.png

This is the same daily graph shown earlier, pasted again for reference.

As you can see, the results are quite ordinary.

Next, let's compare tweet volume on weekdays with a span that includes Saturday and Sunday.

grafana_6.png

grafana_7.png

Find points where the tweet volume is unusual

Looking at the graph below, do you notice anything?

grafana_8.png

If you look closely, there is a sudden spike around 4 am, when the flow rate should normally be quite low. In fact, this list is a collection of people interested in programming contests. When I looked into it, a contest called MemSQL start[c]up had been held from 2:00 to 4:30 (JST) on the 28th. The contest times are easy to see on the linked page. Presumably participants wanted to post their impressions once the contest was over, so the flow of tweets increased even though it was early in the morning.

grafana_9.png

However, please note that if the number of samples is small as in this example, it may change rapidly for no special reason.
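One simple way to look for such points without eyeballing the graph is to flag samples that sit far above the mean of the series. The sketch below is only an illustration under assumptions: the function name and the threshold of 2 standard deviations are arbitrary choices, and with per-minute counts as low as the ones here, a flagged point may still be plain noise.

```python
import math


def flag_spikes(counts, threshold=2.0):
    """Return indices whose count exceeds mean + threshold * standard deviation."""
    n = len(counts)
    if n < 2:
        return []
    mean = sum(counts) / float(n)
    variance = sum((c - mean) ** 2 for c in counts) / float(n)
    std = math.sqrt(variance)
    if std == 0:
        return []  # a perfectly flat series has no spikes
    return [i for i, c in enumerate(counts) if c > mean + threshold * std]


# Made-up counts for illustration: a quiet series with one burst at the end.
spikes = flag_spikes([2, 3, 2, 3, 2, 3, 2, 3, 2, 30])
```

Note that a large burst also inflates the standard deviation, so a stricter threshold can end up hiding the very point you are looking for.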

Similar story

Real-time aggregation and visualization of twitter data with InfluxDB, Grafana and fluentd: this post combines fluent-plugin-twitter, a fluentd plugin for sampling from the Twitter Streaming API, with InfluxDB and Grafana.

Summary / impression

I created a Diamond collector that measures the flow rate of a list's timeline, then collected and visualized the data. What I showed this time is a simple prototype, but basically anything can be collected as long as it is time series data, so I plan to keep experimenting with other kinds of data.
