[PYTHON] The story of visualizing popular Qiita tags with Bar Chart Race

What is this article about?

In the previous article [^1], I visualized Qiita's popular tags month by month as a Bar Chart Race. This article describes the procedure.

1. Get information using the Qiita API

As mentioned in the previous article, I basically borrowed the approach of an earlier article [^2].

That method retrieves the articles created within each half-month window and covers the whole period by shifting the window. However, the query is built as

query = "&query=created:>" + start_date  + "+created:<" + end_date

with start_date = ["2018-01-15","2018-01-31",...] and end_date = ["2018-01-31","2018-02-15",...]. Both comparisons are strict, so articles created on a boundary date are not included. Therefore, I changed the query as follows.

query = "&query=created:>" + start_date  + "+created:<=" + end_date
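As a minimal sketch of how the window boundaries and query strings could be generated, assuming the 15th and the last day of each month are used as boundaries as in the lists above. The helper names half_month_boundaries and build_queries are hypothetical and do not come from the referenced article.

```python
import calendar
import datetime


def half_month_boundaries(first, last):
    """Return the 15th and the last day of every month that fall within [first, last]."""
    boundaries = []
    year, month = first.year, first.month
    while (year, month) <= (last.year, last.month):
        last_day = calendar.monthrange(year, month)[1]
        for day in (15, last_day):
            d = datetime.date(year, month, day)
            if first <= d <= last:
                boundaries.append(d)
        # Move on to the next month
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return boundaries


def build_queries(boundaries):
    """Build one query string per pair of consecutive boundaries (start exclusive, end inclusive)."""
    queries = []
    for start, end in zip(boundaries, boundaries[1:]):
        queries.append(
            "&query=created:>" + start.isoformat() + "+created:<=" + end.isoformat()
        )
    return queries


boundaries = half_month_boundaries(datetime.date(2018, 1, 15), datetime.date(2018, 2, 28))
queries = build_queries(boundaries)
for q in queries:
    print(q)
```

Because each window excludes its start and includes its end, every boundary date belongs to exactly one window. Note that the very first start date is itself excluded, so the first boundary should be set one step before the oldest article you want to collect.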

2. Process using pandas

The processing is as follows. See the comments for details.

import copy
import datetime

import pandas as pd
from dateutil.relativedelta import relativedelta
from tqdm import tqdm

# Load the combined result file created in step 1
df_all = pd.read_csv("results/summary.csv")

# Start date (overwritten below with the month of the oldest article)
ref_date = datetime.date(2011,9,1)

# Sort by created_at
df_all = df_all.sort_values("created_at")

# Extract only the tag information and the date information
tags_list = list(df_all["tags_str"])
date_list = list(df_all["created_at"])
# Convert to Timestamp so that .year and .month are available
date_list = [pd.to_datetime(one) for one in date_list]

# key: tag name, value: cumulative count
tags_dict = dict()
# Year being aggregated; starts at the first year (2011) and is updated whenever it changes
y = date_list[0].year
# Month being aggregated; starts at the first month (9) and is updated whenever it changes
m = date_list[0].month
# First day of the month being aggregated, used to label the stored results
ref_date = datetime.date(y,m,1)

# List for storing the intermediate result (cumulative counts) for each month
monthly_result = []
# List for storing the month that each result belongs to
month = []

for one_tags, one_date in tqdm(zip(tags_list, date_list), total=len(tags_list)):
    # Processing when the month changes: store the date and a copy of the dict
    # in month and monthly_result BEFORE counting this article, so that the
    # snapshot contains only articles up to the end of the previous month.
    # The while loop also stores snapshots for months without any articles.
    while (one_date.year, one_date.month) > (y, m):
        month.append(ref_date)
        monthly_result.append(copy.deepcopy(tags_dict))
        ref_date += relativedelta(months=1)
        y = ref_date.year
        m = ref_date.month

    try:
        # Split the comma-separated text into a list
        tags = one_tags.split(",")
    except AttributeError:
        # NaN is sometimes included (probably when no tag is set), so skip it
        continue
    # If the tag is already in tags_dict, add 1; otherwise register it with a count of 1
    for one_tag in tags:
        try:
            tags_dict[one_tag] += 1
        except KeyError:
            tags_dict[one_tag] = 1
    
# Store the final state after the loop ends
month.append(ref_date)
monthly_result.append(copy.deepcopy(tags_dict))

# In each month's dict, register the tags that had not yet been posted by that month with a count of 0
for one in monthly_result:
    ref_keys = one.keys()
    for one_tag in tags_dict:
        if one_tag not in ref_keys:
            one[one_tag] = 0
    
# Reshape
monthly_result_num = []
for one_dict in monthly_result:
    # Sort the (tag, count) pairs by tag name so that every month uses the same order
    tmp_list = sorted(one_dict.items(), key=lambda x:x[0])
    # Keep only the counts
    monthly_result_num.append([one[1] for one in tmp_list])

# Store the sorted tag names as a regular column of the DataFrame for now
df_align = pd.DataFrame({"tags":sorted(tags_dict)})
# Add one column per month holding the cumulative number of tag registrations up to that month
for one_date,one_nums in zip(month,monthly_result_num):
    df_align[one_date.strftime("%Y-%m")] = one_nums
# Export to CSV with the tag name as the index
df_align.set_index('tags').to_csv("all_result.csv")
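For comparison, the same kind of table (one row per tag, one column per month, cumulative counts) can probably be built more compactly with pandas alone. This is only a sketch of an alternative, not the code used in this article. It assumes the same created_at and tags_str columns, takes the month from the first 7 characters of the ISO timestamp string, and uses a hypothetical function name and made-up sample data.

```python
import pandas as pd


def cumulative_monthly_tag_counts(df):
    """Return a tag x month table of cumulative tag counts.

    Assumes df has an ISO-formatted "created_at" column and a
    comma-separated "tags_str" column (NaN when no tag is set).
    """
    # Drop articles without tags
    work = df.dropna(subset=["tags_str"]).copy()
    # "YYYY-MM" label taken from the start of the timestamp string
    work["month"] = work["created_at"].astype(str).str[:7]
    # One row per (article, tag)
    work["tag"] = work["tags_str"].str.split(",")
    work = work.explode("tag")
    # Count tags per month: tags as rows, months as columns
    counts = work.groupby(["tag", "month"]).size().unstack(fill_value=0)
    # Add months without any articles so the time axis has no gaps
    all_months = pd.period_range(
        counts.columns.min(), counts.columns.max(), freq="M"
    ).strftime("%Y-%m")
    counts = counts.reindex(columns=all_months, fill_value=0)
    # Turn monthly counts into running totals
    return counts.cumsum(axis=1)


# Made-up sample data for illustration only
sample = pd.DataFrame({
    "created_at": [
        "2011-09-16T09:00:00+09:00",
        "2011-09-20T10:00:00+09:00",
        "2011-10-05T12:00:00+09:00",
        "2011-11-02T08:30:00+09:00",
    ],
    "tags_str": ["Python,pandas", "Python", None, "Ruby,Python"],
})
result = cumulative_monthly_tag_counts(sample)
print(result)
```

With the real data, something like cumulative_monthly_tag_counts(pd.read_csv("results/summary.csv")).to_csv("all_result.csv") should produce a file with the same layout, although the index column is then named tag rather than tags.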

3. Visualize using Flourish Bar Chart Race

Upload the resulting CSV (all_result.csv) to the Bar Chart Race template at https://app.flourish.studio/. Now you can visualize it!
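The CSV contains one row for every tag that has ever been used, while a bar chart race only shows the top few bars at any moment. If the full file turns out to be unwieldy, one option is to keep only the tags that reach the top n in at least one month before uploading. The following is a sketch under that assumption. The function name keep_top_tags, the default n=30, and the sample table are all made up for illustration.

```python
import pandas as pd


def keep_top_tags(df, n=30):
    """Keep only the rows (tags) that rank in the top n in at least one month column."""
    # Rank 1 is the largest cumulative count within each month column
    ranks = df.rank(axis=0, ascending=False, method="min")
    # Require a positive count so that ties at 0 in early months do not qualify
    in_top = ((ranks <= n) & (df > 0)).any(axis=1)
    return df[in_top]


# Made-up table in the same layout as all_result.csv (tags as index, months as columns)
table = pd.DataFrame(
    {"2011-09": [2, 0, 1], "2011-10": [2, 0, 1], "2011-11": [3, 5, 1]},
    index=pd.Index(["Python", "Ruby", "pandas"], name="tags"),
)
trimmed = keep_top_tags(table, n=1)
print(trimmed)
```

For the real file this could be applied as keep_top_tags(pd.read_csv("all_result.csv", index_col="tags")).to_csv("all_result_top.csv"), where all_result_top.csv is just an example output name.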
