It has been about half a year since I started posting to Qiita, mainly articles on statistics, machine learning, and data analysis. Let's look back on the articles so far using the Qiita API. (All figures below are calculated from data as of August 10, 2015.)

We'll look at the data first, then walk through the Python code that produced it, including how to use the Qiita API from Python.

1. View the data

Posted articles by number of stocks

The top 5 articles account for 73% of all stocks, so popularity is heavily skewed toward a few articles. I personally like "The meaning of fractional division understood in pizza" at the bottom, but it has hardly been stocked at all. :sweat_smile:

Stocks	Percentage (%)	Cumulative (%)	Title
750	28.1	28.1	[Machine learning] I will explain while trying the deep learning framework Chainer.
595	22.3	50.4	[Mathematics] Let's visualize what are eigenvalues and eigenvectors
318	11.9	62.3	[Statistics] First "standard deviation" (to avoid frustration with statistics)
163	6.1	68.4	Get a large amount of Starbucks Twitter data with python and try data analysis Part 1
124	4.6	73.1	[Deep learning] Try Autoencoder with Chainer and visualize the result.
82	3.1	76.1	[Update] Explain what the stochastic gradient descent method is by running it in Python.
55	2.1	78.2	Get a large amount of Starbucks Twitter data with python and try data analysis Part 2
52	1.9	80.1	Get a large amount of Starbucks Twitter data with python and try data analysis Part 3
50	1.9	82.0	[Statistics] Understand what an ROC curve is by animation.
45	1.7	83.7	Starbucks Twitter Data Location Visualization and Analysis
44	1.6	85.4	Try rudimentary sentiment analysis on Twitter Stream API data.
44	1.6	87.0	Principal component analysis Analyze handwritten numbers using PCA. Part 1
40	1.5	88.5	Understanding the meaning of complex and bizarre normal distribution formulas
31	1.2	89.7	Playing handwritten numbers with python Part 1
28	1.0	90.7	[Statistics] Understand generalized linear mixed models (GLMM) through visualization.
28	1.0	91.8	[Statistics] Let's visualize the relationship between the normal distribution and the chi-square distribution.
24	0.9	92.7	Explanation of the concept of regression analysis using Python Part 1
21	0.8	93.4	Introduction to Graph Database Neo4j in Python for Beginners(For Mac OS X)
20	0.7	94.2	[Statistics] Grasp the image of the central limit theorem with a graph
20	0.7	94.9	Visualize the frequency of word occurrences in sentences with Word Cloud.[Python]
20	0.7	95.7	[Machine learning] Write the k-nearest neighbor method in Python yourself and recognize handwritten numbers
17	0.6	96.3	Get the world's 100 most influential tech Twitter user information in python.
16	0.6	96.9	[Statistics] [R] Try using quantile regression.
15	0.6	97.5	Play handwritten numbers with python Part 2 (identify)
14	0.5	98.0	Explanation of the concept of regression analysis using Python Extra 1
12	0.4	98.5	[Statistics] Understand the mechanism of the Q-Q plot with animation.
11	0.4	98.9	Explanation of the concept of regression analysis using python Part 2
11	0.4	99.3	Principal component analysis Analyze handwritten numbers using PCA. Part 2
8	0.3	99.6	[python]Random number generation memorandum
6	0.2	99.8	Preferences to generate animated GIFs from Python on Mac
5	0.2	100.0	The meaning of fractional division understood in pizza
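
As a quick check on the 73% figure, the share held by the top five articles can be recomputed from the stock counts in the table. This is only a sketch: the helper name top_share is hypothetical, and the counts are copied by hand from the table above.

```python
# Stock counts copied from the table above, already in descending order.
stocks = [750, 595, 318, 163, 124, 82, 55, 52, 50, 45, 44, 44, 40, 31, 28, 28,
          24, 21, 20, 20, 20, 17, 16, 15, 14, 12, 11, 11, 8, 6, 5]


def top_share(counts, n):
    """Return the percentage of all stocks held by the n most-stocked articles."""
    ordered = sorted(counts, reverse=True)
    return sum(ordered[:n]) / sum(ordered) * 100


print("{:.1f}%".format(top_share(stocks, 5)))  # 73.1%
```

Changing the second argument shows how quickly the cumulative share grows, for example top_share(stocks, 10) for the top ten articles.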

By category

My articles fall into five broad categories: "machine learning," "statistics," "mathematics," "data analysis," and "others."

Machine learning
[Machine learning] I will explain while trying the deep learning framework Chainer.
[Deep learning] Try Autoencoder with Chainer and visualize the result.
[Update] Explain what the stochastic gradient descent method is by running it in Python.
Principal component analysis Analyze handwritten numbers using PCA. Part 1
Playing handwritten numbers with python Part 1
[Machine learning] Write the k-nearest neighbor method in Python yourself and recognize handwritten numbers
Play handwritten numbers with python Part 2 (identify)
Principal component analysis Analyze handwritten numbers using PCA. Part 2

Statistics
[Statistics] First "standard deviation" (to avoid frustration with statistics)
[Statistics] Understand what an ROC curve is by animation.
Understanding the meaning of complex and bizarre normal distribution formulas
[Statistics] Understand generalized linear mixed models (GLMM) through visualization.
[Statistics] Let's visualize the relationship between the normal distribution and the chi-square distribution.
Explanation of the concept of regression analysis using Python Part 1
[Statistics] Grasp the image of the central limit theorem with a graph
[Statistics] [R] Try using quantile regression.
Explanation of the concept of regression analysis using Python Extra 1
[Statistics] Understand the mechanism of the Q-Q plot with animation.
Explanation of the concept of regression analysis using python Part 2
[python]Random number generation memorandum

Math
[Mathematics] Let's visualize what are eigenvalues and eigenvectors
The meaning of fractional division understood in pizza

Data analysis
Get a large amount of Starbucks Twitter data with python and try data analysis Part 1
Get a large amount of Starbucks Twitter data with python and try data analysis Part 2
Get a large amount of Starbucks Twitter data with python and try data analysis Part 3
Starbucks Twitter Data Location Visualization and Analysis
Try rudimentary sentiment analysis on Twitter Stream API data.

Other
Introduction to Graph Database Neo4j in Python for Beginners(For Mac OS X)
Visualize the frequency of word occurrences in sentences with Word Cloud.[Python]
Get the world's 100 most influential tech Twitter user information in python.
Preferences to generate animated GIFs from Python on Mac

By tag

Let's look at each tag. Since I mostly use Python, the Python tag has the most articles. Looking at the number of stocks per article, "DeepLearning", "Deep learning", and "Chainer" are overwhelmingly high at 437.0 each. You can see how much excitement there is around deep learning these days.

"Math" and "Machine learning" also have relatively high stocks per article (175.7 and 125.2).

Tag	Number of articles	Stocks	Stocks per article
Python	30	2664	88.8
statistics	22	1589	72.2
statistics	17	1274	74.9
Machine learning	9	1127	125.2
Twitter	6	376	62.7
Natural language processing	6	379	63.2
Math	6	1054	175.7
matplotlib	5	63	12.6
MongoDB	4	314	78.5
MachineLearning	4	148	37.0
DeepLearning	2	874	437.0
statistics	2	35	17.5
scikit-learn	2	55	27.5
Deep learning	2	874	437.0
Scraping	2	37	18.5
Chainer	2	874	437.0
Database	1	21	21.0
Data visualization	1	45	45.0
Statistical test	1	12	12.0
Way of thinking	1	5	5.0
Pattern recognition	1	50	50.0
Note	1	5	5.0
R	1	16	16.0
Data analysis	1	40	40.0
Visualization	1	20	20.0
math	1	82	82.0
numpy	1	8	8.0
Graph database	1	21	21.0
BeautifulSoup	1	17	17.0
Statistical modeling	1	28	28.0
neo4j	1	21	21.0
Introduction to Statistics	1	11	11.0

Plotted as a bar chart, it looks like this.

Users who stocked

I had imagined that the same people would stock many of my articles, but it seems there are quite a lot of one-time visitors. The table below shows the regulars who stock often. Thank you! :relaxed:

Rank	Stocks
1	22
2	18
3	13
4	10
5	10
6	10
7	9
8	9
9	9
10	9
11	8
12	8
13	8
14	8
15	8
16	8
17	7
18	7
19	7
20	7

This is a bar chart of the top 150 users by number of stocks. There were 1,771 unique users.

This is a histogram of the number of stocks per user. The counts are concentrated in the 1 to 5 range more than I imagined. The repeat rate is low... :weary: From now on, I will do my best to write articles that bring readers back!

2. Explanation of Python code

Get data from Qiita API

You can issue an access token in Qiita under [Settings] → [Applications] → [Issue new token]. Set the token you obtained as the value of key in the code below.

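%matplotlib inline
import sys
from collections import defaultdict

import requests
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use('ggplot')

key = '<Access token>'     # paste the token issued in the Qiita settings
headers = {'Authorization': 'Bearer {}'.format(key)}
cnt = 0                    # page number used by the article request loop
data_list = []             # raw article data returned by the API
users = defaultdict(int)   # user id -> number of articles that user stocked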
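
Rather than pasting the token into the notebook, one option is to read it from an environment variable. The sketch below assumes a variable named QIITA_TOKEN (a name chosen here for illustration, not something Qiita defines) and a hypothetical helper build_headers.

```python
import os


def build_headers(token):
    """Build the Authorization header for a Bearer token."""
    if not token:
        raise ValueError("access token is empty")
    return {"Authorization": "Bearer {}".format(token)}


# QIITA_TOKEN is an assumed variable name; fall back to the placeholder if it is unset.
key = os.environ.get("QIITA_TOKEN", "<Access token>")
headers = build_headers(key)
```

This keeps the token out of any notebook that gets shared or published.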

Define a get_stockers function that fetches the users who stocked an article and returns the number of stocks.

# ------------------- Get the number of stocks for each article ------------------- #
def get_stockers(_id):
    """Count the stockers of one article and tally each stocker in `users`."""
    url = 'https://qiita.com/api/v2/items/{}/stockers'.format(_id)
    page = 0
    total = 0
    while True:
        page += 1
        payload = {'page': page, 'per_page': 20}
        res = requests.get(url, params=payload, headers=headers)
        res.raise_for_status()  # stop on HTTP errors instead of parsing an error body
        data = res.json()
        if len(data) == 0:  # an empty page means every stocker has been read
            break
        for d in data:
            users[d['id']] += 1
        total += len(data)

    return total
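
The page-by-page loop in get_stockers can also be written as a generator that keeps requesting pages until an empty one comes back. In this sketch iter_pages and fetch_page are hypothetical names; fetch_page(page) stands in for the requests.get(...).json() call so that the stopping logic can be tried without network access.

```python
def iter_pages(fetch_page, start=1):
    """Yield items page by page until fetch_page returns an empty list."""
    page = start
    while True:
        items = fetch_page(page)
        if not items:
            break
        for item in items:
            yield item
        page += 1


# Stand-in for the API: three stockers spread over two pages, then an empty page.
fake_pages = {1: [{"id": "alice"}, {"id": "bob"}], 2: [{"id": "carol"}]}
stockers = list(iter_pages(lambda page: fake_pages.get(page, [])))
print(len(stockers))  # 3
```

Against the real API, fetch_page would wrap the request for one page and return the decoded JSON list.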

The loop below fetches all the articles you have posted, page by page, and keeps them in a list. The stocker information for each article is fetched afterwards, while formatting the data.

# ------------------- Article information acquisition ------------------- #
url = 'https://qiita.com/api/v2/authenticated_user/items'

while True:
    cnt += 1
    sys.stdout.write("{}, ".format(cnt))  # progress: page number being requested
    payload = {'page': cnt, 'per_page': 20}
    res = requests.get(url, params=payload, headers=headers)
    data = res.json()
    if len(data) == 0:  # an empty page means every article has been read
        break
    data_list.extend(data)

res = []  # from here on, res holds the formatted article information

Extract the necessary fields from the acquired data and organize them. Private articles (limited-sharing posts) are excluded.

# ------------------- Data formatting ------------------- #
for i, d in enumerate(data_list):
    sys.stdout.write("{}, ".format(i))  # progress: index of the article being processed

    # Exclude private articles
    if d['private']:
        continue

    # Keep only the fields used in the analysis
    article_info = {k: d[k] for k in ['id', 'title', 'private', 'created_at', 'tags', 'url']}

    # Fetch the stockers of this article and store the count
    article_info['stock'] = get_stockers(d['id'])
    res.append(article_info)
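
The filtering and field selection above can be tried on a small hand-made list before calling the API. The two sample items below are invented for illustration and contain only the keys this article reads plus one extra; pick_public is a hypothetical helper.

```python
KEYS = ['id', 'title', 'private', 'created_at', 'tags', 'url']


def pick_public(items, keys=KEYS):
    """Drop private items and keep only the selected keys of the rest."""
    return [{k: item[k] for k in keys} for item in items if not item['private']]


# Invented sample data shaped like the fields this article reads.
sample = [
    {'id': 'a1', 'title': 'Public post', 'private': False,
     'created_at': '2015-08-01T00:00:00+09:00', 'tags': [{'name': 'Python'}],
     'url': 'https://example.com/a1', 'body': 'ignored'},
    {'id': 'b2', 'title': 'Limited post', 'private': True,
     'created_at': '2015-08-02T00:00:00+09:00', 'tags': [],
     'url': 'https://example.com/b2', 'body': 'ignored'},
]

public = pick_public(sample)
print([p['id'] for p in public])  # ['a1']
```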

The code below prints each article with its number of stocks, percentage, and cumulative percentage as a markdown table that can be pasted as-is.

sum_of_stocks = float(sum(r['stock'] for r in res))

cum = 0
print("|Stocks|Percentage (%)|Cumulative (%)|Title|")
print("|:----------:|:----------:|:----------:|:----------|")
# Walk through the articles from most to least stocked
for i in np.argsort([r['stock'] for r in res])[::-1]:
    r = res[i]
    ratio = r['stock'] / sum_of_stocks * 100
    cum += ratio
    print("|{0}|{1:.1f}|{2:.1f}|[{3}]({4})|".format(r['stock'], ratio, cum, r['title'], r['url']))
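
The same table can be built as a list of strings, which makes the cumulative-percentage logic easy to check on a few rows. markdown_rows is a hypothetical helper, and the three rows use stock counts from the top of the table in section 1 with shortened titles. Because only three articles are included, the percentages differ from the full table.

```python
def markdown_rows(articles):
    """Return markdown table lines for (title, stock) pairs, most stocked first."""
    total = float(sum(stock for _, stock in articles))
    lines = ["|Stocks|Percentage (%)|Cumulative (%)|Title|",
             "|:----------:|:----------:|:----------:|:----------|"]
    cum = 0.0
    for title, stock in sorted(articles, key=lambda a: a[1], reverse=True):
        ratio = stock / total * 100
        cum += ratio
        lines.append("|{0}|{1:.1f}|{2:.1f}|{3}|".format(stock, ratio, cum, title))
    return lines


rows = markdown_rows([("Eigenvalues", 595), ("Chainer", 750), ("Standard deviation", 318)])
print("\n".join(rows))
```

Returning the lines instead of printing them directly also makes it simple to write the table to a file.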

Next, aggregate by tag.

#Tag aggregation
tag_cnt = defaultdict(int)
for r in res:
    for t in r['tags']:
        tag_cnt[t['name']] += 1

#Number of stocks by tag
tag_stock_cnt = defaultdict(int)
for t in tag_cnt.keys():
    for r in res:
        for _t in r['tags']:
            if t == _t['name']:
                tag_stock_cnt[t] += r['stock']
tag_stock_dict = {}
for t, cnt in tag_stock_cnt.items():
    tag_stock_dict[t] = cnt

#Processed so that it can be placed in a DataFrame
tag_list = []
ind_list = []
for k, t in tag_cnt.items():
    ind_list.append(k)
    tag_list.append((t , tag_stock_dict[k]))

#Data frame generation
tag_list = np.array(tag_list)
df = pd.DataFrame(tag_list, index=ind_list, columns=['cnt', 'stocks'])

n = float(len(tag_cnt))
df['cnt_ratio'] = df['cnt']/n
df['stock_ratio'] = df['stocks']/sum_of_stocks

# Display of the number of stocks and stocks per article by tag
df_tag = df.sort_values(by='cnt', ascending=False)

print("|Tag|Number of articles|Stocks|Stocks per article|")
print("|:----------:|:----------:|:----------:|:----------:|")
for tag, row in df_tag.iterrows():
    print("|[{0}](http://qiita.com/tags/{0})|{1}|{2}|{3:.1f}|".format(
        tag, int(row['cnt']), int(row['stocks']), row['stocks'] / row['cnt']))

# Graph display
df[['cnt_ratio', 'stock_ratio']].sort_values(by='cnt_ratio', ascending=False).plot(
    kind="bar", figsize=(17, 8), alpha=0.7,
    title="The ratio of articles and stocks for each tag.")
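
The two aggregation loops can be folded into a single pass over the articles. This is a minimal sketch with a hypothetical helper aggregate_tags; the sample list contains only two articles, with stock counts taken from the table in section 1 and tag assignments assumed for illustration.

```python
from collections import defaultdict


def aggregate_tags(articles):
    """Return {tag: (article count, total stocks, stocks per article)}."""
    counts = defaultdict(int)
    stocks = defaultdict(int)
    for article in articles:
        for tag in article['tags']:
            counts[tag['name']] += 1
            stocks[tag['name']] += article['stock']
    return {name: (counts[name], stocks[name], stocks[name] / counts[name])
            for name in counts}


# Two sample articles; the tag assignments are assumed for illustration.
sample = [
    {'stock': 750, 'tags': [{'name': 'Python'}, {'name': 'Chainer'}]},
    {'stock': 124, 'tags': [{'name': 'Python'}, {'name': 'Chainer'}, {'name': 'matplotlib'}]},
]
for name, (cnt, stock, ratio) in sorted(aggregate_tags(sample).items()):
    print("|{0}|{1}|{2}|{3:.1f}|".format(name, cnt, stock, ratio))
```

Each article is visited once here, whereas the nested loops above scan every article once per tag. With a few dozen articles either approach is fast enough.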

Next, aggregate and display the stocks by user.

# User aggregation: one row per user with the number of articles that user stocked
df = pd.DataFrame(list(users.items()), columns=["id", "cnt"])
df_sorted = df.sort_values(by="cnt", ascending=False)

# Top 20 people display
print("|Rank|Stocks|")
print("|:----------:|:----------:|")
for i, d in enumerate(df_sorted['cnt'][:20]):
    print("| {} | {} |".format(i + 1, d))


# Bar chart of the 150 users with the most stocks
df_sorted[:150].plot(kind="bar", figsize=(17, 8), alpha=0.6, xticks=[],
                     title="The number of stocks from 1 user.", width=1, color="blue")


# Histogram of stock numbers
df['cnt'].plot(kind="hist", figsize=(13, 10), alpha=0.7, color="green", bins=25, xlim=(1, 26),
               title="Histogram of stocked users.")
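
For a quick look without pandas, collections.Counter from the standard library covers the same ranking and histogram. The sketch below uses invented user IDs and counts; users mirrors the dictionary filled by get_stockers, and one_time_share is a hypothetical name for the fraction of users who stocked exactly one article.

```python
from collections import Counter

# Invented data: user id -> number of articles that user stocked.
users = {'u1': 5, 'u2': 1, 'u3': 1, 'u4': 2, 'u5': 1}

# Ranking: most_common(n) returns the n users with the highest counts.
top = Counter(users).most_common(2)

# Histogram: how many users have each stock count.
histogram = Counter(users.values())

# Fraction of users who stocked exactly one article.
one_time_share = histogram[1] / len(users)

print(top)                        # [('u1', 5), ('u4', 2)]
print(sorted(histogram.items()))  # [(1, 3), (2, 1), (5, 1)]
print(one_time_share)             # 0.6
```

A high one_time_share corresponds to the low repeat rate seen in the histogram in section 1.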

[PYTHON] [Qiita API] [Statistics • Machine learning] I tried to summarize and analyze the articles posted so far.