[Python] Draw a Qiita tag relationship diagram with NetworkX

Introduction

This article explains how to use the Python library NetworkX, using a relationship graph of the tags attached to Qiita posts as the example. With NetworkX you can draw a graph made up of nodes and edges, like the one below.

sample.png

Execution environment

Acquisition of original data

Qiita publishes an API for retrieving posts, so collecting them is easy. The code below converts the data returned in JSON format into Python lists and dictionaries. Without authentication, the API allows at most 100 articles per request and 60 requests per hour, so this time we target 100 * 60 = 6000 articles.

import requests
import json

items = []
params = {"page":1, "per_page":100}
for i in range(60):
    print("fetching... page " + str(i+1))
    params["page"] = i + 1
    res = requests.get("https://qiita.com/api/v2/items", params=params)
    items.extend(json.loads(res.text))
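
The loop above always issues all 60 requests. As a minimal sketch of a more defensive variant, the paging logic can be separated from the HTTP call so that it stops as soon as a page comes back empty. The names `fetch_pages` and `get_page` are hypothetical and not part of the original code; in practice `get_page` would wrap the `requests.get` call shown above. A stand-in function is used here so the sketch runs without network access.

```python
def fetch_pages(get_page, max_pages=60, per_page=100):
    """Collect items page by page, stopping early on an empty page.

    get_page is a hypothetical callable (page, per_page) -> list of dicts.
    """
    collected = []
    for page in range(1, max_pages + 1):
        batch = get_page(page, per_page)
        if not batch:  # nothing left: stop instead of spending more requests
            break
        collected.extend(batch)
    return collected


# Stand-in for the real API: 250 fake posts served 100 at a time.
fake_posts = [{"id": n, "tags": []} for n in range(250)]


def fake_get_page(page, per_page):
    start = (page - 1) * per_page
    return fake_posts[start:start + per_page]


items_demo = fetch_pages(fake_get_page)
print(len(items_demo))
```

Passing the fetch function in as an argument also makes it easy to cap the number of requests while experimenting, for example `fetch_pages(fake_get_page, max_pages=2)`.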

Data preparation

From the data returned by the API, extract only the tags and convert them to the form [[tag1, tag2], [tag3], ...].

tags_list = []
for item in items:
    tags = [tag["name"] for tag in item["tags"]]
    tags_list.append(tags)

Next, use collections.Counter to count how often each tag occurs. The nested list is flattened with itertools.chain.from_iterable(tags_list) before counting. Too many nodes would make the figure cluttered, so keep only the top 50 tags.

import collections
import itertools

tag_count = collections.Counter(itertools.chain.from_iterable(tags_list)).most_common(50)
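
To see what the two preparation steps produce, here is a small sketch on made-up data. The items in `sample_items` are invented for illustration and only mimic the "tags" field used above; the variable names are not from the original code.

```python
import collections
import itertools

# Made-up items shaped like the API response used above:
# each item carries a "tags" list of {"name": ...} dictionaries.
sample_items = [
    {"tags": [{"name": "Python"}, {"name": "networkx"}, {"name": "Qiita"}]},
    {"tags": [{"name": "Python"}, {"name": "matplotlib"}]},
    {"tags": [{"name": "Python"}]},
]

# One list of tag names per post.
sample_tags_list = [[tag["name"] for tag in item["tags"]] for item in sample_items]

# chain.from_iterable flattens the nested lists into one sequence of names.
flat = list(itertools.chain.from_iterable(sample_tags_list))

# most_common(n) returns (tag, count) pairs in descending order of count.
top = collections.Counter(flat).most_common(1)
print(top)  # [('Python', 3)]
```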

Use of NetworkX

From here, we will use NetworkX to create a graph.

Initialization and addition of nodes

Create a new graph with G = nx.Graph() and add one node per tag name. The occurrence count is stored as a node attribute so that it can be used to set the node size when drawing later.

import networkx as nx
G = nx.Graph()
G.add_nodes_from([(tag, {"count":count}) for tag,count in tag_count])

Adding edges

If a post has multiple tags, add an edge for every pair of them. For example, for a post tagged "Python, networkx, Qiita" like this article, create edges between the Python and networkx nodes, between networkx and Qiita, and between Qiita and Python. If an edge already exists, increase its weight instead.

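for tags in tags_list:
    # Every unordered pair of tags within one post
    for node0, node1 in itertools.combinations(tags, 2):
        # Skip pairs involving tags outside the top 50
        if not G.has_node(node0) or not G.has_node(node1):
            continue
        if G.has_edge(node0, node1):
            # Edge attributes are reached through G[u][v]
            G[node0][node1]["weight"] += 1
        else:
            # Attributes are passed as keyword arguments
            G.add_edge(node0, node1, weight=1)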
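
The counting rule can be checked on toy data without building a graph. The sketch below swaps the NetworkX graph for a plain collections.Counter keyed by tag pairs, so it illustrates the idea rather than the code above. The names `sample_tags_list` and `pair_weight` are made up for this example.

```python
import collections
import itertools

sample_tags_list = [
    ["Python", "networkx", "Qiita"],
    ["Python", "networkx"],
    ["Ruby"],  # a single tag yields no pairs
]

pair_weight = collections.Counter()
for tags in sample_tags_list:
    # sorted() gives each unordered pair one canonical key,
    # mirroring how an undirected graph treats (a, b) and (b, a) as one edge.
    for a, b in itertools.combinations(sorted(set(tags)), 2):
        pair_weight[(a, b)] += 1

print(pair_weight[("Python", "networkx")])  # 2
print(len(pair_weight))                     # 3
```

A post with n distinct tags contributes n * (n - 1) / 2 pairs, so the three-tag post adds three pairs and the single-tag post adds none.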

Let's draw a graph

Let's draw a graph here.

%matplotlib inline
import matplotlib.pyplot as plt

plt.figure(figsize=(15,15))
pos = nx.spring_layout(G)
nx.draw_networkx(G,pos)

plt.axis("off")
plt.savefig("default.png")
plt.show()

default.png

The result is a graph that is hard to make sense of.

Graph drawing adjustment

From here, we will make various adjustments to make the graph cleaner.

Edge pruning

Delete the edges with a low weight, that is, tag pairs that appear together only rarely.

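# Iterate over a copy: removing edges while iterating over
# G.edges(...) directly can raise an error
for (u, v, d) in list(G.edges(data=True)):
    if d["weight"] <= 4:
        G.remove_edge(u, v)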
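
Removing items from a collection while looping over it is easy to get wrong. As a minimal sketch of the safe pattern, using a plain dictionary of made-up pair weights rather than a NetworkX graph, take a snapshot of the keys first and delete from the original afterwards. The names `pair_weight` and `threshold` are illustrative only.

```python
threshold = 4

# Made-up co-occurrence counts for illustration.
pair_weight = {
    ("Python", "networkx"): 12,
    ("Python", "Qiita"): 3,
    ("Rails", "Ruby"): 5,
}

# list(pair_weight) copies the keys, so deleting inside the loop is safe.
# Deleting while iterating over the dict itself raises RuntimeError.
for pair in list(pair_weight):
    if pair_weight[pair] <= threshold:
        del pair_weight[pair]

print(sorted(pair_weight))
```

Only the pairs with a weight above the threshold remain, which is the same rule the pruning step applies to graph edges.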

Repulsive force adjustment

In pos = nx.spring_layout(G), node positions are determined by the repulsive force between nodes and the attractive force along edges, which depends on the edge weight. The strength of the repulsion can be set with the argument k; the larger k is, the closer the node arrangement gets to a circle.

pos = nx.spring_layout(G, k=0.3)
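
NetworkX documents spring_layout as using the Fruchterman-Reingold force-directed algorithm. As a rough sketch of why k matters, the textbook form of that algorithm uses a repulsive force of k^2 / d between any two nodes and an attractive force of d^2 / k along an edge, where d is the distance between the nodes. This is an assumption about the general algorithm, not a reproduction of the NetworkX implementation, which also handles edge weights and an iteration schedule that are not modeled here. The function names are hypothetical.

```python
def repulsive_force(d, k):
    """Textbook Fruchterman-Reingold repulsion between two nodes at distance d."""
    return k * k / d


def attractive_force(d, k):
    """Textbook Fruchterman-Reingold attraction along an edge of length d."""
    return d * d / k


for k in (0.1, 0.3):
    for d in (0.05, k, 0.6):
        net = repulsive_force(d, k) - attractive_force(d, k)
        # net > 0 pushes the pair apart, net < 0 pulls it together
        print(f"k={k} d={d} net={net:+.3f}")
```

The two forces cancel when d equals k, so in this simplified picture k acts as the preferred distance between connected nodes: raising it spreads the nodes farther apart.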

Node size and Japanese display of node labels

The larger the count, the larger the node's circle. Also, in the figure output earlier, Japanese labels were displayed as empty rectangles, so specify a font that can display Japanese.

# Node size grows with the tag's occurrence count
node_size = [d["count"] * 20 for (n, d) in G.nodes(data=True)]
nx.draw_networkx_nodes(G, pos, node_color="w", alpha=0.6, node_size=node_size)
# The label size parameter is font_size
nx.draw_networkx_labels(G, pos, font_size=14, font_family="Yu Gothic", font_weight="bold")
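
Multiplying the count by a fixed factor works when the counts are of similar magnitude. If one tag dominates, an alternative is to rescale the counts into a fixed size range. The helper below is a hypothetical sketch, not part of the original code, and its default bounds are arbitrary values to tune by eye.

```python
def scale_sizes(counts, min_size=200, max_size=3000):
    """Linearly map counts onto the range [min_size, max_size]."""
    lo, hi = min(counts), max(counts)
    if hi == lo:
        # All counts equal: give every node the same size
        return [float(max_size)] * len(counts)
    span = hi - lo
    return [min_size + (c - lo) * (max_size - min_size) / span for c in counts]


sizes = scale_sizes([10, 55, 100])
print(sizes)  # [200.0, 1600.0, 3000.0]
```

The resulting list can be passed as node_size in place of the count-based list, as long as it is built in the same node order.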

Edge thickness

Make each edge thicker in proportion to its weight.

edge_width = [ d["weight"]*0.2 for (u,v,d) in G.edges(data=True)]
nx.draw_networkx_edges(G, pos, alpha=0.4, edge_color="c", width=edge_width)

Final drawing code

Draw with this code. Try running it several times or changing the parameters until the resulting graph looks good.

%matplotlib inline
import matplotlib.pyplot as plt

# Prune weak edges; iterate over a copy because edges are removed in the loop
for (u, v, d) in list(G.edges(data=True)):
    if d["weight"] <= 4:
        G.remove_edge(u, v)

plt.figure(figsize=(15, 15))
pos = nx.spring_layout(G, k=0.3)

# Nodes: size follows the occurrence count
node_size = [d["count"] * 20 for (n, d) in G.nodes(data=True)]
nx.draw_networkx_nodes(G, pos, node_color="w", alpha=0.6, node_size=node_size)
nx.draw_networkx_labels(G, pos, font_size=14, font_family="Yu Gothic", font_weight="bold")

# Edges: width follows the weight
edge_width = [d["weight"] * 0.2 for (u, v, d) in G.edges(data=True)]
nx.draw_networkx_edges(G, pos, alpha=0.4, edge_color="c", width=edge_width)

plt.axis("off")
plt.savefig("g2.png")
plt.show()

graph2.png

Web-related tags (Ruby (on Rails), JavaScript, PHP, ...) sit at the top, Python (machine learning) tags at the lower left, iOS tags in the middle, and the topic of Bash becoming usable on Windows 10 at the lower right. As you can see, the diagram shows the popular tags and how they relate to one another.
