This article explains how to use the Python library NetworkX by building a relationship graph of the tags attached to Qiita posts. With NetworkX, you can draw a graph made up of nodes and edges.
Qiita publishes an API for retrieving posts, so fetching them is easy. The following code converts the JSON response into Python lists and dictionaries. Unauthenticated access is limited to 100 articles per request and 60 requests per hour, so this time we target 100 * 60 = 6000 articles.
import requests
import json

items = []
params = {"page": 1, "per_page": 100}
for i in range(60):
    print("fetching... page " + str(i + 1))
    params["page"] = i + 1
    res = requests.get("https://qiita.com/api/v2/items", params=params)
    res.raise_for_status()  # stop early if a request fails (e.g. rate limit exceeded)
    items.extend(json.loads(res.text))  # each page is a JSON array of posts
From the data returned by the API, extract only the tags and convert them to the format [[tag1, tag2], [tag3], ...].
tags_list = []
for item in items:
    tags = [tag["name"] for tag in item["tags"]]
    tags_list.append(tags)
Next, use collections.Counter to count how often each tag occurs.
Before counting, the nested list is flattened with `itertools.chain.from_iterable(tags_list)`.
Too many nodes would clutter the figure, so only the top 50 tags are kept.
import collections
import itertools
tag_count = collections.Counter(itertools.chain.from_iterable(tags_list)).most_common(50)
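To see what the flattening and counting step does, here is a minimal sketch on a made-up list. The sample tags are hypothetical, not real Qiita data.

```python
import collections
import itertools

# Hypothetical sample in the [[tag1, tag2], [tag3], ...] format
sample_tags_list = [["Python", "networkx"], ["Python"], ["Ruby", "Rails", "Python"]]

# chain.from_iterable flattens one level of nesting
flat = list(itertools.chain.from_iterable(sample_tags_list))

# most_common(n) returns the n most frequent (tag, count) pairs
top = collections.Counter(flat).most_common(1)

print(flat)
print(top)
```

Note that most_common returns a list of (tag, count) tuples, which is why tag_count can be unpacked as tag, count pairs later on.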
From here, we use NetworkX to build the graph.
Create a new graph with G = nx.Graph() and add one node per tag name.
So that node sizes can be set when drawing later, the number of occurrences, count, is stored as a node attribute.
import networkx as nx
G = nx.Graph()
G.add_nodes_from([(tag, {"count":count}) for tag,count in tag_count])
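As a quick check of how node attributes are stored and read back, the following sketch builds a tiny graph from hypothetical (tag, count) pairs. The names G_demo and sample_tag_count are made up for illustration.

```python
import networkx as nx

# Hypothetical (tag, count) pairs in the same shape as tag_count
sample_tag_count = [("Python", 3), ("Ruby", 1)]

G_demo = nx.Graph()
G_demo.add_nodes_from([(tag, {"count": count}) for tag, count in sample_tag_count])

# nodes(data=True) yields (node, attribute_dict) pairs
node_counts = {n: d["count"] for n, d in G_demo.nodes(data=True)}
print(node_counts)
```

The same nodes(data=True) pattern is what the drawing code relies on when it computes node sizes from count.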
If a post has multiple tags, add an edge for every pair of tags.
For example, for a post tagged "Python, networkx, Qiita" like this article, create edges between the Python and networkx nodes, between networkx and Qiita, and between Qiita and Python.
If an edge already exists, increase its weight instead.
for tags in tags_list:
    for node0, node1 in itertools.combinations(tags, 2):
        # Skip pairs where either tag is outside the top 50
        if not G.has_node(node0) or not G.has_node(node1):
            continue
        if G.has_edge(node0, node1):
            G[node0][node1]["weight"] += 1
        else:
            # Edge attributes are passed as keyword arguments
            G.add_edge(node0, node1, weight=1)
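The edge weights built above are co-occurrence counts of tag pairs. As a cross-check that uses only the standard library, the same counts can be sketched with a Counter. This is an illustrative alternative on hypothetical sample data, not the article's method, and it omits the top-50 filter.

```python
import collections
import itertools

# Hypothetical sample posts
sample_tags_list = [["Python", "networkx", "Qiita"], ["Python", "networkx"]]

pair_counts = collections.Counter()
for tags in sample_tags_list:
    # Sort the tags so that (a, b) and (b, a) count as the same undirected pair
    for pair in itertools.combinations(sorted(set(tags)), 2):
        pair_counts[pair] += 1

print(pair_counts[("Python", "networkx")])
```

Each key corresponds to one undirected edge, and its value to that edge's weight.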
Let's draw a graph here.
%matplotlib inline
import matplotlib.pyplot as plt
plt.figure(figsize=(15,15))
pos = nx.spring_layout(G)
nx.draw_networkx(G,pos)
plt.axis("off")
plt.savefig("default.png")
plt.show()
The result is a cluttered graph that is hard to interpret.
From here, we make several adjustments to clean it up.
First, delete low-weight edges, that is, pairs of tags that rarely appear together.
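# Copy the edge list first: removing edges while iterating over G.edges() fails in NetworkX 2.0 and later
for (u, v, d) in list(G.edges(data=True)):
    if d["weight"] <= 4:
        G.remove_edge(u, v)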
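Another way to avoid modifying the graph while iterating over it is to collect the low-weight edges first and remove them in one call. A minimal sketch on a hypothetical two-edge graph (G_demo and its weights are made up):

```python
import networkx as nx

G_demo = nx.Graph()
G_demo.add_edge("Python", "networkx", weight=10)
G_demo.add_edge("Python", "Qiita", weight=2)

# Collect the edges to drop, then remove them after iteration has finished
weak_edges = [(u, v) for u, v, d in G_demo.edges(data=True) if d["weight"] <= 4]
G_demo.remove_edges_from(weak_edges)

print(list(G_demo.edges(data=True)))
```

Removing an edge does not remove its endpoints, so a tag whose edges are all weak remains in the graph as an isolated node.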
With pos = nx.spring_layout(G), node positions are determined by the repulsive force between nodes and the attractive force along edges, which depends on each edge's weight.
The strength of the repulsion can be adjusted with the argument k: the larger k is, the closer the node arrangement gets to a circle.
pos = nx.spring_layout(G, k=0.3)
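spring_layout starts from random initial positions, so the layout differs on every run. If a reproducible layout is wanted, a seed can be fixed. The sketch below assumes a NetworkX version whose spring_layout accepts the seed argument (recent releases do; very old ones may not) and uses a small stand-in graph rather than the tag graph.

```python
import networkx as nx

# Small stand-in graph; the real tag graph G would be used in practice
G_demo = nx.path_graph(5)

# The same seed gives the same starting positions, hence the same layout
pos_a = nx.spring_layout(G_demo, k=0.3, seed=42)
pos_b = nx.spring_layout(G_demo, k=0.3, seed=42)

# pos maps each node to a 2-element coordinate array
print(pos_a[0], pos_b[0])
```

Changing the seed value is then a controlled way to try different layouts until one looks good.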
The larger the count, the larger the node's circle is drawn.
Also, in the figure output earlier, Japanese labels were rendered as rectangles instead of characters, so specify a font that can display Japanese.
node_size = [ d["count"]*20 for (n,d) in G.nodes(data=True)]
nx.draw_networkx_nodes(G, pos, node_color="w",alpha=0.6, node_size=node_size)
nx.draw_networkx_labels(G, pos, font_size=14, font_family="Yu Gothic", font_weight="bold")
Make each edge thicker in proportion to its weight.
edge_width = [ d["weight"]*0.2 for (u,v,d) in G.edges(data=True)]
nx.draw_networkx_edges(G, pos, alpha=0.4, edge_color="c", width=edge_width)
The complete drawing code is below. Because spring_layout starts from random positions, try running it several times or changing the parameters until the resulting graph looks good.
%matplotlib inline
import matplotlib.pyplot as plt

# Drop low-weight edges; iterate over a copy so the graph can be modified safely
for (u, v, d) in list(G.edges(data=True)):
    if d["weight"] <= 4:
        G.remove_edge(u, v)

plt.figure(figsize=(15, 15))
pos = nx.spring_layout(G, k=0.3)

# Node size scales with the tag's occurrence count
node_size = [d["count"] * 20 for (n, d) in G.nodes(data=True)]
nx.draw_networkx_nodes(G, pos, node_color="w", alpha=0.6, node_size=node_size)
nx.draw_networkx_labels(G, pos, font_size=14, font_family="Yu Gothic", font_weight="bold")

# Edge width scales with the co-occurrence weight
edge_width = [d["weight"] * 0.2 for (u, v, d) in G.edges(data=True)]
nx.draw_networkx_edges(G, pos, alpha=0.4, edge_color="c", width=edge_width)

plt.axis("off")
plt.savefig("g2.png")
plt.show()
Web-related tags (Ruby (on Rails), JavaScript, PHP, ...) are at the top, Python (machine learning) related tags are at the lower left, iOS-related tags are in the middle, and the topic of Bash being usable on Windows 10 is at the lower right. As you can see, the diagram shows popular tags and how they relate to each other.