I have been helping to analyze the results of the **"Pokemon-ness" questionnaire** created by a Pokemon researcher.
This article is about using the **networkx library** to visualize the play tendencies across the 18 Pokemon games that the questionnaire respondents have played.
The article by the Pokemon researcher I am helping: **Research article about "Pokemon-ness", Part 1** https://pkmnheight.blogspot.com/2020/03/1.html
18 target works

1. Red / Green / Blue (Red / Blue)
2. Pikachu (Yellow)
3. Gold / Silver
4. Crystal
5. Ruby / Sapphire
6. FireRed / LeafGreen
7. Emerald
8. Diamond / Pearl
9. Platinum
10. HeartGold / SoulSilver
11. Black / White
12. Black 2 / White 2
This started with a request from the Pokemon researcher: **"I would like you to make a Venn diagram of the play tendencies across the 18 works"** (if I remember correctly).
A **Venn diagram** visualizes the relationships between multiple sets (see the figure below).
http://www.mathlion.jp/article/ar093.html
When I looked into it, it turned out that humans and Venn diagrams do not get along well. Venn diagrams of **3 sets** are a common sight,
but with **7 sets** it ends up looking like this. A human can no longer make sense of it. (Apparently it can be drawn with the R `venn` library.)
https://stackoverflow.com/questions/32440128/nice-looking-five-sets-venn-diagrams/40048520
In 2012, Khalegh Mamakani and Frank Ruskey of the University of Victoria discovered an **11-set** Venn diagram. **This seems to be the largest Venn diagram humanity has reached so far.**
A New Rose : The First Simple Symmetric 11-Venn Diagram https://arxiv.org/abs/1207.6452
So drawing an 18-set Venn diagram is beyond me for now.
Eighteen sets are not feasible as a Venn diagram, but networkx, which I recently took up as a hobby, might be able to visualize them. The goal is not the Venn diagram itself. It would be enough to visualize the following:

- Which works are played a lot?
- Which other works do the players of a given work tend to play?

In the end, I made a graph like this.
Each node (circle) is labeled with a Pokemon game title, using the abbreviation familiar to Pokemon fans. The numeric prefix, such as "01_", indicates the order in which the titles were released.
The size of a node reflects **the number of people who played that work**.
The thickness of an edge (the line connecting two nodes) reflects how much the players of the two works overlap, expressed as the Jaccard coefficient.

Jaccard coefficient, where $A$ and $B$ are the sets of people who played each of the two works:

$$
J(A,B) = \frac{|A \cap B|}{|A \cup B|}
$$
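
As a quick illustration of the formula, here is a minimal sketch in Python. The respondent IDs and the helper name `jaccard` are made up for this example and are not taken from the questionnaire data.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two sets."""
    union = a | b
    if not union:  # both sets empty: define the coefficient as 0.0 here
        return 0.0
    return len(a & b) / len(union)


# Hypothetical respondent IDs who played each title
players_dp = {1, 2, 3, 4}
players_bw = {3, 4, 5}

j = jaccard(players_dp, players_bw)
print(j)  # 2 shared respondents out of 5 in total -> 0.4
```

The value is 1 when exactly the same people played both works and 0 when no one played both.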
The color of a node represents the type of title:

- <font color="DarkRed">Dark red: completely new title</font>
- <font color="DarkOrange">Orange: remake</font>
- <font color="Gold">Light orange: minor-change version</font>
I think this visualizes **which other works the players of a given work tend to play**.
For example, the following trends can be seen:

- The number of people who played DP and BW is particularly large.
- Some people skip the minor-change versions but play the completely new titles.
- Many people have played both the completely new titles from RS onward and the latest title, Sword/Shield.
- Few people have played both Red/Green/Blue and Sword/Shield.
## Processing with python
*All code and the original data are posted on my GitHub.*
[https://github.com/mrok273/](https://github.com/mrok273/Qiita/tree/master/%E3%83%9D%E3%82%B1%E3%83%A2%E3%83%B3/%E3%83%8D%E3%83%83%E3%83%88%E3%83%AF%E3%83%BC%E3%82%AF%E5%9B%B3%E3%81%AE%E4%BD%9C%E6%88%90)
First, create a one-hot data frame as shown below.
![image.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/610426/a426c3a2-526b-b5df-a0e3-69c05db07565.png)
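
The layout of the raw data is not shown here, so the following is only a sketch under assumptions. It supposes the raw answers hold a list of played titles per respondent (a hypothetical `raw_answers` structure) and turns them into a one-hot data frame `df`, together with the `node_list` of title columns used in the later steps.

```python
import pandas as pd

# Hypothetical raw answers: respondent ID -> titles played
raw_answers = {
    "r1": ["01_Red green blue", "03_Gold and silver"],
    "r2": ["03_Gold and silver", "05_RS"],
    "r3": ["01_Red green blue", "03_Gold and silver", "05_RS"],
}

# Sorted title list; the "01_" style prefix keeps the release order
node_list = sorted({t for titles in raw_answers.values() for t in titles})

# One row per respondent, one 0/1 column per title
df = pd.DataFrame(
    [[int(t in titles) for t in node_list] for titles in raw_answers.values()],
    index=list(raw_answers.keys()),
    columns=node_list,
)
print(df)
```

With this shape, `df[title].sum()` gives the number of players of a title, which is what the later steps rely on.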
Next, create every pair of works out of the 18 works and compute the Jaccard coefficient for each pair.
```python
import itertools

import pandas as pd

# node_list holds the 18 title columns of the one-hot data frame df
combination_list = list(itertools.combinations(list(node_list), 2))
"""
Out:
[('01_Red green blue', '02_Pi'),
 ('01_Red green blue', '03_Gold and silver'),
 ('01_Red green blue', '04_Ku'),
 ('01_Red green blue', '05_RS'),
 ...
"""

# Collect the union and intersection sizes for every pair of titles
n1_list, n2_list, Union_list, Intersection_list = [], [], [], []
for (n1, n2) in combination_list:
    Union_df = df[(df[n1] == 1) | (df[n2] == 1)]         # played at least one
    Intersection_df = df[(df[n1] == 1) & (df[n2] == 1)]  # played both
    n1_list.append(n1)
    n2_list.append(n2)
    Union_list.append(Union_df.shape[0])
    Intersection_list.append(Intersection_df.shape[0])

df_ja = pd.DataFrame({"n1": n1_list,
                      "n2": n2_list,
                      "Union": Union_list,
                      "Intersection": Intersection_list})
# Jaccard coefficient = |intersection| / |union|
df_ja["jaccard_value"] = df_ja["Intersection"] / df_ja["Union"]
```

| n1 | n2 | Union | Intersection | jaccard_value |
|---|---|---|---|---|
| 01_Red green blue | 02_Pi | 13606 | 7823 | 0.574966926 |
| 01_Red green blue | 03_Gold and silver | 14345 | 9902 | 0.690275357 |
| 01_Red green blue | 04_Ku | 13697 | 6667 | 0.486748923 |
| 01_Red green blue | 05_RS | 17210 | 9507 | 0.552411389 |
| 01_Red green blue | 06_FRLG | 16331 | 7868 | 0.481783112 |

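
The pairwise loop is easy to follow. As an alternative sketch (not the method used in this article), the same intersection and union counts can be obtained with a single matrix product on the one-hot frame. The small `df` below is made-up data, and the sketch assumes every title has at least one player, since a title with zero players would produce a 0/0 on the diagonal.

```python
import numpy as np
import pandas as pd

# Made-up one-hot data: rows are respondents, columns are titles
df = pd.DataFrame(
    {"A": [1, 1, 0, 1], "B": [1, 0, 1, 1], "C": [0, 0, 1, 1]}
)

X = df.to_numpy()
counts = X.sum(axis=0)  # number of players per title
inter = X.T @ X         # inter[i, j] = |i ∩ j|
# |i ∪ j| = |i| + |j| - |i ∩ j|
union = counts[:, None] + counts[None, :] - inter
jaccard = pd.DataFrame(inter / union, index=df.columns, columns=df.columns)
print(jaccard.round(3))
```

The result is a symmetric matrix with 1.0 on the diagonal, so each pair appears twice, whereas the loop version produces one row per pair.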
```python
import networkx as nx
import matplotlib.pyplot as plt
import japanize_matplotlib  # enables Japanese fonts in matplotlib
import numpy as np

G = nx.Graph()               # Create the graph
G.add_nodes_from(node_list)  # Create the nodes

# Add the edges, weighted by the Jaccard coefficient
for i in range(len(df_ja)):
    row_data = df_ja.iloc[i]
    G.add_edge(row_data['n1'], row_data['n2'], weight=row_data['jaccard_value'])

# Graph display
fig_size = (12, 12)  # assumed value; the original does not show it
plt.figure(figsize=fig_size)

# Layout of the network diagram: use circular_layout
pos = nx.circular_layout(G)
```
For the other layouts networkx offers, see the drawing section of the NetworkX reference: https://networkx.github.io/documentation/stable/reference/drawing.html
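
To compare a few of those layouts, a sketch like the following can help. It uses a small complete graph with placeholder node names as a stand-in for the 18-title graph. `circular_layout`, `shell_layout`, and `spring_layout` are networkx layout functions, and each returns a dict that maps a node to its (x, y) position.

```python
import networkx as nx

# Small stand-in graph with placeholder names; the real graph has 18 title nodes
G = nx.complete_graph(["01_a", "02_b", "03_c", "04_d"])

layouts = {
    "circular": nx.circular_layout(G),
    "shell": nx.shell_layout(G),
    "spring": nx.spring_layout(G, seed=0),  # seed fixes the random start
}

for name, pos in layouts.items():
    # pos maps each node to a 2-element coordinate array
    print(name, {node: xy.round(2).tolist() for node, xy in pos.items()})
```

Any of these `pos` dicts can be passed to the `nx.draw_networkx_*` functions in place of the circular one.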
The number of players differs from work to work, but the largest value is at most about twice the smallest.
At that scale, the difference in player counts is hard to see even when drawn as a network graph.
So I normalized the number of players for each title by the maximum and then **raised it to a power to widen the differences considerably**.

- Before processing
- After processing: the relative differences between titles are widened

The same is done for the edge thickness.
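
To see why raising to a power helps, here is a small worked sketch with made-up player counts (not the survey figures). It normalizes by the maximum and squares the result. The exponent 2 and the title names are chosen for illustration.

```python
# Made-up player counts where the largest is twice the smallest
player_counts = {"title_a": 6000, "title_b": 9000, "title_c": 12000}

max_count = max(player_counts.values())
normalized = {k: v / max_count for k, v in player_counts.items()}
emphasized = {k: r ** 2 for k, r in normalized.items()}

print(normalized)  # {'title_a': 0.5, 'title_b': 0.75, 'title_c': 1.0}
print(emphasized)  # {'title_a': 0.25, 'title_b': 0.5625, 'title_c': 1.0}
```

After squaring, the ratio between the largest and smallest value grows from 2 to 4, so the node sizes become easier to tell apart.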
```python
player_dict = {}  # Stores the number of players for each title
for gen in node_list:
    player_dict[gen] = df[gen].sum()

# Node drawing sizes. The raw counts are too close to compare,
# so subtract an offset of 4000 and square to emphasize the differences.
max_players = max(player_dict.values())
node_size = [(v - 4000) ** 2 * 10000 / (max_players ** 2) for v in player_dict.values()]


def make_node_color(node_list):
    # Give minor-change versions and remakes their own colors
    minor_change = ['02_Pi', '04_Ku', '07_Em', '09_Pt', '12_BW2', '16_USUM']
    remake = ['06_FRLG', '10_HGSS', '14_ORAS', '17_Pikabui']
    color_list = []
    for node in node_list:
        if node in minor_change:
            color = 3000
        elif node in remake:
            color = 6000
        else:
            color = 10000
        color_list.append(color)
    return color_list


node_color = make_node_color(node_list)
# Draw the nodes; node color and size are set here
nx.draw_networkx_nodes(G, pos, node_color=node_color,
                       cmap=plt.cm.Reds,
                       alpha=0.7,
                       node_size=node_size)
# Japanese labels (the keyword is font_size, not fontsize)
nx.draw_networkx_labels(G, pos, font_size=20, font_family='IPAexGothic', font_weight="bold")
# Edge width adjustment; this also emphasizes the differences
edge_width = [np.exp(d["weight"] - 0.3) ** 5 for (u, v, d) in G.edges(data=True)]
# Draw the edges
nx.draw_networkx_edges(G, pos, alpha=0.4, edge_color=edge_width, width=edge_width, edge_cmap=plt.cm.Blues)
plt.axis('off')
```
**This emphasizes the relationships between the titles and makes them easier to see.**
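
For intuition about the edge-width formula `np.exp(w - 0.3) ** 5`, which is the same as `exp(5 * (w - 0.3))`, this sketch evaluates it for a few sample Jaccard values. The sample weights are illustrative and not taken from the data.

```python
import numpy as np

weights = np.array([0.3, 0.5, 0.7])  # illustrative Jaccard coefficients
widths = np.exp(weights - 0.3) ** 5  # same transform as in the drawing code

for w, width in zip(weights, widths):
    print(f"jaccard={w:.1f} -> width={width:.2f}")
# jaccard=0.3 -> width=1.00
# jaccard=0.5 -> width=2.72
# jaccard=0.7 -> width=7.39
```

A difference of 0.2 in the Jaccard coefficient multiplies the line width by about 2.7, which is why strongly overlapping pairs stand out.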
networkx has fewer users than the pandas library, so it can be hard to find an article that fits your purpose. Still, it makes possible visualizations that matplotlib or seaborn alone cannot express. Making them easy to understand takes trial and error, though, and the colors and sizes need to be tuned. There is still a lot about networkx that I do not understand.