[PYTHON] Visualization of Pokemon game title play trends with networkx

Introduction

I am helping analyze the results of the **Pokemon-likeness questionnaire** created by a Pokemon researcher.

This article covers how I used the **networkx library** to visualize the play tendencies of the questionnaire respondents across 18 Pokemon games.

By the Pokemon researcher I am helping: **Research article about "Pokemon-ness", Part 1** https://pkmnheight.blogspot.com/2020/03/1.html

18 target works

1. 'Red / Green / Blue'
2. 'Pikachu (Yellow)'
3. 'Gold / Silver'
4. 'Crystal'
5. 'Ruby / Sapphire'
6. 'FireRed / LeafGreen'
7. 'Emerald'
8. 'Diamond / Pearl'
9. 'Platinum'
10. 'HeartGold / SoulSilver'
11. 'Black / White'
12. 'Black 2 / White 2'
13. 'X / Y'
14. 'Omega Ruby / Alpha Sapphire'
15. 'Sun / Moon'
16. 'Ultra Sun / Ultra Moon'
17. "Let's Go, Pikachu! / Let's Go, Eevee!"
18. 'Sword / Shield'

Is an 18-set Venn diagram impossible for humankind today?

This started with a request from the Pokemon researcher: **"I want you to make a Venn diagram of the play tendencies for the 18 works"** (as far as I remember).

A **Venn diagram** is a visualization of the relationships between multiple sets (see the figure below).

http://www.mathlion.jp/article/ar093.html

When I looked into it, it seems that humankind and Venn diagrams do not get along well. Venn diagrams of **3 sets** are a common sight,

unnamed.png

but with **7 sets** it ends up in this sorry state. Humans can no longer make sense of it. (Apparently it can be drawn with the venne library for the R language.)

image.png https://stackoverflow.com/questions/32440128/nice-looking-five-sets-venn-diagrams/40048520

Then, in 2012, Khalegh Mamakani and Frank Ruskey of the University of Victoria discovered an **11-set** Venn diagram. This seems to be the largest Venn diagram humanity has reached so far.

A New Rose : The First Simple Symmetric 11-Venn Diagram image.png image.png https://arxiv.org/abs/1207.6452

So it turned out that an 18-set Venn diagram is beyond what I can do right now.

Visualization with networkx

A Venn diagram cannot handle 18 sets, but networkx, which I recently picked up as a hobby, might be able to visualize them. The goal is not the Venn diagram itself. It is enough to be able to visualize the following:

- Which works are played a lot?
- Which other works do the players of a given work also play a lot?

In the end, I made a graph like this.

out.png

The label inside each node (circle) is a Pokemon game title, written as the abbreviation familiar to Pokemon fans. Prefixes such as "01_" give the order in which the titles were released.

The size of a node represents **the number of people who played that work**. The thickness of an edge (the line connecting two nodes) represents how many people played both of the two works, expressed as the Jaccard coefficient.

Jaccard coefficient

$$
J(A,B) = \frac{|A \cap B|}{|A \cup B|}
$$

Here $A$ and $B$ are the sets of respondents who played each of the two works.
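As a small worked example of the formula, the sketch below computes the Jaccard coefficient for two sets of respondent IDs. The `jaccard` helper and the sample IDs are made up for illustration and are not taken from the questionnaire data.

```python
def jaccard(a: set, b: set) -> float:
    """Return |A ∩ B| / |A ∪ B|, or 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical respondent IDs for two titles (illustrative only).
played_dp = {1, 2, 3, 4}
played_bw = {3, 4, 5, 6}

# 2 respondents played both, 6 played at least one: 2 / 6
print(jaccard(played_dp, played_bw))
```

A value near 1 means almost everyone who played one title also played the other, and a value near 0 means the two player groups barely overlap.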


The node colors represent:

- Dark red: completely new title
- Orange: remake
- Light orange: minor-change version

I think this lets you visualize **which other works the players of a given work tend to play**.
For example, the following trends can be seen:

- The number of people who played DP and BW is particularly large.
- Some people skip the minor-change versions but play the completely new titles.
- Many people play both the completely new titles from RS onward and the latest title, Sword/Shield.
- Few people have played both Red/Green/Blue and Sword/Shield.


## Processing with python

*All code and the original data are posted on my GitHub.*
[https://github.com/mrok273/](https://github.com/mrok273/Qiita/tree/master/%E3%83%9D%E3%82%B1%E3%83%A2%E3%83%B3/%E3%83%8D%E3%83%83%E3%83%88%E3%83%AF%E3%83%BC%E3%82%AF%E5%9B%B3%E3%81%AE%E4%BD%9C%E6%88%90)


 First, create a one-hot data frame as shown below.
 ![image.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/610426/a426c3a2-526b-b5df-a0e3-69c05db07565.png)
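As a rough sketch of how such a table can be built, the example below assumes one row per respondent and one column per title, with 1 meaning "played" and 0 meaning "not played". The `responses` data and the three-title `node_list` are hypothetical stand-ins, not the questionnaire data.

```python
import pandas as pd

# Hypothetical subset of title labels (the article uses 18 of them).
node_list = ["01_Red green blue", "02_Pi", "03_Gold and silver"]

# Hypothetical answers: the titles each respondent said they played.
responses = [
    ["01_Red green blue", "02_Pi"],
    ["03_Gold and silver"],
    ["01_Red green blue", "03_Gold and silver"],
]

# One row per respondent, one 0/1 column per title.
df = pd.DataFrame(
    [[int(title in played) for title in node_list] for played in responses],
    columns=node_list,
)

# Column sums give the number of players per title.
print(df.sum())
```

With this layout, a title's player count is its column sum, and respondents who played both of two titles are the rows where both columns are 1.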


Next, build every pair of two works out of the 18 works.

```python
import itertools

# node_list holds the 18 title labels (the columns of the one-hot data frame)
combination_list = list(itertools.combinations(list(node_list), 2))

"""
Out:
[('01_Red green blue', '02_Pi'),
 ('01_Red green blue', '03_Gold and silver'),
 ('01_Red green blue', '04_Ku'),
 ('01_Red green blue', '05_RS'),
...
"""
```
Calculate the Jaccard coefficient for each pair of titles.

```python
import pandas as pd

# Collect the union and intersection sizes for every pair of titles
n1_list, n2_list = [], []
union_list, intersection_list = [], []

for n1, n2 in combination_list:
    union_df = df[(df[n1] == 1) | (df[n2] == 1)]         # played at least one of the two
    intersection_df = df[(df[n1] == 1) & (df[n2] == 1)]  # played both
    n1_list.append(n1)
    n2_list.append(n2)
    union_list.append(union_df.shape[0])
    intersection_list.append(intersection_df.shape[0])

df_ja = pd.DataFrame({"n1": n1_list,
                      "n2": n2_list,
                      "Union": union_list,
                      "Intersection": intersection_list})
df_ja["jaccard_value"] = df_ja["Intersection"] / df_ja["Union"]
```

The first rows of `df_ja` look like this:

| n1 | n2 | Union | Intersection | jaccard_value |
|---|---|---|---|---|
| 01_Red green blue | 02_Pi | 13606 | 7823 | 0.574966926 |
| 01_Red green blue | 03_Gold and silver | 14345 | 9902 | 0.690275357 |
| 01_Red green blue | 04_Ku | 13697 | 6667 | 0.486748923 |
| 01_Red green blue | 05_RS | 17210 | 9507 | 0.552411389 |
| 01_Red green blue | 06_FRLG | 16331 | 7868 | 0.481783112 |

Graph settings with networkx

```python
import networkx as nx
import matplotlib.pyplot as plt
import japanize_matplotlib  # enables Japanese fonts in matplotlib
import numpy as np

G = nx.Graph()               # create the graph
G.add_nodes_from(node_list)  # one node per title

# Add one edge per pair of titles, weighted by the Jaccard coefficient
for i in range(len(df_ja)):
    row_data = df_ja.iloc[i]
    G.add_edge(row_data['n1'], row_data['n2'], weight=row_data['jaccard_value'])

# Create the figure
fig_size = (12, 12)  # assumed value; fig_size is not defined in the article
plt.figure(figsize=fig_size)

# Node positions for the network diagram: use circular_layout
pos = nx.circular_layout(G)
```

For the other layouts networkx offers, see the drawing reference in the NetworkX documentation: https://networkx.github.io/documentation/stable/reference/drawing.html
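As a quick illustration of swapping layouts, the sketch below computes node positions for a small made-up graph with `circular_layout`, `shell_layout`, and `spring_layout`. Each returns a dict mapping a node to an (x, y) coordinate array, so any of them can be passed as `pos` to the drawing functions. The graph and the seed are arbitrary choices for the example.

```python
import networkx as nx

# Small hypothetical graph standing in for the 18-title graph.
G = nx.Graph()
G.add_edges_from([("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")])

layouts = {
    "circular": nx.circular_layout(G),      # nodes evenly spaced on a circle
    "shell": nx.shell_layout(G),            # concentric circles (a single shell by default)
    "spring": nx.spring_layout(G, seed=0),  # force-directed; the seed fixes the result
}

for name, pos in layouts.items():
    # pos maps each node to a length-2 coordinate array
    print(name, {node: xy.round(2).tolist() for node, xy in pos.items()})
```

For a complete graph like the one in this article, where every pair of titles is connected, a circular layout keeps the nodes from overlapping, which may be one reason to prefer it here.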

Make the graph look better

The number of players differs from work to work, but the largest count is at most about twice the smallest.

If the node sizes followed these counts directly, the difference in the number of players would be hard to see in the network graph.

image.png

So I normalized the number of players of each title by the maximum and then **raised it to a power to exaggerate the differences**.

- Before processing: image.png

- After processing (the relative differences between titles are widened): image.png
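To see why raising to a power helps, here is a minimal sketch with made-up player counts. It normalizes each count by the maximum and squares it. The exponent of 2 and the counts themselves are assumptions for illustration.

```python
# Hypothetical player counts; the largest is twice the smallest.
player_counts = {"title_a": 10000, "title_b": 15000, "title_c": 20000}

max_count = max(player_counts.values())

# Normalize by the maximum: values fall in (0, 1].
normalized = {k: v / max_count for k, v in player_counts.items()}

# Squaring pushes small values further from 1, widening the gaps.
emphasized = {k: r ** 2 for k, r in normalized.items()}

ratio_before = normalized["title_c"] / normalized["title_a"]
ratio_after = emphasized["title_c"] / emphasized["title_a"]
print(ratio_before, ratio_after)  # the ratio grows from 2x to 4x
```

A higher exponent widens the gaps further, at the cost of making the smallest nodes hard to see, so the exponent is something to tune by eye.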

The same kind of emphasis is applied to the edge thickness.

```python
player_dict = {}  # number of players for each title
for gen in node_list:
    player_dict[gen] = df[gen].sum()

# Node drawing sizes. The raw counts are large and close together, so
# subtract an offset, square, and scale by the maximum to emphasize the differences.
max_players = max(player_dict.values())
node_size = [(v - 4000) ** 2 * 10000 / (max_players ** 2) for v in player_dict.values()]

def make_node_color(node_list):
    # Give minor-change versions and remakes different colors from the completely new titles
    minor_change = ['02_Pi', '04_Ku', '07_Em', '09_Pt', '12_BW2', '16_USUM']
    remake = ['06_FRLG', '10_HGSS', '14_ORAS', '17_Pikabui']
    color_list = []
    for node in node_list:
        if node in minor_change:
            color = 3000
        elif node in remake:
            color = 6000
        else:
            color = 10000
        color_list.append(color)
    return color_list

node_color = make_node_color(node_list)

# Draw the nodes. The node color and size are set here
nx.draw_networkx_nodes(G, pos, node_color=node_color,
                       cmap=plt.cm.Reds,
                       alpha=0.7,
                       node_size=node_size)

# Japanese labels (the keyword is font_size, not fontsize)
nx.draw_networkx_labels(G, pos, font_size=20, font_family='IPAexGothic', font_weight="bold")

# Edge widths. The differences are emphasized here as well
edge_width = [np.exp(d["weight"] - 0.3) ** 5 for (u, v, d) in G.edges(data=True)]

# Draw the edges, using the width for both thickness and color
nx.draw_networkx_edges(G, pos, alpha=0.4, edge_color=edge_width, width=edge_width, edge_cmap=plt.cm.Blues)

plt.axis('off')
```

**This emphasizes the relationships between the titles and makes them easier to see.**

image.png
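For the edge widths, the code uses `np.exp(weight - 0.3) ** 5`, which is the same as `exp(5 * (weight - 0.3))`. The sketch below applies that transform to a few hypothetical Jaccard values using only the standard library. The `edge_width` helper and its parameter names are made up for illustration.

```python
import math


def edge_width(weight: float, offset: float = 0.3, power: int = 5) -> float:
    """Exponential emphasis for edge widths: exp(weight - offset) ** power."""
    return math.exp(weight - offset) ** power


# Hypothetical Jaccard coefficients.
for w in (0.3, 0.5, 0.7):
    print(w, round(edge_width(w), 2))
```

A weight equal to the offset maps to a width of 1, and a weight of 0.7 gets a width roughly 7 times larger, so strongly overlapping title pairs stand out clearly.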

Finally

networkx has fewer users than the pandas library, so it can be hard to find an article that fits your purpose. Still, it makes possible visualizations that matplotlib or seaborn alone cannot express. Getting an easy-to-read result does take trial and error, and the colors and sizes need careful tuning. There is still a lot about networkx that I do not understand.
