[PYTHON] Pokemon x Data Science (3) --Thinking about Pokemon Sword Shield Party Construction from Network Analysis Where is the center of the network?

- Pokemon x Data Science (1) -- I analyzed the rank battle data of Pokemon Sword Shield and visualized it on Tableau
- Pokemon x Data Science (2) -- Trial version: thinking about party building for Pokemon Sword Shield from network analysis
- [This time] Pokemon x Data Science (3) -- Thinking about party building for Pokemon Sword Shield from network analysis: Where is the center of the network?

Hello. Continuing from [the previous article](https://qiita.com/b_aka/items/9020e3237ff1a3e676e4), this time we will again deal with graph theory and network analysis.

In the previous article, we visualized the network of Pokemon Sword Shield Rank Battle parties. This time, I will actually start the analysis.

The code and data used this time can be found in this GitHub repository.

The full notebook is here: https://github.com/moxak/pokemon-rankbattle-network-analysis/blob/master/002.ipynb

What I want to achieve this time

As the title suggests, I would like to cluster each node of the Pokemon Sword Shield Party Building Network. We also want to capture important nodes by introducing the concept of centrality before clustering.

If I can produce something like the figure below, the goal is achieved.

community.gml.png

The center of the network

Network theory (graph theory) has a concept called centrality.

Centrality is an indicator for assessing and comparing the importance of each vertex in the network.

[Network Analysis 2nd Edition -- Learning Data Science with R](https://www.amazon.co.jp/dp/4320113152)

This is an attempt to mathematically derive how central, that is, how **important**, each node is in the network.

This time, I would like to use this theory to calculate the importance of each node in the network.

However, there are actually many types of this centrality.

1. Degree centrality

The first centrality is degree centrality.

The degree is the number of edges a node has, and degree centrality simply uses this degree (normalized by the number of other nodes) as the score.

It is a basic centrality built on the idea that the more nodes you are connected to, the more important you are, but I feel the results often deviate from intuition.
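
As a minimal illustration on a toy graph (my own example, not the party data), networkx simply divides each node's degree by the number of other nodes:

import networkx as nx

# Toy star graph: node 0 is connected to nodes 1-4
G_toy = nx.star_graph(4)

# Degree centrality = degree / (number of other nodes)
print(nx.degree_centrality(G_toy))
# -> the center has 4 of 4 possible neighbors (1.0), each leaf has 1 of 4 (0.25)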

2. Closeness centrality and eccentricity centrality

Next are closeness centrality and eccentricity centrality, which derive centrality from the distances to other nodes.

The idea is that the closer you are to the center of the network, the more important you are.

Closeness centrality takes the reciprocal of the total distance from a node to all other nodes, and eccentricity centrality takes the reciprocal of the maximum distance from a node to any other node.

Since both are based on distance and their results are very similar, I will use closeness centrality this time.
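
A small sketch on a toy path graph (again, not the party data): networkx provides closeness centrality directly, while for eccentricity centrality I take the reciprocal of nx.eccentricity by hand, following the definition above.

import networkx as nx

# Toy path graph: 0 - 1 - 2 - 3 - 4
G_toy = nx.path_graph(5)

# Closeness centrality: based on the sum of shortest-path distances to all other nodes
print(nx.closeness_centrality(G_toy))      # the middle node 2 scores highest

# Eccentricity centrality: reciprocal of the maximum distance to any other node
ecc = nx.eccentricity(G_toy)
print({v: 1 / d for v, d in ecc.items()})  # again the middle node scores highest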

3. Betweenness centrality

The most commonly used one (I think) is betweenness centrality.

Simply put, the idea is that the more often a node lies on shortest paths between other nodes, the more important it is (it acts as a relay).

In a community network, a node with high betweenness centrality sits in a position where other communities cannot be reached without passing through it, and I hope you can see intuitively why such a node is important.
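
A small sketch of that intuition on a toy graph: a single "bridge" node joining two triangles gets by far the largest betweenness centrality, because every shortest path between the two groups has to pass through it.

import networkx as nx

# Two triangles joined only through node 3
G_toy = nx.Graph()
G_toy.add_edges_from([(0, 1), (1, 2), (2, 0),   # community A
                      (4, 5), (5, 6), (6, 4),   # community B
                      (2, 3), (3, 4)])          # node 3 is the bridge

print(nx.betweenness_centrality(G_toy))         # node 3 dominates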

4. Eigenvector centrality

Eigenvector centrality is quite different from the four centralities introduced so far: it brings in the idea of which nodes a node is connected to.

Incorporating the idea that "nodes connected to important nodes are themselves more important", we repeatedly add up the centralities of the nodes connected to each node and take the converged values as the centrality.
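
A rough sketch of that "repeat until convergence" idea (power iteration on the adjacency matrix of a toy graph, my own example); the result should agree with networkx's built-in eigenvector centrality up to numerical tolerance.

import numpy as np
import networkx as nx

G_toy = nx.karate_club_graph()
A = nx.to_numpy_array(G_toy)      # adjacency matrix

x = np.ones(A.shape[0])           # start everyone with the same centrality
for _ in range(100):
    x = A @ x                     # add up the centralities of each node's neighbors
    x = x / np.linalg.norm(x)     # renormalize so the values can converge

print(x[:5])
print([nx.eigenvector_centrality_numpy(G_toy)[i] for i in range(5)])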

5. Page rank

It's the centrality devised by Google founders Larry Page and Sergey Brin.

The basic idea is the same as eigenvector centrality. To understand what is different, we need to know the problem of eigenvector centrality.

Suppose the network contains a node that receives no edges from any other node. Its centrality is of course 0. So far so good, but the next case is trickier: suppose node i receives edges only from such zero-centrality nodes. Since i can only inherit centrality from the nodes connected to it, its centrality is also 0. I think this is counterintuitive.

Also, say node i is connected to a node j with tremendous centrality. In eigenvector centrality, the centrality of node j is passed on to node i, but node i is just one of the many nodes to which node j attaches an edge. Should all of node j's centrality really be passed on to node i?

PageRank is a centrality index that solves these problems to some extent.

It became the basis of Google's search algorithm and is also used in measures of the influence of academic papers.
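
A minimal sketch of the difference on a toy directed graph (my own example, not part of the original analysis): node "c" receives no edges, so eigenvector-style propagation would drive it (and the node it feeds) to zero, while PageRank's damping factor alpha mixes in a random jump to every node and keeps all scores positive.

import networkx as nx

# Toy directed graph: "c" has no incoming edges, and "d" is reached only via "c"
G_toy = nx.DiGraph([("a", "b"), ("b", "a"), ("c", "d"), ("d", "a")])

print(nx.pagerank(G_toy, alpha=0.85))   # default damping: every node keeps a non-zero score
print(nx.pagerank(G_toy, alpha=0.5))    # more random jumping flattens the scores further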

Derivation of centrality

The data used to derive the centralities are the "Pokemon commonly used together" data and the usage-ranking data from the Pokemon Sword Shield ranked battles introduced last time. We will analyze the network consisting of the top 100 Pokemon in the usage ranking.

import pandas as pd

df = pd.read_csv(FILEPATH_TEMOTI_POKEMON, encoding='utf-8')
df_rank = pd.read_csv(FILEPATH_ADO_RANK, encoding='utf-8')

df.columns = ['Season', 'Rule', 'Pokemon_From', 'Pokemon_To', 'Weight']
# Invert the 1-10 teammate rank so that a higher value means a more common pairing
df['Weight'] = 10 - df['Weight']

df_season11_double = df[(df['Season']==11)&(df['Rule']=='Double')]
df_season11_double = df_season11_double.drop(['Season', 'Rule'], axis=1)

# Limit to the top 100 Pokemon in the usage ranking
df_season11_double = df_season11_double[df_season11_double['Pokemon_From'].isin(list(df_rank['Pokemon'])[:100])]
df_season11_double = df_season11_double[df_season11_double['Pokemon_To'].isin(list(df_rank['Pokemon'])[:100])]

df_season11_double.to_csv(OUTPUT_FILEPATH, index=False)
df_season11_double
index Pokemon_From Pokemon_To Weight
91394 Charizard Ninetales 9
91395 Charizard Tritodon 8
91396 Charizard Pippi 7
91397 Charizard Terrakion 6
91398 Charizard Sableye 5

504 rows × 3 columns

Create a network from the data created above.

import networkx as nx
network_np = df_season11_double.values
G = nx.DiGraph()
G.add_weighted_edges_from(network_np)

default_network.png

1. Degree centrality

degree_centers = nx.degree_centrality(G)
df_dc = pd.DataFrame(sorted(degree_centers.items(), key=lambda x: x[1], reverse=True), columns=['Pokemon', 'Degree centrality'])
df_dc.head(10)
index Pokemon Degree centrality
0 Ulaos 0.500000
1 Talonflame 0.490291
2 Achilleine 0.451456
3 Amoonguss 0.419903
4 Windy 0.359223
5 Dusclops 0.308252
6 Oronge 0.293689
7 Laplace 0.291262
8 Charizard 0.269417
9 Ferrothorn 0.237864

degree_centrality.png

2. Closeness centrality

close_centers = nx.closeness_centrality(G)
df_cc = pd.DataFrame(sorted(close_centers.items(), key=lambda x: x[1], reverse=True), columns=['Pokemon', 'Closeness centrality'])
df_cc.head(10)
index Pokemon Closeness centrality
0 Ulaos 0.648440
1 Talonflame 0.646345
2 Achilleine 0.628081
3 Amoonguss 0.612691
4 Windy 0.594483
5 Dusclops 0.572371
6 Laplace 0.562711
7 Oronge 0.551085
8 Pippi 0.545078
9 Ferrothorn 0.545078

closeness_centrality.png

3. Betweenness centrality

between_centers = nx.betweenness_centrality(G)
df_bc = pd.DataFrame(sorted(between_centers.items(), key=lambda x: x[1], reverse=True), columns=['Pokemon', 'Betweenness centrality'])
df_bc.head(10)
index Pokemon Betweenness centrality
0 Ninetales 0.046012
1 Persian 0.036945
2 Tritodon 0.028901
3 Charizard 0.025690
4 Terrakion 0.021207
5 Glaceon 0.019807
6 Sandslash 0.018529
7 Achilleine 0.013435
8 Nyai King 0.011381
9 Heliolisk 0.009867

betweenness_centrality.png

4. Eigenvector centrality

eigen_centers = nx.eigenvector_centrality_numpy(G)
df_ec = pd.DataFrame(sorted(eigen_centers.items(), key=lambda x: x[1], reverse=True), columns=['Pokemon', 'Eigen centrality'])
df_ec.head(10)
index Pokemon Eigen centrality
0 Ulaos 0.374252
1 Achilleine 0.362932
2 Talonflame 0.341690
3 Amoonguss 0.332555
4 Dusclops 0.297832
5 Windy 0.292263
6 Patch Ragon 0.261000
7 Ferrothorn 0.259066
8 Pippi 0.237895
9 Laplace 0.218322

eigen_centrality.png

5. Page rank

import operator

pageranks = nx.pagerank(G)
df_pr = pd.DataFrame(sorted(pageranks.items(), key=operator.itemgetter(1), reverse=True), columns=['Pokemon', 'Page Rank'])
df_pr.head(10)

pagerank.png

In each figure, I made the label font larger for nodes with higher centrality.

I put the official usage ranking and each centrality index side by side.

スクリーンショット2020-11-07224557.png

Comparing the top 10 of each index within the network restricted to the top 100 Pokemon, Ulaos ranks noticeably higher on every centrality index than in the official usage ranking. (It feels a bit overrated.)
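
The notebook does not show how that comparison table was put together, so the following is just a sketch of one way to build the same kind of table from the dataframes above, assuming df_rank['Pokemon'] is already sorted by the official usage ranking:

from functools import reduce
import pandas as pd

# Turn each sorted centrality dataframe into a rank column
tables = {'Degree': df_dc, 'Closeness': df_cc, 'Betweenness': df_bc,
          'Eigenvector': df_ec, 'PageRank': df_pr}
ranked = []
for name, df_c in tables.items():
    tmp = df_c[['Pokemon']].copy()
    tmp[name + ' rank'] = range(1, len(tmp) + 1)
    ranked.append(tmp)

# Official usage ranking, restricted to the top 100 used for the network
official = pd.DataFrame({'Pokemon': list(df_rank['Pokemon'])[:100]})
official['Usage rank'] = range(1, len(official) + 1)

comparison = reduce(lambda left, right: pd.merge(left, right, on='Pokemon'), [official] + ranked)
comparison.head(10)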

From now on, we will use PageRank.

Network structure clustering

At last we get to the main topic: clustering the network structure.

There are various clustering (community detection) methods: some use the betweenness centrality or eigenvector centrality derived above, others use information centrality, the spin-glass method, or random walks (not introduced this time).

Since I have already spent a lot of words on deriving the centralities, I will leave running clustering with each centrality and comparing the results for another occasion.

Let's get on with it. Clustering will be done with the Louvain method used here (Paper, Implementation Library).

This method splits the network so as to maximize an indicator of network density called [modularity](https://en.wikipedia.org/wiki/Modularity_%28networks%29), and unlike k-means it does not require the number of clusters to be specified in advance.
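
For reference, the modularity Q being maximized is commonly written (for a weighted undirected graph) as

$$ Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j) $$

where A_ij is the weight of the edge between nodes i and j, k_i is the weighted degree of node i, m is the total edge weight, and δ(c_i, c_j) is 1 when nodes i and j belong to the same cluster (0 otherwise). The Louvain method greedily moves nodes between clusters and merges clusters as long as Q keeps increasing.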

Directed graphs cannot be used in this implementation, so we will convert them to undirected graphs.

#Convert directed graph to undirected graph
G2 = nx.Graph(G)

import community

# Louvain clustering (python-louvain): returns {node: cluster id}
partition = community.best_partition(G2)

# Wrap each node's cluster id as a node attribute dict for Cytoscape
partition2 = {}
for i in partition.keys():
    sub_dict = {'community' : partition[i]}
    partition2[i] = sub_dict

# Give each node a numeric label attribute (used later for display)
labels = dict([(i, str(i)) for i in range(nx.number_of_nodes(G2))])
labels2 = {}
for i in range(len(labels)):
    sub_dict = {'labels' : labels[i]}
    labels2[list(partition.keys())[i]] = sub_dict

nx.set_node_attributes(G2, labels2)
nx.set_node_attributes(G2, partition2)
nx.write_gml(G2, ".//community.gml")

pd.DataFrame.from_dict(labels2).T.to_csv('.//community_labels.csv')
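
As a quick sanity check (my own addition, not in the original notebook), the python-louvain package can also report the modularity of the partition it just found, and a simple Counter shows the cluster sizes:

import collections

# Modularity of the partition found above (higher generally means better-separated clusters)
print(community.modularity(partition, G2))

# How many Pokemon ended up in each cluster
print(collections.Counter(partition.values()))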

Interpretation of cluster

As a result of clustering, 6 clusters were extracted. Let's take a look at each cluster.

df_pagerank = pd.DataFrame(sorted(pageranks.items(), key=operator.itemgetter(1),reverse = True), columns=['Pokemon', 'Page Rank'])
df_community = pd.concat([pd.DataFrame.from_dict(labels2).T, pd.DataFrame.from_dict(partition2).T], axis=1)
df_community = df_community.reset_index()
df_community.columns = ['Pokemon', 'label', 'community']

df_pagerank_community = pd.merge(left=df_pagerank, right=df_community, on = 'Pokemon')

The figure below shows how many Pokemon were classified into each cluster.

df_pagerank_community.groupby('community').count()['Pokemon'].plot.bar(rot=0, alpha=0.75)

plt.png

You can see that cluster 3 accounts for nearly 30% of the total.

Let's take a look at the contents of each cluster.

Cluster 0

df_pagerank_community[df_pagerank_community['community']==0].head(10)
index Pokemon Page Rank label community
13 Charizard 0.013742 0 0
18 Tritodon 0.007309 2 0
19 Sekitanzan 0.006604 32 0
20 Ninetales 0.006319 1 0
30 Sableye 0.002842 5 0
42 Weavile 0.001501 68 0
46 Sneasel 0.001235 55 0
53 Cobalion 0.001133 56 0
61 Leafeon 0.000935 30 0
69 Virizion 0.000789 69 0

Is this a sun team centered on Charizard and Ninetales? Charizard has the highest centrality in this cluster, and Tritodon, which pairs very well with Charizard, comes second.

Cluster 1

df_pagerank_community[df_pagerank_community['community']==1].head(10)
index Pokemon Page Rank label community
8 Pippi 0.033903 3 1
16 Polygon-Z 0.012646 27 1
17 Terrakion 0.009129 4 1
41 Persian 0.001544 33 1
80 Luxray 0.000630 64 1
81 Ennute 0.000629 92 1

Next, cluster 1 consists of these six Pokemon. Eviolite Pippi, whose support performance is extremely high, and Polygon-Z, which can put out tremendous firepower with Adaptability-boosted Dynamax moves, are classified here. My impression is that these are versatile Pokemon that can fit into almost any party.

Cluster 2

df_pagerank_community[df_pagerank_community['community']==2].head(10)
index Pokemon Page Rank label community
7 Ferrothorn 0.038186 6 2
24 Sylveon 0.005511 20 2
28 Pelipper 0.003061 57 2
29 Wonoragon 0.002859 46 2
32 Kingdra 0.002468 49 2
33 Politoed 0.002321 47 2
35 Ludicolo 0.002233 50 2
37 Seismitoad 0.001736 58 2
38 Escavalier 0.001626 59 2
39 Weezing 0.001572 44 2

Cluster 2 is easy to understand: it is a rain team including Pelipper, Kingdra, Ludicolo and so on. Ferrothorn, which complements the Water types' weaknesses well, has high centrality. It also makes intuitive sense that Pokemon weak to Fire, such as Escavalier, end up grouped with a rain team.

Cluster 3

df_pagerank_community[df_pagerank_community['community']==3].head(10)
index Pokemon Page Rank label community
0 Achilleine 0.107503 7 3
1 Talonflame 0.093996 12 3
4 Windy 0.056541 15 3
5 Patch Ragon 0.049160 16 3
14 Duraldon 0.013730 24 3
15 Oronge 0.013617 29 3
21 Amarjo 0.006167 17 3
26 Braviary 0.003980 23 3
27 Gengar 0.003799 43 3
34 Pixie 0.002254 28 3

Cluster 3 seems to concentrate the top of the metagame. If you look at the team-building write-ups floating around the net, you often see combinations of Achilleine, Talonflame, and Patch Ragon in successful teams, so I guess this cluster holds many of the Pokemon that had a large influence on the Season 11 double environment.

Cluster 4

df_pagerank_community[df_pagerank_community['community']==4].head(10)
index Pokemon Page Rank label community
2 Amoonguss 0.083762 8 4
3 Dusclops 0.061276 9 4
9 Brimon 0.026956 25 4
11 Rhyperior 0.016228 21 4
12 Dadarin 0.014804 26 4
22 Raichu 0.006124 13 4
25 Garula 0.005487 19 4
31 rattle 0.002758 39 4
40 Slowbro 0.001563 40 4
45 Stringer 0.001322 82 4

This cluster is also very easy to understand. Dusclops and Brimon, who act as Trick Room setters, Rhyperior, Dadarin, and Marowak, who act as Trick Room attackers, and Amoonguss, who acts as a supporter, are classified here.

Looking at the centrality, we can see that Amoonguss plays a very important role.

Cluster 5

df_pagerank_community[df_pagerank_community['community']==5].head(10)
index Pokemon Page Rank label community
6 Ulaos 0.045935 10 5
10 Laplace 0.024565 22 5
23 Kuwawa 0.005706 14 5
36 Goodra 0.002140 88 5
50 Noivern 0.001201 83 5
55 Blastoise 0.001056 11 5
63 Mahip 0.000880 94 5
90 Togedemaru 0.000463 93 5

The last cluster consists of these eight Pokemon. What kind of grouping is this? It was hard to interpret with my knowledge, so I would love to hear your thoughts in the comments.

Visualization

Finally, let's visualize the network with Cytoscape, whose usage we went over last time.

Load community.gml from **File > Import > Network from File**, then use **Import Table from File** at the top (see the figure below)

スクリーンショット2020-11-05021410.png

to load community_labels.csv, and set the dialog box that appears as follows.

Note that the red part needs to be changed from the default.

スクリーンショット2020-11-07203853.png

After that, just change the shape and color for each cluster by making full use of **Continuous Mapping** in the **Style** tab.

スクリーンショット2020-11-07204346.png

I visualized the network with the font color indicating the cluster and the font size indicating the PageRank.

community.gml.png

Next time, I would like to search for the best clustering method for this data.

See you again.

Source, etc.

- Pokemon x Data Science (1) -- I analyzed the rank battle data of Pokemon Sword Shield and visualized it on Tableau
- Pokemon x Data Science (2) -- Trial version: thinking about party building for Pokemon Sword Shield from network analysis

© 2020 Pokémon © 1995-2020 Nintendo / Creatures Inc. / GAME FREAK inc. Pocket Monsters, Pokemon, and Pokémon are registered trademarks of Nintendo, Creatures, and Game Freak.
