[PYTHON] Pokemon x Data Science (3) --Thinking about Pokemon Sword Shield Party Construction from Network Analysis Where is the center of the network?

- Pokemon x Data Science (1) -- I analyzed the rank battle data of Pokemon Sword Shield and visualized it on Tableau
- Pokemon x Data Science (2) -- Trial version: thinking about party building for Pokemon Sword Shield from network analysis
- [This time] Pokemon x Data Science (3) -- Thinking about party building for Pokemon Sword Shield from network analysis: Where is the center of the network?

Hello. Continuing from [the previous article](https://qiita.com/b_aka/items/9020e3237ff1a3e676e4), this time we will again deal with graph theory and network analysis.

In the previous article, we visualized the network of Pokemon Sword Shield Rank Battle parties. This time, I will actually start the analysis.

The code and data used this time can be found in this GitHub repository.

The full notebook is here: https://github.com/moxak/pokemon-rankbattle-network-analysis/blob/master/002.ipynb

What I want to achieve this time

As the title suggests, I would like to cluster each node of the Pokemon Sword Shield Party Building Network. We also want to capture important nodes by introducing the concept of centrality before clustering.

If I can produce something like the figure below, the goal is achieved.

community.gml.png

The center of the network

Network theory (graph theory) has a concept called centrality.

Centrality is an indicator for assessing and comparing the importance of each vertex in the network.

[Network Analysis 2nd Edition -- Learning Data Science with R](https://www.amazon.co.jp/dp/4320113152)

This is an attempt to mathematically derive how central, that is, how **important**, each node is in the network.

This time, I would like to use this theory to calculate the importance of each node in the network.

However, there are actually many types of this centrality.

1. Degree centrality

The first centrality is degree centrality.

The degree is the number of edges a node has, and degree centrality simply uses this degree (normalized by the number of other nodes) as the score.

It is a basic centrality built on the idea that the more nodes you are connected to, the more important you are, but I feel the results often deviate from intuition.
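
As a minimal illustration on a toy graph (my own example, not the party data), networkx simply divides each node's degree by the number of other nodes:

import networkx as nx

# Toy star graph: node 0 is connected to nodes 1-4
G_toy = nx.star_graph(4)

# Degree centrality = degree / (number of other nodes)
print(nx.degree_centrality(G_toy))
# -> the center has 4 of 4 possible neighbors (1.0), each leaf has 1 of 4 (0.25)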

2. Closeness centrality and eccentricity centrality

Next are closeness centrality and eccentricity centrality, which derive centrality from the distances to other nodes.

The idea is that the closer you are to the center of the network, the more important you are.

Closeness centrality takes the reciprocal of the total distance from a node to all other nodes, and eccentricity centrality takes the reciprocal of the maximum distance from a node to any other node.

Since both are based on distance and their results are very similar, I will use closeness centrality this time.
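
A small sketch on a toy path graph (again, not the party data): networkx provides closeness centrality directly, while for eccentricity centrality I take the reciprocal of nx.eccentricity by hand, following the definition above.

import networkx as nx

# Toy path graph: 0 - 1 - 2 - 3 - 4
G_toy = nx.path_graph(5)

# Closeness centrality: based on the sum of shortest-path distances to all other nodes
print(nx.closeness_centrality(G_toy))      # the middle node 2 scores highest

# Eccentricity centrality: reciprocal of the maximum distance to any other node
ecc = nx.eccentricity(G_toy)
print({v: 1 / d for v, d in ecc.items()})  # again the middle node scores highest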

3. Betweenness centrality

The most commonly used one (I think) is betweenness centrality.

Simply put, the idea is that the more often a node lies on shortest paths between other nodes, the more important it is (it acts as a relay).

In a community network, a node with high betweenness centrality sits in a position where other communities cannot be reached without passing through it, and I hope you can see intuitively why such a node is important.
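
A small sketch of that intuition on a toy graph: a single "bridge" node joining two triangles gets by far the largest betweenness centrality, because every shortest path between the two groups has to pass through it.

import networkx as nx

# Two triangles joined only through node 3
G_toy = nx.Graph()
G_toy.add_edges_from([(0, 1), (1, 2), (2, 0),   # community A
                      (4, 5), (5, 6), (6, 4),   # community B
                      (2, 3), (3, 4)])          # node 3 is the bridge

print(nx.betweenness_centrality(G_toy))         # node 3 dominates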

4. Eigenvector centrality

Eigenvector centrality is quite different from the four centralities introduced so far: it brings in the idea of which nodes a node is connected to.

Incorporating the idea that "nodes connected to important nodes are themselves more important", we repeatedly add up the centralities of the nodes connected to each node and take the converged values as the centrality.
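
A rough sketch of that "repeat until convergence" idea (power iteration on the adjacency matrix of a toy graph, my own example); the result should agree with networkx's built-in eigenvector centrality up to numerical tolerance.

import numpy as np
import networkx as nx

G_toy = nx.karate_club_graph()
A = nx.to_numpy_array(G_toy)      # adjacency matrix

x = np.ones(A.shape[0])           # start everyone with the same centrality
for _ in range(100):
    x = A @ x                     # add up the centralities of each node's neighbors
    x = x / np.linalg.norm(x)     # renormalize so the values can converge

print(x[:5])
print([nx.eigenvector_centrality_numpy(G_toy)[i] for i in range(5)])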

5. Page rank

It's the centrality devised by Google founders Larry Page and Sergey Brin.

The basic idea is the same as eigenvector centrality. To understand what is different, we need to know the problem of eigenvector centrality.

Suppose the network contains a node that receives no edges from any other node. Its centrality is of course 0. So far so good, but the next case is trickier: suppose node i receives edges only from such zero-centrality nodes. Since i can only inherit centrality from the nodes connected to it, its centrality is also 0. I think this is counterintuitive.

Also, say node i is connected to a node j with tremendous centrality. In eigenvector centrality, the centrality of node j is passed on to node i, but node i is just one of the many nodes to which node j attaches an edge. Should all of node j's centrality really be passed on to node i?

PageRank is a centrality index that solves these problems to some extent.

It became the basis of Google's search algorithm and is also used in measures of the influence of academic papers.
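
A minimal sketch of the difference on a toy directed graph (my own example, not part of the original analysis): node "c" receives no edges, so eigenvector-style propagation would drive it (and the node it feeds) to zero, while PageRank's damping factor alpha mixes in a random jump to every node and keeps all scores positive.

import networkx as nx

# Toy directed graph: "c" has no incoming edges, and "d" is reached only via "c"
G_toy = nx.DiGraph([("a", "b"), ("b", "a"), ("c", "d"), ("d", "a")])

print(nx.pagerank(G_toy, alpha=0.85))   # default damping: every node keeps a non-zero score
print(nx.pagerank(G_toy, alpha=0.5))    # more random jumping flattens the scores further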

Derivation of centrality

The data used to derive the centralities are the "Pokemon commonly used together" data and the usage-ranking data from the Pokemon Sword Shield ranked battles introduced last time. We will analyze the network consisting of the top 100 Pokemon in the usage ranking.

import pandas as pd

df = pd.read_csv(FILEPATH_TEMOTI_POKEMON, encoding='utf-8')
df_rank = pd.read_csv(FILEPATH_ADO_RANK, encoding='utf-8')

df.columns = ['Season', 'Rule', 'Pokemon_From', 'Pokemon_To', 'Weight']
# Invert the 1-10 teammate rank so that a higher value means a more common pairing
df['Weight'] = 10 - df['Weight']

df_season11_double = df[(df['Season']==11)&(df['Rule']=='Double')]
df_season11_double = df_season11_double.drop(['Season', 'Rule'], axis=1)

# Limit to the top 100 Pokemon in the usage ranking
df_season11_double = df_season11_double[df_season11_double['Pokemon_From'].isin(list(df_rank['Pokemon'])[:100])]
df_season11_double = df_season11_double[df_season11_double['Pokemon_To'].isin(list(df_rank['Pokemon'])[:100])]

df_season11_double.to_csv(OUTPUT_FILEPATH, index=False)
df_season11_double
index Pokemon_From Pokemon_To Weight
91394 Charizard Ninetales 9
91395 Charizard Tritodon 8
91396 Charizard Pippi 7
91397 Charizard Terrakion 6
91398 Charizard Sableye 5

504 rows × 3 columns

Create a network from the data created above.

import networkx as nx
network_np = df_season11_double.values
G = nx.DiGraph()
G.add_weighted_edges_from(network_np)

default_network.png

1. Degree centrality

degree_centers = nx.degree_centrality(G)
df_dc = pd.DataFrame(sorted(degree_centers.items(), key=lambda x: x[1], reverse=True), columns=['Pokemon', 'Degree centrality'])
df_dc.head(10)
index Pokemon Degree centrality
0 Ulaos 0.500000
1 Talonflame 0.490291
2 Achilleine 0.451456
3 Amoonguss 0.419903
4 Windy 0.359223
5 Dusclops 0.308252
6 Oronge 0.293689
7 Laplace 0.291262
8 Charizard 0.269417
9 Ferrothorn 0.237864

degree_centrality.png

2. Closeness centrality

close_centers = nx.closeness_centrality(G)
df_cc = pd.DataFrame(sorted(close_centers.items(), key=lambda x: x[1], reverse=True), columns=['Pokemon', 'Closeness centrality'])
df_cc.head(10)
index Pokemon Closeness centrality
0 Ulaos 0.648440
1 Talonflame 0.646345
2 Achilleine 0.628081
3 Amoonguss 0.612691
4 Windy 0.594483
5 Dusclops 0.572371
6 Laplace 0.562711
7 Oronge 0.551085
8 Pippi 0.545078
9 Ferrothorn 0.545078

closeness_centrality.png

3. Betweenness centrality

between_centers = nx.betweenness_centrality(G)
df_bc = pd.DataFrame(sorted(between_centers.items(), key=lambda x: x[1], reverse=True), columns=['Pokemon', 'Betweenness centrality'])
df_bc.head(10)
index Pokemon Betweenness centrality
0 Ninetales 0.046012
1 Persian 0.036945
2 Tritodon 0.028901
3 Charizard 0.025690
4 Terrakion 0.021207
5 Glaceon 0.019807
6 Sandslash 0.018529
7 Achilleine 0.013435
8 Nyai King 0.011381
9 Heliolisk 0.009867

betweenness_centrality.png

4. Eigenvector centrality

eigen_centers = nx.eigenvector_centrality_numpy(G)
df_ec = pd.DataFrame(sorted(eigen_centers.items(), key=lambda x: x[1], reverse=True), columns=['Pokemon', 'Eigen centrality'])
df_ec.head(10)
index Pokemon Eigen centrality
0 Ulaos 0.374252
1 Achilleine 0.362932
2 Talonflame 0.341690
3 Amoonguss 0.332555
4 Dusclops 0.297832
5 Windy 0.292263
6 Patch Ragon 0.261000
7 Ferrothorn 0.259066
8 Pippi 0.237895
9 Laplace 0.218322

eigen_centrality.png

5. Page rank

import operator

pageranks = nx.pagerank(G)
df_pr = pd.DataFrame(sorted(pageranks.items(), key=operator.itemgetter(1), reverse=True), columns=['Pokemon', 'Page Rank'])
df_pr.head(10)

pagerank.png

In each figure, I made the label font larger for nodes with higher centrality.

I put the official usage ranking and each centrality index side by side.

スクリーンショット2020-11-07224557.png

Comparing the top 10 of each index within the network restricted to the top 100 Pokemon, Ulaos ranks noticeably higher on every centrality index than in the official usage ranking. (It feels a bit overrated.)
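
The notebook does not show how that comparison table was put together, so the following is just a sketch of one way to build the same kind of table from the dataframes above, assuming df_rank['Pokemon'] is already sorted by the official usage ranking:

from functools import reduce
import pandas as pd

# Turn each sorted centrality dataframe into a rank column
tables = {'Degree': df_dc, 'Closeness': df_cc, 'Betweenness': df_bc,
          'Eigenvector': df_ec, 'PageRank': df_pr}
ranked = []
for name, df_c in tables.items():
    tmp = df_c[['Pokemon']].copy()
    tmp[name + ' rank'] = range(1, len(tmp) + 1)
    ranked.append(tmp)

# Official usage ranking, restricted to the top 100 used for the network
official = pd.DataFrame({'Pokemon': list(df_rank['Pokemon'])[:100]})
official['Usage rank'] = range(1, len(official) + 1)

comparison = reduce(lambda left, right: pd.merge(left, right, on='Pokemon'), [official] + ranked)
comparison.head(10)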

From now on, we will use PageRank.

Network structure clustering

At last we get to the main topic: clustering the network structure.

There are various clustering (community detection) methods: some use the betweenness centrality or eigenvector centrality derived above, others use information centrality, the spin-glass method, or random walks (not introduced this time).

Since I have already spent a lot of words on deriving the centralities, I will leave running clustering with each centrality and comparing the results for another occasion.

Let's get on with it. Clustering will be done with the Louvain method used here (Paper, Implementation Library).

This method splits the network so as to maximize an indicator of network density called [modularity](https://en.wikipedia.org/wiki/Modularity_%28networks%29), and unlike k-means it does not require the number of clusters to be specified in advance.
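
For reference, the modularity Q being maximized is commonly written (for a weighted undirected graph) as

$$ Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j) $$

where A_ij is the weight of the edge between nodes i and j, k_i is the weighted degree of node i, m is the total edge weight, and δ(c_i, c_j) is 1 when nodes i and j belong to the same cluster (0 otherwise). The Louvain method greedily moves nodes between clusters and merges clusters as long as Q keeps increasing.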

Directed graphs cannot be used in this implementation, so we will convert them to undirected graphs.

#Convert directed graph to undirected graph
G2 = nx.Graph(G)

import community

# Louvain clustering (python-louvain): returns {node: cluster id}
partition = community.best_partition(G2)

# Wrap each node's cluster id as a node attribute dict for Cytoscape
partition2 = {}
for i in partition.keys():
    sub_dict = {'community' : partition[i]}
    partition2[i] = sub_dict

# Give each node a numeric label attribute (used later for display)
labels = dict([(i, str(i)) for i in range(nx.number_of_nodes(G2))])
labels2 = {}
for i in range(len(labels)):
    sub_dict = {'labels' : labels[i]}
    labels2[list(partition.keys())[i]] = sub_dict

nx.set_node_attributes(G2, labels2)
nx.set_node_attributes(G2, partition2)
nx.write_gml(G2, ".//community.gml")

pd.DataFrame.from_dict(labels2).T.to_csv('.//community_labels.csv')
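
As a quick sanity check (my own addition, not in the original notebook), the python-louvain package can also report the modularity of the partition it just found, and a simple Counter shows the cluster sizes:

import collections

# Modularity of the partition found above (higher generally means better-separated clusters)
print(community.modularity(partition, G2))

# How many Pokemon ended up in each cluster
print(collections.Counter(partition.values()))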

Interpretation of cluster

As a result of clustering, 6 clusters were extracted. Let's take a look at each cluster.

df_pagerank = pd.DataFrame(sorted(pageranks.items(), key=operator.itemgetter(1),reverse = True), columns=['Pokemon', 'Page Rank'])
df_community = pd.concat([pd.DataFrame.from_dict(labels2).T, pd.DataFrame.from_dict(partition2).T], axis=1)
df_community = df_community.reset_index()
df_community.columns = ['Pokemon', 'label', 'community']

df_pagerank_community = pd.merge(left=df_pagerank, right=df_community, on = 'Pokemon')

The figure below shows how many Pokemon were classified into each cluster.

df_pagerank_community.groupby('community').count()['Pokemon'].plot.bar(rot=0, alpha=0.75)

plt.png

You can see that cluster 3 accounts for nearly 30% of the total.

Let's take a look at the contents of each cluster.

Cluster 0

df_pagerank_community[df_pagerank_community['community']==0].head(10)
index Pokemon Page Rank label community
13 Charizard 0.013742 0 0
18 Tritodon 0.007309 2 0
19 Sekitanzan 0.006604 32 0
20 Ninetales 0.006319 1 0
30 Sableye 0.002842 5 0
42 Weavile 0.001501 68 0
46 Sneasel 0.001235 55 0
53 Cobalion 0.001133 56 0
61 Leafeon 0.000935 30 0
69 Virizion 0.000789 69 0

Is this a sun team centered on Charizard and Ninetales? Charizard has the highest centrality in this cluster, and Tritodon, which pairs very well with Charizard, comes second.

Cluster 1

df_pagerank_community[df_pagerank_community['community']==1].head(10)
index Pokemon Page Rank label community
8 Pippi 0.033903 3 1
16 Polygon-Z 0.012646 27 1
17 Terrakion 0.009129 4 1
41 Persian 0.001544 33 1
80 Luxray 0.000630 64 1
81 Ennute 0.000629 92 1

Next, cluster 1 consists of these six Pokemon. Eviolite Pippi, whose support performance is extremely high, and Polygon-Z, which can put out tremendous firepower with Adaptability-boosted Dynamax moves, are classified here. My impression is that these are versatile Pokemon that can fit into almost any party.

Cluster 2

df_pagerank_community[df_pagerank_community['community']==2].head(10)
index Pokemon Page Rank label community
7 Ferrothorn 0.038186 6 2
24 Sylveon 0.005511 20 2
28 Pelipper 0.003061 57 2
29 Wonoragon 0.002859 46 2
32 Kingdra 0.002468 49 2
33 Politoed 0.002321 47 2
35 Ludicolo 0.002233 50 2
37 Seismitoad 0.001736 58 2
38 Escavalier 0.001626 59 2
39 Weezing 0.001572 44 2

Cluster 2 is easy to understand: it is a rain team including Pelipper, Kingdra, Ludicolo and so on. Ferrothorn, which complements the Water types' weaknesses well, has high centrality. It also makes intuitive sense that Pokemon weak to Fire, such as Escavalier, end up grouped with a rain team.

Cluster 3

df_pagerank_community[df_pagerank_community['community']==3].head(10)
index Pokemon Page Rank label community
0 Achilleine 0.107503 7 3
1 Talonflame 0.093996 12 3
4 Windy 0.056541 15 3
5 Patch Ragon 0.049160 16 3
14 Duraldon 0.013730 24 3
15 Oronge 0.013617 29 3
21 Amarjo 0.006167 17 3
26 Braviary 0.003980 23 3
27 Gengar 0.003799 43 3
34 Pixie 0.002254 28 3

Cluster 3 seems to concentrate the top of the metagame. If you look at the team-building write-ups floating around the net, you often see combinations of Achilleine, Talonflame, and Patch Ragon in successful teams, so I guess this cluster holds many of the Pokemon that had a large influence on the Season 11 double environment.

Cluster 4

df_pagerank_community[df_pagerank_community['community']==4].head(10)
index Pokemon Page Rank label community
2 Amoonguss 0.083762 8 4
3 Dusclops 0.061276 9 4
9 Brimon 0.026956 25 4
11 Rhyperior 0.016228 21 4
12 Dadarin 0.014804 26 4
22 Raichu 0.006124 13 4
25 Garula 0.005487 19 4
31 rattle 0.002758 39 4
40 Slowbro 0.001563 40 4
45 Stringer 0.001322 82 4

This cluster is also very easy to understand. Dusclops and Brimon, who act as Trick Room setters, Rhyperior, Dadarin, and Marowak, who act as Trick Room attackers, and Amoonguss, who acts as a supporter, are classified here.

Looking at the centrality, we can see that Amoonguss plays a very important role.

Cluster 5

df_pagerank_community[df_pagerank_community['community']==5].head(10)
index Pokemon Page Rank label community
6 Ulaos 0.045935 10 5
10 Laplace 0.024565 22 5
23 Kuwawa 0.005706 14 5
36 Goodra 0.002140 88 5
50 Noivern 0.001201 83 5
55 Blastoise 0.001056 11 5
63 Mahip 0.000880 94 5
90 Togedemaru 0.000463 93 5

The last cluster consists of these eight Pokemon. What kind of grouping is this? It was hard to interpret with my knowledge, so I would love to hear your thoughts in the comments.

Visualization

Finally, let's visualize the network with Cytoscape, whose usage we went over last time.

Load community.gml from **File > Import > Network from File**, then use **Import Table from File** at the top (see the figure below)

スクリーンショット2020-11-05021410.png

to load community_labels.csv, and set the dialog box that appears as follows.

Note that the red part needs to be changed from the default.

スクリーンショット2020-11-07203853.png

After that, just change the shape and color for each cluster by making full use of **Continuous Mapping** in the **Style** tab.

スクリーンショット2020-11-07204346.png

I visualized the network with the font color indicating the cluster and the font size indicating the PageRank.

community.gml.png

Next time, I would like to search for the best clustering method for this data.

See you again.

Source, etc.

- Pokemon x Data Science (1) -- I analyzed the rank battle data of Pokemon Sword Shield and visualized it on Tableau
- Pokemon x Data Science (2) -- Trial version: thinking about party building for Pokemon Sword Shield from network analysis

© 2020 Pokémon © 1995-2020 Nintendo / Creatures Inc. / GAME FREAK inc. Pocket Monsters, Pokemon, and Pokémon are registered trademarks of Nintendo, Creatures, and Game Freak.
