- Pokemon x Data Science (1) -- I analyzed the rank battle data of Pokemon Sword Shield and visualized it on Tableau
- Pokemon x Data Science (2) -- Trial version of thinking about party construction of Pokemon Sword Shield from network analysis
- [This time] Pokemon x Data Science (3) -- Thinking about party construction of Pokemon Sword Shield from network analysis: where is the center of the network?
Hello. Continuing from the [previous article](https://qiita.com/b_aka/items/9020e3237ff1a3e676e4), this time we again deal with graph theory and network analysis.
In the previous article, we visualized the network of Pokemon Sword Shield Rank Battle parties. This time, I will actually start the analysis.
The code and data used this time can be found in this GitHub repository.
The full code is [here](https://github.com/moxak/pokemon-rankbattle-network-analysis/blob/master/002.ipynb).
As the title suggests, I would like to cluster each node of the Pokemon Sword Shield Party Building Network. We also want to capture important nodes by introducing the concept of centrality before clustering.
If you can do something like this, you have achieved your goal.
Network theory (graph theory) has a concept called centrality.
Centrality is an indicator for assessing and comparing the importance of each vertex in the network.
[Network Analysis, 2nd Edition (Learning Data Science with R)](https://www.amazon.co.jp/%E3%83%8D%E3%83%83%E3%83%88%E3%83%AF%E3%83%BC%E3%82%AF%E5%88%86%E6%9E%90-%E7%AC%AC2%E7%89%88-R%E3%81%A7%E5%AD%A6%E3%81%B6%E3%83%87%E3%83%BC%E3%82%BF%E3%82%B5%E3%82%A4%E3%82%A8%E3%83%B3%E3%82%B9-%E9%88%B4%E6%9C%A8-%E5%8A%AA/dp/4320113152/ref=sr_1_4?__mk_ja_JP=%E3%82%AB%E3%82%BF%E3%82%AB%E3%83%8A&dchild=1&keywords=%E3%83%8D%E3%83%83%E3%83%88%E3%83%AF%E3%83%BC%E3%82%AF%E5%88%86%E6%9E%90&sr=8-4)
It is an attempt to derive mathematically how central, that is, how **important**, each node is in the network.
This time, I would like to use this theory to calculate the importance of each node in the network.
However, there are actually many types of this centrality.
The first centrality is degree centrality.
Degree is the number of edges a node has, and that number is used directly as the centrality.
The idea is simply that the more nodes a node is connected to, the more important it is. It is a basic centrality, but I feel its results often deviate from intuition.
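As a minimal sketch of this idea (not the code used in this article), degree centrality can be computed from a plain edge list in a few lines of Python. The toy star graph and the helper name `degree_centrality` are made up for illustration. Dividing by n - 1 is the same normalization that `nx.degree_centrality` applies, so the hub of a star scores 1.0.

```python
from collections import defaultdict

def degree_centrality(edges):
    """Degree of each node divided by (n - 1), for an undirected edge list."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    n = len(neighbors)
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}

# Hypothetical toy star graph: "hub" is connected to three leaves.
toy_edges = [("hub", "a"), ("hub", "b"), ("hub", "c")]
dc = degree_centrality(toy_edges)
print(dc["hub"], dc["a"])  # the hub touches every other node, each leaf only one
```

On a directed graph, networkx counts incoming and outgoing edges together when computing this index.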
Next are closeness centrality and eccentricity centrality, which derive centrality from the distance to other nodes.
The idea is that the closer a node is to the center of the network, the more important it is.
Closeness centrality takes the reciprocal of the total distance from a node to all other nodes, and eccentricity centrality takes the reciprocal of the maximum distance from a node to any other node.
Since both are distance-based and give very similar results, I will use closeness centrality this time.
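The two definitions above can be sketched with a breadth-first search. This is only an illustration under simple assumptions: the graph is undirected, unweighted, and connected, and the path graph `toy_adj` and the function names are hypothetical.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def closeness_and_eccentricity(adj):
    """Return {node: (1 / total distance, 1 / maximum distance)}."""
    result = {}
    for node in adj:
        dist = bfs_distances(adj, node)
        others = [d for n, d in dist.items() if n != node]
        result[node] = (1 / sum(others), 1 / max(others))
    return result

# Hypothetical path graph: a - b - c - d - e
toy_adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
scores = closeness_and_eccentricity(toy_adj)
print(scores["c"], scores["a"])  # the middle node scores higher on both indices
```

`nx.closeness_centrality` additionally rescales by the number of reachable nodes, so its absolute values differ from this raw reciprocal, although the ordering on a connected graph is the same.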
The most commonly used one (I think) is betweenness centrality.
Simply put, the idea is that the more often a node lies on the shortest paths between other nodes, the more important it is (as a relay).
In a network made up of communities, a node with high betweenness centrality sits in a position where one community cannot be reached from another without passing through it, so I hope you can see why it is intuitively important.
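To make the "relay" idea concrete, here is a small sketch of unnormalized betweenness for an undirected, unweighted graph, written in the style of Brandes' algorithm. The toy graph (two triangles joined through a bridge node `x`) and the function name are assumptions for illustration, not part of the article's data.

```python
from collections import deque

def betweenness(adj):
    """Unnormalized betweenness centrality (undirected, unweighted graph)."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors.
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for w in adj[u]:
                if dist[w] < 0:
                    dist[w] = dist[u] + 1
                    queue.append(w)
                if dist[w] == dist[u] + 1:
                    sigma[w] += sigma[u]
                    preds[w].append(u)
        # Walk back from the farthest nodes, accumulating path dependencies.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Every unordered pair was counted once from each endpoint.
    return {v: b / 2 for v, b in score.items()}

# Hypothetical graph: triangle a-b-c and triangle d-e-f, joined only via x.
toy_adj = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "x"],
    "x": ["c", "d"],
    "d": ["x", "e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}
bc = betweenness(toy_adj)
print(bc["x"], bc["c"], bc["a"])
```

Node `x` has only two edges, yet every path between the two triangles runs through it, which is exactly the situation described above.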
Eigenvector centrality is very different from the four centralities introduced so far: it brings in the idea of which nodes a node is connected to.
Based on the idea that nodes connected to important nodes are themselves more important, we repeatedly add up the centralities of the nodes connected to each node and take the value it converges to as the centrality.
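The repeated-addition process can be sketched as a simple power iteration. This is an illustrative version under stated assumptions, not the article's code: the graph is undirected and connected, each node keeps its own score while adding its neighbors' scores (a common trick to avoid oscillation), and the toy graph and function name are hypothetical.

```python
import math

def eigenvector_centrality(adj, iterations=200, tol=1e-10):
    """Power iteration: each node repeatedly absorbs its neighbors' scores."""
    score = {v: 1.0 for v in adj}
    for _ in range(iterations):
        # Own score plus the scores of all neighbors.
        new = {v: score[v] + sum(score[u] for u in adj[v]) for v in adj}
        # Rescale so the scores do not grow without bound.
        norm = math.sqrt(sum(x * x for x in new.values()))
        new = {v: x / norm for v, x in new.items()}
        if sum(abs(new[v] - score[v]) for v in adj) < tol:
            return new
        score = new
    return score

# Hypothetical graph: "a" and "d" both have degree 1,
# but "a" hangs off the hub while "d" hangs off the less central "c".
toy_adj = {
    "hub": ["a", "b", "c"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub", "d"],
    "d": ["c"],
}
ec = eigenvector_centrality(toy_adj)
print(ec["a"] > ec["d"])  # same degree, different neighbor importance
```

Degree centrality cannot tell `a` and `d` apart, whereas this index ranks `a` higher because its single neighbor is the hub.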
PageRank is the centrality devised by Google founders Larry Page and Sergey Brin.
The basic idea is the same as eigenvector centrality. To understand what is different, we need to know the problem of eigenvector centrality.
Suppose the network contains a node that has no edges coming in from other nodes. The centrality of this node is of course 0. That's fine so far, but the next part is a bit tricky. Suppose there is a node i whose only incoming edge comes from such a node. Node i receives no centrality from that node, so the centrality of i is also 0. I think this is counterintuitive.
Also, suppose a node i is linked from a node j that has a tremendous amount of centrality. Under eigenvector centrality, the centrality of node j is passed on to node i, but node i is just one of many nodes that node j points to. Should all of node j's centrality really be passed on to node i?
PageRank is a centrality index that solves these problems to some extent.
It became the basis of Google's search algorithm and is still used today, for example to measure the impact of academic papers.
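The two fixes can be seen in a short power-iteration sketch. It is a simplified illustration, not networkx's implementation: every node receives a small baseline score, so a node with no incoming edges is not stuck at 0, and each node splits its score evenly among the nodes it points to, so a single target does not inherit all of it. The damping value 0.85 matches the default `alpha` of `nx.pagerank`. The toy graph and function name are assumptions.

```python
def pagerank(out_links, damping=0.85, iterations=100):
    """Simplified PageRank on a dict {node: [nodes it links to]}."""
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Baseline share that every node receives regardless of its in-links.
        new = {v: (1 - damping) / n for v in nodes}
        for u in nodes:
            targets = out_links[u]
            if targets:
                # Split u's score evenly among the nodes it links to.
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new[v] += share
            else:
                # A node without out-links spreads its score over all nodes.
                for v in nodes:
                    new[v] += damping * rank[u] / n
        rank = new
    return rank

# Hypothetical graph: "s" has no incoming edges and links only to "i";
# "hub" links to three nodes, which all link back to it.
toy_links = {
    "s": ["i"],
    "i": ["hub"],
    "hub": ["a", "b", "c"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub"],
}
ranks = pagerank(toy_links)
print(ranks["s"], ranks["i"], ranks["hub"], ranks["a"])
```

Here `s` keeps a small positive score, `i` scores higher than `s` even though its only in-link comes from `s`, and each of `a`, `b`, `c` receives only a third of what `hub` passes on.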
The data used to derive the centralities are the same as last time: the "Pokemon adopted together" ranking data and the adoption ranking data from Pokemon Sword Shield rank battles. We will analyze the network made up of the top 100 Pokemon in the adoption ranking.
import pandas as pd

# FILEPATH_TEMOTI_POKEMON, FILEPATH_ADO_RANK, and OUTPUT_FILEPATH are assumed to be defined beforehand
df = pd.read_csv(FILEPATH_TEMOTI_POKEMON, encoding='utf-8')
df_rank = pd.read_csv(FILEPATH_ADO_RANK, encoding='utf-8')
df.columns = ['Season', 'Rule', 'Pokemon_From', 'Pokemon_To', 'Weight']
# Turn the co-adoption rank into an edge weight (a higher rank gives a larger weight)
df['Weight'] = 10 - df['Weight']
# Keep only the Season 11 double battle data
df_season11_double = df[(df['Season'] == 11) & (df['Rule'] == 'Double')]
df_season11_double = df_season11_double.drop(['Season', 'Rule'], axis=1)
# Limit both ends of each edge to the top 100 Pokemon in the adoption ranking
df_season11_double = df_season11_double[df_season11_double['Pokemon_From'].isin(list(df_rank['Pokemon'])[:100])]
df_season11_double = df_season11_double[df_season11_double['Pokemon_To'].isin(list(df_rank['Pokemon'])[:100])]
df_season11_double.to_csv(OUTPUT_FILEPATH, index=False)
df_season11_double
index | Pokemon_From | Pokemon_To | Weight |
---|---|---|---|
91394 | Charizard | Ninetales | 9 |
91395 | Charizard | Tritodon | 8 |
91396 | Charizard | Pippi | 7 |
91397 | Charizard | Terrakion | 6 |
91398 | Charizard | Sableye | 5 |
504 rows × 3 columns
Create a network from the data created above.
import networkx as nx
network_np = df_season11_double.values
G = nx.DiGraph()
G.add_weighted_edges_from(network_np)
degree_centers = nx.degree_centrality(G)
df_dc = pd.DataFrame(sorted(degree_centers.items(), key=lambda x: x[1], reverse=True), columns=['Pokemon', 'Degree centrality'])
df_dc.head(10)
index | Pokemon | Degree centrality |
---|---|---|
0 | Ulaos | 0.500000 |
1 | Talonflame | 0.490291 |
2 | Achilleine | 0.451456 |
3 | Amoonguss | 0.419903 |
4 | Windy | 0.359223 |
5 | Dusclops | 0.308252 |
6 | Oronge | 0.293689 |
7 | Laplace | 0.291262 |
8 | Charizard | 0.269417 |
9 | Ferrothorn | 0.237864 |
close_centers = nx.closeness_centrality(G)
df_cc = pd.DataFrame(sorted(close_centers.items(), key=lambda x: x[1], reverse=True), columns=['Pokemon', 'Closeness centrality'])
df_cc.head(10)
index | Pokemon | Closeness centrality |
---|---|---|
0 | Ulaos | 0.648440 |
1 | Talonflame | 0.646345 |
2 | Achilleine | 0.628081 |
3 | Amoonguss | 0.612691 |
4 | Windy | 0.594483 |
5 | Dusclops | 0.572371 |
6 | Laplace | 0.562711 |
7 | Oronge | 0.551085 |
8 | Pippi | 0.545078 |
9 | Ferrothorn | 0.545078 |
between_centers = nx.betweenness_centrality(G)
df_bc = pd.DataFrame(sorted(between_centers.items(), key=lambda x: x[1], reverse=True), columns=['Pokemon', 'Betweenness centrality'])
df_bc.head(10)
index | Pokemon | Betweenness centrality |
---|---|---|
0 | Ninetales | 0.046012 |
1 | Persian | 0.036945 |
2 | Tritodon | 0.028901 |
3 | Charizard | 0.025690 |
4 | Terrakion | 0.021207 |
5 | Glaceon | 0.019807 |
6 | Sandslash | 0.018529 |
7 | Achilleine | 0.013435 |
8 | Nyai King | 0.011381 |
9 | Heliolisk | 0.009867 |
eigen_centers = nx.eigenvector_centrality_numpy(G)
df_ec = pd.DataFrame(sorted(eigen_centers.items(), key=lambda x: x[1], reverse=True), columns=['Pokemon', 'Eigen centrality'])
df_ec.head(10)
index | Pokemon | Eigen centrality |
---|---|---|
0 | Ulaos | 0.374252 |
1 | Achilleine | 0.362932 |
2 | Talonflame | 0.341690 |
3 | Amoonguss | 0.332555 |
4 | Dusclops | 0.297832 |
5 | Windy | 0.292263 |
6 | Patch Ragon | 0.261000 |
7 | Ferrothorn | 0.259066 |
8 | Pippi | 0.237895 |
9 | Laplace | 0.218322 |
import operator
pageranks = nx.pagerank(G)
df_pr = pd.DataFrame(sorted(pageranks.items(), key=operator.itemgetter(1),reverse = True), columns=['Pokemon', 'Page Rank'])
df_pr.head(10)
I enlarged the label font of the nodes with high centrality.
I lined up the published ranking alongside each centrality index.
Narrowing the top 100 Pokemon down to the top 10 by combined rank, we can see that Ulaos ranks noticeably higher on every index than it does in the published ranking. (It feels overrated.)
From now on, we will use PageRank.
That was a long lead-up. Now let's move on to clustering the network structure, the main subject of this article.
There are various clustering (community extraction) methods, such as those using the betweenness centrality and eigenvector centrality derived earlier, as well as information centrality, the spin glass method, and random walks, none of which are covered this time.
I have already spent a good deal of text on deriving centrality, so I will leave running clustering with each centrality and comparing the results for another occasion.
This time I'll move quickly. Clustering will be done using the method here (Paper, Implementation Library).
This method partitions the network so as to maximize [modularity](https://en.wikipedia.org/wiki/%E3%83%A2%E3%82%B8%E3%83%A5%E3%83%A9%E3%83%AA%E3%83%86%E3%82%A3), an indicator of how densely the network is connected within communities. It differs from k-means in that the number of clusters does not need to be specified in advance.
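To give a feel for what is being maximized, here is a small sketch that computes modularity by hand for an undirected, unweighted graph, using the standard form Q = Σ_c [ L_c / m - (d_c / 2m)² ], where m is the number of edges, L_c the number of edges inside community c, and d_c the sum of degrees in c. The toy graph, the two candidate partitions, and the function name are assumptions for illustration. It only evaluates a given partition and does not search for the best one.

```python
def modularity(edges, community_of):
    """Modularity Q of a partition for an undirected, unweighted edge list."""
    m = len(edges)
    internal = {}    # community -> number of edges with both ends inside it
    degree_sum = {}  # community -> sum of the degrees of its nodes
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )

# Hypothetical graph: two triangles joined by the single edge c-d.
toy_edges = [
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("c", "d"),
    ("d", "e"), ("e", "f"), ("d", "f"),
]
split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
lumped = {node: 0 for node in split}

q_split = modularity(toy_edges, split)
q_lumped = modularity(toy_edges, lumped)
print(q_split, q_lumped)  # splitting into the two triangles scores higher
```

A method that maximizes this score picks the number of communities as a by-product, which is why no cluster count has to be given up front.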
The implementation library used here does not accept directed graphs, so we convert the graph to an undirected one.
# Convert the directed graph to an undirected graph
G2 = nx.Graph(G)
import community  # provided by the python-louvain package
# Detect communities: {node: community id}
partition = community.best_partition(G2)
# Wrap each community id in a dict so it can be set as a node attribute
partition2 = {}
for i in partition.keys():
    sub_dict = {'community' : partition[i]}
    partition2[i] = sub_dict
# Give every node a sequential numeric label
labels = dict([(i, str(i)) for i in range(nx.number_of_nodes(G2))])
labels2 = {}
for i in range(len(labels)):
    sub_dict = {'labels' : labels[i]}
    labels2[list(partition.keys())[i]] = sub_dict
nx.set_node_attributes(G2, labels2)
nx.set_node_attributes(G2, partition2)
# Export the graph and the labels for Cytoscape
nx.write_gml(G2, ".//community.gml")
pd.DataFrame.from_dict(labels2).T.to_csv('.//community_labels.csv')
As a result of clustering, 6 clusters were extracted. Let's take a look at each cluster.
df_pagerank = pd.DataFrame(sorted(pageranks.items(), key=operator.itemgetter(1),reverse = True), columns=['Pokemon', 'Page Rank'])
df_community = pd.concat([pd.DataFrame.from_dict(labels2).T, pd.DataFrame.from_dict(partition2).T], axis=1)
df_community = df_community.reset_index()
df_community.columns = ['Pokemon', 'label', 'community']
df_pagerank_community = pd.merge(left=df_pagerank, right=df_community, on = 'Pokemon')
The figure below shows how many Pokemon were classified into each cluster.
df_pagerank_community.groupby('community').count()['Pokemon'].plot.bar(rot=0, alpha=0.75)
You can see that cluster 3 accounts for nearly 30% of the total.
Let's take a look at the contents of each cluster.
df_pagerank_community[df_pagerank_community['community']==0].head(10)
index | Pokemon | Page Rank | label | community |
---|---|---|---|---|
13 | Charizard | 0.013742 | 0 | 0 |
18 | Tritodon | 0.007309 | 2 | 0 |
19 | Sekitanzan | 0.006604 | 32 | 0 |
20 | Ninetales | 0.006319 | 1 | 0 |
30 | Sableye | 0.002842 | 5 | 0 |
42 | Weavile | 0.001501 | 68 | 0 |
46 | Sneasel | 0.001235 | 55 | 0 |
53 | Cobalion | 0.001133 | 56 | 0 |
61 | Leafeon | 0.000935 | 30 | 0 |
69 | Virizion | 0.000789 | 69 | 0 |
Is this a sun team built around Charizard and Ninetales? Charizard has the highest centrality in the cluster, followed by Tritodon, which pairs very well with Charizard.
df_pagerank_community[df_pagerank_community['community']==1].head(10)
index | Pokemon | Page Rank | label | community |
---|---|---|---|---|
8 | Pippi | 0.033903 | 3 | 1 |
16 | Polygon-Z | 0.012646 | 27 | 1 |
17 | Terrakion | 0.009129 | 4 | 1 |
41 | Persian | 0.001544 | 33 | 1 |
80 | Luxray | 0.000630 | 64 | 1 |
81 | Ennute | 0.000629 | 92 | 1 |
Next, cluster 1 consists of these 6 Pokemon. Eviolite Pippi, which has extremely high support performance, and Polygon-Z, which can deal enormous damage with Adaptability and Dynamax, were classified here. My impression is that it contains many versatile Pokemon that fit into any party.
df_pagerank_community[df_pagerank_community['community']==2].head(10)
index | Pokemon | Page Rank | label | community |
---|---|---|---|---|
7 | Ferrothorn | 0.038186 | 6 | 2 |
24 | Sylveon | 0.005511 | 20 | 2 |
28 | Pelipper | 0.003061 | 57 | 2 |
29 | Wonoragon | 0.002859 | 46 | 2 |
32 | Kingdra | 0.002468 | 49 | 2 |
33 | Politoed | 0.002321 | 47 | 2 |
35 | Ludicolo | 0.002233 | 50 | 2 |
37 | Seismitoad | 0.001736 | 58 | 2 |
38 | Escavalier | 0.001626 | 59 | 2 |
39 | Weezing | 0.001572 | 44 | 2 |
Cluster 2 is easy to understand: it is a rain team that includes Pelipper, Kingdra, Ludicolo, and so on. Ferrothorn, which complements the Water types very well, has high centrality. It also matches intuition that Pokemon weak to Fire, such as Escavalier, are built into rain teams.
df_pagerank_community[df_pagerank_community['community']==3].head(10)
index | Pokemon | Page Rank | label | community |
---|---|---|---|---|
0 | Achilleine | 0.107503 | 7 | 3 |
1 | Talonflame | 0.093996 | 12 | 3 |
4 | Windy | 0.056541 | 15 | 3 |
5 | Patch Ragon | 0.049160 | 16 | 3 |
14 | Duraldon | 0.013730 | 24 | 3 |
15 | Oronge | 0.013617 | 29 | 3 |
21 | Amarjo | 0.006167 | 17 | 3 |
26 | Braviary | 0.003980 | 23 | 3 |
27 | Gengar | 0.003799 | 43 | 3 |
34 | Pixie | 0.002254 | 28 | 3 |
Cluster 3 seems to gather the top of the metagame. Party-building write-ups circulating online often feature combinations of Achilleine, Talonflame, and Patch Ragon as successful builds, so I suspect this cluster contains many of the Pokemon that had a large influence on the Season 11 double environment.
df_pagerank_community[df_pagerank_community['community']==4].head(10)
index | Pokemon | Page Rank | label | community |
---|---|---|---|---|
2 | Amoonguss | 0.083762 | 8 | 4 |
3 | Dusclops | 0.061276 | 9 | 4 |
9 | Brim on | 0.026956 | 25 | 4 |
11 | Rhyperior | 0.016228 | 21 | 4 |
12 | Dadarin | 0.014804 | 26 | 4 |
22 | Raichu | 0.006124 | 13 | 4 |
25 | Garula | 0.005487 | 19 | 4 |
31 | rattle | 0.002758 | 39 | 4 |
40 | Slowbro | 0.001563 | 40 | 4 |
45 | Stringer | 0.001322 | 82 | 4 |
This cluster is also very easy to understand. It groups Dusclops and Brim on, which set up Trick Room; Rhyperior, Dadarin, and Marowak (listed as "rattle" in the table), which attack under Trick Room; and Amoonguss, which plays a support role.
Looking at the centrality, we can see that Amoonguss plays a very important role.
df_pagerank_community[df_pagerank_community['community']==5].head(10)
index | Pokemon | Page Rank | label | community |
---|---|---|---|---|
6 | Ulaos | 0.045935 | 10 | 5 |
10 | Laplace | 0.024565 | 22 | 5 |
23 | Kuwawa | 0.005706 | 14 | 5 |
36 | Goodra | 0.002140 | 88 | 5 |
50 | Noivern | 0.001201 | 83 | 5 |
55 | Blastoise | 0.001056 | 11 | 5 |
63 | Mahip | 0.000880 | 94 | 5 |
90 | Togedemaru | 0.000463 | 93 | 5 |
The last cluster consists of these 8 Pokemon. What do these Pokemon have in common? It was hard to interpret with my knowledge, so I would love to hear your thoughts in the comments.
Finally, let's visualize the network with Cytoscape, whose usage we covered last time.
Load community.gml from **File > Import > Network from File**, then load community_labels.csv from **Import Table from File** at the top (see figure below), and set the dialog box that appears as follows.
Note that the part marked in red needs to be changed from the default.
After that, just change the shape and color for each cluster by making full use of **Continuous Mapping** in the **Style** tab.
I visualized the network with the font color representing the cluster and the font size representing the PageRank.
Next time, I would like to search for the best clustering method for this data.
See you again.
© 2020 Pokémon © 1995-2020 Nintendo / Creatures Inc./GAME FREAK inc. Pokemon, Pokemon, and Pokémon are registered trademarks of Nintendo, Creatures, and Game Freak.