haripo's article on analyzing Pokemon with LDA was interesting, so, although this is a rehash of the same idea, I classified Pokemon with a topic model.
Structure of this article
For background on the topic model, please refer to the article I wrote earlier.
Mapping the terms used in the topic model onto Pokemon classification gives the table below.
Topic model | Pokemon classification |
---|---|
document | Pokémon |
topic | type |
word | move |
Pokemon have types, and a Pokemon's type affects the moves it can learn. For example, water-type Pokemon tend to learn water-type moves such as "Naminori" and "Awa". This tendency should make classification possible if the moves each Pokemon can learn are used as the observed data.
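As a minimal sketch of what "moves as observed data" could look like, each Pokemon can be turned into a bag-of-moves count vector. The learnsets and Pokemon names below are hypothetical toy data, not the data set used in this article, and the representation actually used may differ.

```python
from collections import Counter

# Hypothetical toy learnsets; the real data set is not included in this article.
learnsets = {
    "Pokemon A": ["Naminori", "Awa", "protect"],
    "Pokemon B": ["Naminori", "protect", "snore"],
    "Pokemon C": ["Dragon claw", "Gekirin", "protect"],
}

# Vocabulary: every distinct move, in a fixed (sorted) order.
vocab = sorted({move for moves in learnsets.values() for move in moves})


def bag_of_moves(moves):
    """Return a count vector over the vocabulary for one Pokemon."""
    counts = Counter(moves)
    return [counts[move] for move in vocab]


# One row per Pokemon (document), one column per move (word).
names = list(learnsets)
matrix = [bag_of_moves(learnsets[name]) for name in names]

for name, row in zip(names, matrix):
    print(name, row)
```

Each row plays the role of a document and each column the role of a word, which is the input shape a topic model expects.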
This time, I estimated the parameters using variational Bayesian inference and classified the Pokemon.
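The update equations are not spelled out here. As a rough reminder only, and assuming standard LDA with symmetric Dirichlet priors $\alpha$ (on each Pokemon's topic proportions) and $\beta$ (on each topic's move distribution), which may differ from what was actually implemented, mean-field variational Bayes alternates between the following updates:

$$
q(z_{dn}=k) \propto \exp\Bigl(\psi(\gamma_{dk}) - \psi\bigl(\textstyle\sum_{k'}\gamma_{dk'}\bigr) + \psi(\lambda_{k w_{dn}}) - \psi\bigl(\textstyle\sum_{v}\lambda_{kv}\bigr)\Bigr)
$$

$$
\gamma_{dk} = \alpha + \sum_{n} q(z_{dn}=k), \qquad \lambda_{kv} = \beta + \sum_{d}\sum_{n} q(z_{dn}=k)\,\mathbb{1}[w_{dn}=v]
$$

Here $d$ indexes Pokemon, $n$ the moves of Pokemon $d$, $k$ topics, $v$ moves in the vocabulary, $w_{dn}$ is the $n$-th move of Pokemon $d$, and $\psi$ is the digamma function. After convergence, point estimates can be read off as $\theta_{dk} = \gamma_{dk} / \sum_{k'}\gamma_{dk'}$ and $\phi_{kv} = \lambda_{kv} / \sum_{v'}\lambda_{kv'}$.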
Using the distributions determined from the estimated parameters, I summarized the top 10 Pokemon for each topic. Pokemon of the same type seem to gather in the same topic.
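Ranking the Pokemon within a topic amounts to sorting one column of the estimated document-topic matrix. The sketch below assumes a matrix `theta` where `theta[d][k]` is the weight of topic `k` for Pokemon `d`. The values, the names, and the helper `top_n` are hypothetical and only illustrate the ranking step.

```python
def top_n(scores, labels, n=10):
    """Return the n (score, label) pairs with the highest score."""
    ranked = sorted(zip(scores, labels), reverse=True)
    return ranked[:n]


# Hypothetical document-topic matrix: one row per Pokemon, one column per topic.
names = ["Pokemon A", "Pokemon B", "Pokemon C", "Pokemon D"]
theta = [
    [0.90, 0.10],
    [0.70, 0.30],
    [0.20, 0.80],
    [0.05, 0.95],
]

num_topics = len(theta[0])
for k in range(num_topics):
    # Extract the column for topic k and rank the Pokemon by it.
    column = [row[k] for row in theta]
    print(f"# topic {k}")
    for score, name in top_n(column, names, n=2):
        print(f"{score:.3f},{name}")
```

The same helper works for ranking moves within a topic if it is given a row of the topic-move matrix and the list of move names instead.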
Next, I summarized the top 10 moves with the highest probability of appearing in each topic. I can't show all of them, so here are three topics. From top to bottom, they look like fighting, flying, and dragon moves.
# probability, move
# topic 0
0.038194060809852150,Kiai punch
0.037835195362798050,Squeeze
0.036841611052444170,Glow punch
0.034094062097912610,Instead
0.031582047497348980,Kiai Dama
0.030022570931390366,Kamiari punch
0.028445813433849287,Ketaguri
0.025004928499331930,Fire punch
0.023780587984568276,counter
0.021945692094110280,Gansei Fuji
# topic 4
0.033900215309604030,fly through a sky
0.030482342390286497,Peck
0.028591087641639673,Steel wings
0.027435959356401675,Splash Yasume
0.027380458031433918,Okaze
0.025788738790993984,Clearly
0.023034317940404975,Nepuu
0.022523589807169140,Swallow
0.022254501055455754,Godbird
0.020131867462295738,Denkou Sekka
# topic 11
0.049065350322072170,Gekirin
0.041774262273487610,Kamikaku
0.041624388294984890,How about Ryu
0.037855463232992870,Barking
0.025959566718192560,Dragon claw
0.024452559013954666,Biting
0.023117951513882520,Meteor shower
0.022165850178318302,Ryu no Ibuki
0.021211659635587490,Dragon tail
0.020565024347301973,Iron tail
For reference, if all moves appeared with equal probability, each would have a probability of $0.001612903$.
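To put that baseline in perspective, the quoted value matches $1/620$ to the digits shown, so the sketch below assumes a vocabulary of 620 distinct moves. That count is inferred from the number, not stated in the article. It then compares the top move of each of the three topics listed above against the uniform baseline.

```python
# Assumption: 620 distinct moves. The article does not state this number;
# it is inferred from the quoted uniform probability 0.001612903.
num_moves = 620
uniform = 1.0 / num_moves
print(f"{uniform:.9f}")

# Top probabilities copied from the listings above (topics 0, 4 and 11).
top_probs = {
    "Kiai punch": 0.038194060809852150,
    "fly through a sky": 0.033900215309604030,
    "Gekirin": 0.049065350322072170,
}

# How many times more likely each move is than under the uniform baseline.
lift = {move: p / uniform for move, p in top_probs.items()}
for move, ratio in lift.items():
    print(f"{move}: {ratio:.1f}x uniform")
```

Under this assumption, the top move of each topic is roughly 20 to 30 times more likely than it would be if every move were equally probable, which is what makes the topics readable as types.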
Topic 1 is a collection of the Eevee family.
# topic 1
0.043147518977559340,Give
0.041203419444248310,Special
0.033292408675975230,Katakiuchi
0.033029453562122160,To want
0.030529522626056543,Sift
0.025551829416613884,Hyper Voice
0.024972809947129540,wag its tail
0.024818145340899777,Echo voice
0.023821039582134246,Nakigoe
0.023117882965061936,Tedasuke
I'm not familiar with recent Pokemon, so I don't know whether these moves are characteristic of the Eevee family. Some domain knowledge is required to interpret the classification results.
Topic 12 seems to have failed to classify.
# topic 12
0.057367839760812930,snore
0.053966900730742826,Duster duster
0.045267075341040960,Per guy
0.042824715616878280,No
0.042506636983195300,Karagenki
0.042225237649753020,From the power of secret
0.041397727253613630,Pounding
0.035804309317892184,protect
0.034668594832704050,Rinsho
0.021632999482884614,Swallow
In the case of Pokemon, it was relatively easy to evaluate the classification results. However, when it is hard to define what a topic should be, it seems difficult to judge whether a classification result is good or bad. It was also tedious to inspect and analyze the results every time I ran the code.
Pokemon can be classified with a topic model. It was fun because the results were better than expected. I will not be releasing the source code.