[PYTHON] Pokemon classification by topic model

Introduction

haripo's article on analyzing Pokemon with LDA was interesting, so, although this is a rehash of the same idea, I classified Pokemon with a topic model.

Structure of this article

Topic model

Please refer to the article I wrote earlier for the topic model.

Pokemon classification

Applying the terms of the topic model to Pokemon classification gives the correspondence in the table below.

Topic model | Pokemon classification
document    | Pokemon
topic       | type
word        | move

Pokemon have types, and a Pokemon's type affects the moves it can learn. For example, water-type Pokemon tend to learn water-type moves such as "Naminori" and "Awa". This tendency suggests that Pokemon can be classified using the moves they learn as the observed data.
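As a rough illustration of treating each Pokemon as a document and each move as a word, the sketch below builds a small bag-of-moves count matrix. The Pokemon names and learnsets are hypothetical placeholders, not real game data, and the original code was not published, so this is only one plausible way to prepare the input.

```python
from collections import Counter

# Hypothetical toy learnsets (placeholder Pokemon names, not real game data).
# Each Pokemon is a "document"; each move it can learn is a "word".
learnsets = {
    "water_a": ["Naminori", "Awa"],
    "water_b": ["Naminori", "Awa", "Peck"],
    "bird_a": ["Peck", "Steel wings"],
}

# Vocabulary: every distinct move, in a fixed (sorted) order.
vocab = sorted({move for moves in learnsets.values() for move in moves})

# Bag-of-moves matrix: counts[i][j] is how often Pokemon i has move j.
pokemon = list(learnsets)
counts = []
for name in pokemon:
    c = Counter(learnsets[name])
    counts.append([c[move] for move in vocab])

for name, row in zip(pokemon, counts):
    print(name, row)
```

Because a Pokemon either learns a move or does not, every entry is 0 or 1 here; a topic model only needs these counts, not the order of the moves.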

Results and analysis

This time, I classified Pokemon by estimating the parameters with variational Bayesian inference.
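The author's implementation is not available, so the following is only a minimal sketch of variational Bayesian estimation for LDA, assuming scikit-learn's LatentDirichletAllocation (which uses a variational Bayes algorithm) as a stand-in. The toy matrix and the choice of 2 topics are assumptions made purely for illustration.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

# Hypothetical toy bag-of-moves matrix: rows = Pokemon, columns = moves.
# Columns 0-2 stand in for one group of moves, columns 3-5 for another.
X = np.array([
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 1, 1],
])

# n_components is the number of topics; 2 is chosen only to fit the toy data.
lda = LatentDirichletAllocation(
    n_components=2, learning_method="batch", random_state=0
)

# doc_topic[i, k]: estimated weight of topic k for Pokemon i (rows sum to 1).
doc_topic = lda.fit_transform(X)

# components_ holds unnormalized topic-word weights; normalizing each row
# gives topic_word[k, j], the estimated probability of move j under topic k.
topic_word = lda.components_ / lda.components_.sum(axis=1, keepdims=True)

print(doc_topic.round(3))
print(topic_word.round(3))
```

The two resulting matrices correspond to the two views used in the rest of the article: Pokemon per topic and moves per topic.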

The figure below summarizes the top 10 Pokemon for each topic, ranked by the distribution computed from the estimated parameters. Pokemon of the same type appear to gather in the same topic.

[Figure: pokemon_lda]
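Ranking the Pokemon within a topic can be sketched as sorting one column of the document-topic matrix. The matrix values and names below are made up for illustration, and `top_documents` is a hypothetical helper, not a function from the original code.

```python
import numpy as np

# Hypothetical document-topic matrix (made-up values):
# doc_topic[i, k] is the weight of topic k for Pokemon i.
names = ["pokemon_a", "pokemon_b", "pokemon_c", "pokemon_d"]
doc_topic = np.array([
    [0.9, 0.1],
    [0.2, 0.8],
    [0.7, 0.3],
    [0.4, 0.6],
])

def top_documents(doc_topic, names, topic, n=10):
    """Return the n names with the largest weight for the given topic."""
    order = np.argsort(doc_topic[:, topic])[::-1][:n]
    return [names[i] for i in order]

top0 = top_documents(doc_topic, names, topic=0, n=2)
top1 = top_documents(doc_topic, names, topic=1, n=2)
print(top0)  # ['pokemon_a', 'pokemon_c']
print(top1)  # ['pokemon_b', 'pokemon_d']
```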

Next, here are the 10 moves with the highest probability of appearing in each topic. I can't show every topic, so I list three. From top to bottom, they look like fighting, flying, and dragon moves.

# probability, move

# topic 0
0.038194060809852150,Kiai punch
0.037835195362798050,Squeeze
0.036841611052444170,Glow punch
0.034094062097912610,Instead
0.031582047497348980,Kiai Dama
0.030022570931390366,Kamiari punch
0.028445813433849287,Ketaguri
0.025004928499331930,Fire punch
0.023780587984568276,counter
0.021945692094110280,Gansei Fuji

# topic 4
0.033900215309604030,fly through a sky
0.030482342390286497,Peck
0.028591087641639673,Steel wings
0.027435959356401675,Splash Yasume
0.027380458031433918,Okaze
0.025788738790993984,Clearly
0.023034317940404975,Nepuu
0.022523589807169140,Swallow
0.022254501055455754,Godbird
0.020131867462295738,Denkou Sekka

# topic 11
0.049065350322072170,Gekirin
0.041774262273487610,Kamikaku
0.041624388294984890,How about Ryu
0.037855463232992870,Barking
0.025959566718192560,Dragon claw
0.024452559013954666,Biting
0.023117951513882520,Meteor shower
0.022165850178318302,Ryu no Ibuki
0.021211659635587490,Dragon tail
0.020565024347301973,Iron tail
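Listings in the "probability,move" layout used above could be produced along the following lines. This is a sketch under assumptions: the probabilities are made-up values, the move names are placeholders, and `top_moves_lines` is a hypothetical helper.

```python
# Hypothetical topic-word probabilities for a single topic
# (made-up values and placeholder move names, not the article's results).
topic_word = {"move_a": 0.5, "move_b": 0.125, "move_c": 0.25, "move_d": 0.125}

def top_moves_lines(topic_word, n=10):
    """Format the n most probable moves as 'probability,move' lines."""
    ranked = sorted(topic_word.items(), key=lambda kv: kv[1], reverse=True)[:n]
    return [f"{p:.18f},{move}" for move, p in ranked]

lines = top_moves_lines(topic_word, n=3)
print("# probability, move")
for line in lines:
    print(line)
```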

For reference, if all moves appeared with equal probability, each move would have a probability of $0.001612903$.
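The uniform baseline can be checked with a little arithmetic. Under a uniform distribution over V moves, each move has probability 1/V, so the quoted value implies roughly 620 distinct moves. That vocabulary size is an inference from the number itself, not something the article states.

```python
# Uniform baseline quoted in the article.
quoted = 0.001612903

# Under a uniform distribution over V moves, each move has probability 1/V,
# so the implied vocabulary size is about 1/quoted (an inference, not a
# figure given in the article).
implied_vocab_size = round(1 / quoted)
uniform = 1 / implied_vocab_size

# How many times more likely the top move of topic 0 is than the baseline.
top_topic0 = 0.038194060809852150  # "Kiai punch" in the topic 0 listing
lift = top_topic0 / uniform

print(implied_vocab_size)
print(f"{uniform:.9f}")
print(f"{lift:.1f}")
```

By this measure, the top move of topic 0 is more than 20 times as likely as it would be under the uniform baseline, which is what makes the topics interpretable.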

A little more analysis

Topic 1 gathers the Eevee family.

[Figure: 001]
# topic 1
0.043147518977559340,Give
0.041203419444248310,Special
0.033292408675975230,Katakiuchi
0.033029453562122160,To want
0.030529522626056543,Sift
0.025551829416613884,Hyper Voice
0.024972809947129540,wag its tail
0.024818145340899777,Echo voice
0.023821039582134246,Nakigoe
0.023117882965061936,Tedasuke

I'm not familiar with recent Pokemon, so I can't tell whether these moves are specific to the Eevee family. Some domain knowledge is required to analyze the classification results.

For topic 12, the classification seems to have failed.

[Figure: 012]
# topic 12
0.057367839760812930,snore
0.053966900730742826,Duster duster
0.045267075341040960,Per guy
0.042824715616878280,No
0.042506636983195300,Karagenki
0.042225237649753020,From the power of secret
0.041397727253613630,Pounding
0.035804309317892184,protect
0.034668594832704050,Rinsho
0.021632999482884614,Swallow

In the case of Pokemon, it was relatively easy to evaluate the classification results. However, when the topics are hard to define, it seems difficult to judge whether a classification result is good or bad. It was also tedious to inspect and analyze the results every time I ran the code.

Conclusion

Pokemon can be classified by the topic model. It was fun because the results were better than expected. The source code will not be released.
