[PYTHON] Machine learning learned with Pokemon

Introduction

Last month, Pokemon Sword and Shield was released. Have you ever played Pokemon? As anyone who has played knows, every Pokemon has stats: HP, Attack (Kogeki), Defense (Bougyo), Special Attack (Tokukou), Special Defense (Tokubo), and Speed. Roughly speaking, the higher a Pokemon's stats, the stronger it is. Each stat is calculated from three values: the base stat (race value), the individual value (IV), and the effort value (EV). (The formulas are given below.) The **base stat** is a value fixed for each species of Pokemon. The **individual value** is assigned to each individual, which is why two Pokemon of the same species can have different strengths. The **effort value** is acquired: while individual values are determined at birth, effort values can be raised through battles. In this article, I would like to predict a Pokemon's type from its base stats with Python.

-- HP stat = (base stat × 2 + IV + EV ÷ 4) × level ÷ 100 + level + 10
-- Stats other than HP = {(base stat × 2 + IV + EV ÷ 4) × level ÷ 100 + 5} × nature modifier
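
To make the formulas concrete, here is a minimal sketch of them in Python (the example values below are hypothetical, not taken from the game data used later):

# Minimal sketch of the stat formulas above; the example values are hypothetical.
def hp_stat(base, iv, ev, level):
    return (base * 2 + iv + ev // 4) * level // 100 + level + 10

def other_stat(base, iv, ev, level, nature=1.0):
    return int(((base * 2 + iv + ev // 4) * level // 100 + 5) * nature)

# A level-50 Pokemon with an HP base stat of 100, a maximum IV of 31 and no EVs:
print(hp_stat(100, 31, 0, 50))  # -> 175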


Development environment

-- CPU: 8th-generation 1.4 GHz quad-core Intel Core i5 processor

First thing I did

When I searched for "Pokemon machine learning", I found a site doing something similar, so I used it as a reference: https://www.hands-lab.com/tech/entry/3991.html That site tries to determine whether a Pokemon is a Water type from its base stats, so for a start I implemented the same thing by copy and paste. I thought it was a success because the accuracy was **85.3%**, but in reality only "Lucky" (Chansey) and "Blissey", neither of which is a Water type, were predicted to be Water types.

Now let's take stock of the situation. There are 909 Pokemon in the data, of which 123 are Water type and 786 are not. Consider a model that always answers "not a Water type" no matter what base stats it is given. The accuracy of this model is 786 / 909 × 100 ≈ **86.5%**.

In other words, in a binary classification problem, accuracy alone gives misleading results unless the two classes have roughly the same number of samples (or a metric that accounts for the imbalance is used).
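
To illustrate the point with a small sketch of my own (the numbers are the ones from the paragraph above): a classifier that always answers "not Water" already reaches the baseline accuracy, while a metric such as recall for the Water class immediately shows that the model is useless.

# Sketch: plain accuracy is misleading on imbalanced data.
from sklearn.metrics import recall_score

n_total = 909
n_water = 123
print("always-'not Water' baseline accuracy: %.3f" % ((n_total - n_water) / n_total))  # ~0.865

y_true = [1] * n_water + [0] * (n_total - n_water)  # 1 = Water type, 0 = not Water
y_pred = [0] * n_total                              # always predict "not Water"
print("recall for the Water class: %.3f" % recall_score(y_true, y_pred))  # 0.000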

What I did next

Taking that lesson to heart, I made the number of samples of the two classes roughly equal. This time I would like to build a model that determines whether a Pokemon is a Steel type or an Electric type (Steel type: 58, Electric type: 60). Pokemon that are both Electric and Steel, like Magneton, were counted as Steel type. The Pokemon data was borrowed from here .

# %%
import pandas as pd
import codecs
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt

# Read the Pokemon stats CSV (Shift-JIS encoded)
with codecs.open("data/pokemon_status.csv", "r", "Shift-JIS", "ignore") as file:
    df = pd.read_table(file, delimiter=",")

df.info()


# %%
# Collect every Pokemon whose first or second type is Steel
metal1 = df[df['Type 1'] == "Steel"]
metal2 = df[df['Type 2'] == "Steel"]
metal = pd.concat([metal1, metal2])
print("Steel type Pokemon: %d" % len(metal))

# Collect every Pokemon whose first or second type is Electric ("Denki")
elec1 = df[df['Type 1'] == "Denki"]
elec2 = df[df['Type 2'] == "Denki"]
elec = pd.concat([elec1, elec2])
print("Electric type Pokemon: %d" % len(elec))


def type_to_num(p_type):
    # 0 for Steel, 1 for anything else
    if p_type == "Steel":
        return 0
    else:
        return 1


# The label is the product of the two type codes, so a Pokemon is labeled
# 0 (Steel) if either of its types is Steel, and 1 (Electric) otherwise.
pokemon_m_e = pd.concat([metal, elec], ignore_index=True)
type1 = pokemon_m_e["Type 1"].apply(type_to_num)
type2 = pokemon_m_e["Type 2"].apply(type_to_num)
pokemon_m_e["type_num"] = type1*type2
pokemon_m_e.head()


# %%
X = pokemon_m_e.iloc[:, 7:13].values  # columns 7-12: the six base stat values
y = pokemon_m_e["type_num"].values

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)
lr = LogisticRegression(C=1.0)
lr.fit(X_train, y_train)


# %%
print("score for train data: %.3f" % lr.score(X_train, y_train))
print("score for test data: %.3f" % lr.score(X_test, y_test))


# %%
i = 0
error1 = 0
success1 = 0
error2 = 0
success2 = 0
print("[List of Pokemon judged to be steel type]")
print("----------------------------------------")
print("")
# Predict every Pokemon in the combined data (training + test) and tally the results
while i < len(pokemon_m_e):
    y_pred = lr.predict(X[i].reshape(1, -1))
    if y_pred == 0:
        print(pokemon_m_e.loc[i, ["Pokemon name"]])
        if pokemon_m_e.loc[i, ["type_num"]].values == 0:
            success1 += 1
            print("Steel type")
            print("")
        else:
            error1 += 1
            print("Not a steel type")
            print("")
    else:
        if pokemon_m_e.loc[i, ["type_num"]].values == 0:
            error2 += 1
        else:
            success2 += 1
    i += 1
print("----------------------------------------")
print("Number of Pokemon that are correctly judged to be a steel type: %d animals" % success1)
print("Number of Pokemon that are correctly judged to be electric type: %d animals" % success2)
print("Number of Pokemon that were mistakenly identified as a steel type: %d animals" % error1)
print("Number of Pokemon that were mistakenly identified as electric type: %d animals" % error2)
print("")
    

Execution result:
score for train data: 0.732
score for test data: 0.861

Number of Pokemon correctly judged to be Steel type: 48
Number of Pokemon correctly judged to be Electric type: 43
Number of Pokemon mistakenly judged to be Steel type: 13
Number of Pokemon mistakenly judged to be Electric type: 14
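
As a side note, the four counts above are exactly the entries of a confusion matrix, so scikit-learn could compute them in a single call. A minimal sketch, assuming lr, X, and y from the cells above:

# Sketch only: recompute the four counts with scikit-learn's confusion_matrix.
# Assumes lr, X and y as defined in the cells above.
from sklearn.metrics import confusion_matrix

y_pred_all = lr.predict(X)
# Rows are the true class, columns the predicted class (0 = Steel, 1 = Electric)
print(confusion_matrix(y, y_pred_all))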

The predictions were more accurate than I expected, so I would call this a general success. Rotom was judged to be a Steel type, though (laughs).

What I did after that

In the example above, I compared the Electric type and the Steel type. There are 18 types in all, so next I would like to see which pair of types gives the best classification accuracy.

# %%
import pandas as pd
import codecs
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')

with codecs.open("data/pokemon_status.csv", "r", "Shift-JIS", "ignore") as file:
    df = pd.read_table(file, delimiter=",")

df.info()


# %%
def lr_model_pokemon(type1, type2, test_size=0.3, random_state=0, C=1.0):
    # Gather every Pokemon whose first or second type matches type1
    df_type1_1 = df[df['Type 1'] == type1]
    df_type2_1 = df[df['Type 2'] == type1]
    df_type_1 = pd.concat([df_type1_1, df_type2_1])

    df_type1_2 = df[df['Type 1'] == type2]
    df_type2_2 = df[df['Type 2'] == type2]
    df_type_2 = pd.concat([df_type1_2, df_type2_2])

    def type_to_num(p_type):
        if p_type == type1:
            return 0
        else:
            return 1

    pokemon_concat = pd.concat([df_type_1, df_type_2], ignore_index=True)
    type_num1 = pokemon_concat["Type 1"].apply(type_to_num)
    type_num2 = pokemon_concat["Type 2"].apply(type_to_num)
    pokemon_concat["type_num"] = type_num1 * type_num2

    X = pokemon_concat.iloc[:, 7:13].values
    y = pokemon_concat["type_num"].values

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state)
    lr = LogisticRegression(C=C)
    lr.fit(X_train, y_train)

    # Return [accuracy on training data, accuracy on test data]
    return [lr.score(X_train, y_train), lr.score(X_test, y_test)]


# %%
max_score_train = 0
max_score_test = 0
train_type1 = ""
test_type1 = ""
train_type2 = ""
test_type2 = ""
type_list = ["Kusa", "Fire", "Mizu", "insect", "normal", "Evil", "Iwa", "Steel",
             "Denki", "ghost", "Dragon", "Esper", "Fighting", "Doku", "Fairy", "Jimen", "flight", "Ice"]

# Try every ordered pair of distinct types and keep track of the best scores
for type1 in type_list:
    for type2 in type_list:
        if type1 == type2:
            continue
        score = lr_model_pokemon(type1=type1, type2=type2)
        if (score[0] >= max_score_train):
            max_score_train = score[0]
            train_type1 = type1
            train_type2 = type2
        if (score[1] >= max_score_test):
            max_score_test = score[1]
            test_type1 = type1
            test_type2 = type2

print("%s, %When s, the score for training data is maximized: score = %.3f" %
      (train_type1, train_type2, max_score_train))
print("%s, %When s, the score for the test data is maximized: score = %.3f" %
      (test_type1, test_type2, max_score_test))

Execution result:
Score for training data is maximized with Steel and normal: score = 0.942
Score for test data is maximized with Steel and normal: score = 0.962

The model that distinguishes between the Steel type and the Normal type appears to have the highest accuracy. Now let's look at what kind of predictions it actually makes.

# %%
def poke_predict(type1, type2):
    # Gather every Pokemon whose first or second type is type1
    type1_1 = df[df['Type 1'] == type1]
    type2_1 = df[df['Type 2'] == type1]
    type_1 = pd.concat([type1_1, type2_1])
    print("%s type Pokemon: %d" % (type1, len(type_1)))

    # Gather every Pokemon whose first or second type is type2
    type1_2 = df[df['Type 1'] == type2]
    type2_2 = df[df['Type 2'] == type2]
    type_2 = pd.concat([type1_2, type2_2])
    print("%s type Pokemon: %d" % (type2, len(type_2)))

    def type_to_num(p_type):
        if p_type == type1:
            return 0
        else:
            return 1

    poke_concat = pd.concat([type_1, type_2], ignore_index=True)
    type1_c = poke_concat["Type 1"].apply(type_to_num)
    type2_c = poke_concat["Type 2"].apply(type_to_num)
    poke_concat["type_num"] = type1_c*type2_c
    poke_concat.head()

    X = poke_concat.iloc[:, 7:13].values
    y = poke_concat["type_num"].values

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0)
    lr = LogisticRegression(C=1.0)
    lr.fit(X_train, y_train)

    i = 0
    error1 = 0
    success1 = 0
    error2 = 0
    success2 = 0
    print("")
    print("[%List of Pokemon judged to be s type]" % type1)
    print("----------------------------------------")
    print("")
    while i < len(poke_concat):
        y_pred = lr.predict(X[i].reshape(1, -1))
        if y_pred == 0:
            print(poke_concat.loc[i, ["Pokemon name"]])
            if poke_concat.loc[i, ["type_num"]].values == 0:
                success1 += 1
                print("%s type" % type1)
                print("")
            else:
                error1 += 1
                print("%Not s type" % type1)
                print("")
        else:
            if poke_concat.loc[i, ["type_num"]].values == 0:
                error2 += 1
            else:
                success2 += 1
        i += 1
    print("----------------------------------------")
    print("Correctly%Number of Pokemon judged to be s type: %d animals" % (type1, success1))
    print("Correctly%Number of Pokemon judged to be s type: %d animals" % (type2, success2))
    print("Accidentally%Number of Pokemon judged to be s type: %d animals" % (type1, error1))
    print("Accidentally%Number of Pokemon judged to be s type: %d animals" % (type2, error2))
    print("")


# %%
poke_predict("Steel", "normal")

Execution result:
Steel type Pokemon: 58
normal type Pokemon: 116

Number of Pokemon correctly judged to be Steel type: 50
Number of Pokemon correctly judged to be normal type: 115
Number of Pokemon mistakenly judged to be Steel type: 1
Number of Pokemon mistakenly judged to be normal type: 8

Although the two classes differ in size (58 vs 116), an overall accuracy of (50 + 115) / 174 ≈ 94.8% is quite good. From this result we can say that Normal-type and Steel-type Pokemon have clearly different base stat profiles.
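
As a follow-up idea of mine (not part of the analysis above), looking at the logistic regression coefficients would show which of the six stats separate the two types. A minimal sketch, assuming the same df, imports, and column layout as in the cells above (columns 7 through 12 holding the six base stats):

# Sketch: refit the Steel-vs-normal model and inspect its coefficients.
# Assumes df and the imports from the cells above; columns 7:13 are the six stats.
stat_cols = df.columns[7:13]
steel = pd.concat([df[df['Type 1'] == "Steel"], df[df['Type 2'] == "Steel"]])
normal = pd.concat([df[df['Type 1'] == "normal"], df[df['Type 2'] == "normal"]])
data = pd.concat([steel, normal], ignore_index=True)
# 0 = Steel, 1 = normal, mirroring the labeling convention used above
labels = ((data['Type 1'] != "Steel") & (data['Type 2'] != "Steel")).astype(int)
model = LogisticRegression(C=1.0).fit(data.iloc[:, 7:13].values, labels.values)
for name, coef in zip(stat_cols, model.coef_[0]):
    print("%s: %+.3f" % (name, coef))
# A positive coefficient pushes a Pokemon toward "normal", a negative one toward "Steel".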

In closing

I am a beginner who started studying machine learning less than a week ago, but I think I was able to think about this fairly deeply. If anything in this article is mistaken, I would appreciate it if you could point it out.
