[PYTHON] I tried to make a ○ ✕ game using TensorFlow

Introduction

- So far, we have been doing so-called "supervised learning" with CNNs and RNNs.
- This time, I tried my hand at a simple take on the "reinforcement learning" made famous by AlphaGo.
- That said, it is only a small tweak for those who have already built the previous examples.
- The ○ ✕ game is apparently officially called tic-tac-toe. (I did not know that.)
- The details are the same as in the implementations so far, so I will not repeat them here.
- Since the code is not complicated, I have not slimmed it down or pulled values out into constants.

Reference

Part 8: Let's make an AI for the ○× game with TensorFlow

Environment

#   OS / Software / Library    Version
1   Mac OS X                   El Capitan
2   Python                     2.7.x
3   TensorFlow                 1.2.x

Details

http://qiita.com/neriai/items/c0114af9c2eae627b6ce

Training data

I am using this dataset as-is: https://github.com/sfujiwara/tictactoe-tensorflow/tree/master/data
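
Judging from the loading code further down, each CSV row holds the nine board squares followed by a result column (1 = x wins, -1 = o wins, 0 = draw), with 5,890 rows after the header. A quick sanity check along those lines (the expected values are my reading of the training script, not something stated in the data repository):

import numpy as np

# Same call as in the training script: skip the header row
mat = np.loadtxt('/workspace/tictactoe/data.csv', skiprows=1, delimiter=",")

print mat.shape               # expecting (5890, 10): 9 squares + 1 result column
print np.unique(mat[:, -1])   # expecting [-1.  0.  1.]: o win / draw / x win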

Source code

ticktacktoo.py


#!/usr/local/bin/python
# -*- coding: utf-8 -*-

import os
import shutil
import numpy as np
import tensorflow as tf

def inference(squares_placeholder):

    # Create the first hidden layer (9 inputs -> 32 ReLU units)
    with tf.name_scope('hidden1') as scope:
        hidden1 = tf.layers.dense(squares_placeholder, 32, activation=tf.nn.relu)

    # Create the second hidden layer (32 ReLU units)
    with tf.name_scope('hidden2') as scope:
        hidden2 = tf.layers.dense(hidden1, 32, activation=tf.nn.relu)

    # Create the output layer (3 units: x win / o win / draw)
    with tf.name_scope('logits') as scope:
        logits = tf.layers.dense(hidden2, 3)

    # Normalize to probabilities with the softmax function
    with tf.name_scope('softmax') as scope:
        logits = tf.nn.softmax(logits)

    return logits

# Calculate the error (loss) between the model's predictions and the one-hot labels
def loss(labels_placeholder, logits):

    # Cross entropy against the one-hot labels
    # (note: the values passed in as "logits" have already been through softmax in inference())
    cross_entropy = tf.losses.softmax_cross_entropy(
        onehot_labels=labels_placeholder,
        logits=logits,
        label_smoothing=1e-5
    )

    # Log the value for display in TensorBoard
    tf.summary.scalar("cross_entropy", cross_entropy)

    return cross_entropy

# Train the model with backpropagation, based on the error (loss)
def training(learning_rate, loss):

    # The Adam optimizer handles the backpropagation and parameter updates
    train_step = tf.train.AdamOptimizer(learning_rate).minimize(loss)

    return train_step

# Calculate the accuracy of the predictions produced by inference()
def accuracy(logits, labels):

    #Compare whether the prediction label and the correct label are equal. Returns True if they are the same
    correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))

    # Cast the boolean correct_prediction to float and take the mean to get the accuracy
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

    # Log the value for display in TensorBoard
    tf.summary.scalar("accuracy", accuracy)

    return accuracy

if __name__ == '__main__':

    # Fix the seed so the random train/test split is reproducible
    np.random.seed(1)

    # Load the board data, skipping the header row: each row is 9 square values plus a result column
    mat = np.loadtxt('/workspace/tictactoe/data.csv', skiprows=1, delimiter=",")

    # Randomly split the 5,890 board states into 4,000 training and 1,890 test samples
    ind_train = np.random.choice(5890, 4000, replace=False)
    ind_test = np.array([i for i in range(5890) if i not in ind_train])

    # All columns except the last one are the 9 board squares
    train_square = mat[ind_train, :-1]
    test_square = mat[ind_test, :-1]

    # One-hot encode the result column: [x win, o win, draw]
    all_label = np.zeros([len(mat), 3])

    for i, j in enumerate(mat[:, -1]):
        if j == 1:
            # x win
            all_label[i][0] = 1.
        elif j == -1:
            # o win
            all_label[i][1] = 1.
        else:
            # draw
            all_label[i][2] = 1.

    train_label = all_label[ind_train]
    test_label = all_label[ind_test]

    with tf.Graph().as_default() as graph:

        tf.set_random_seed(0)

        # Placeholder for board states (9 values per board, batch size None = any number)
        squares_placeholder = tf.placeholder(tf.float32, [None, 9])

        # Placeholder for labels (3 classes, batch size None = any number)
        labels_placeholder = tf.placeholder(tf.float32, [None, 3])

        #Generate a model
        logits = inference(squares_placeholder)

        # Calculate the loss with loss()
        loss = loss(labels_placeholder, logits)

        # Create the training op with training() to adjust the model parameters
        train_step = training(0.01, loss)

        #Accuracy calculation
        accuracy = accuracy(logits, labels_placeholder)

        #Ready to save
        saver = tf.train.Saver()

        # Create a Session (TensorFlow computations must run inside a Session)
        sess = tf.Session()

        # Initialize the variables (done when the Session starts)
        sess.run(tf.global_variables_initializer())

        # Merge all the TensorBoard summaries defined above
        summary_step = tf.summary.merge_all()

        # Specify the directory where the TensorBoard log is written
        summary_writer = tf.summary.FileWriter('/workspace/tictactoe/data', sess.graph)

        # Train for 10,000 steps; each step uses a random mini-batch of 1,000 training samples
        for step in range(10000):
            ind = np.random.choice(len(train_label), 1000)

            sess.run(
                train_step,
                feed_dict={squares_placeholder: train_square[ind], labels_placeholder: train_label[ind]}
            )

            # Every 100 steps, report the loss and accuracies and write a TensorBoard summary
            if step % 100 == 0:
                train_loss = sess.run(
                    loss,
                    feed_dict={squares_placeholder: train_square, labels_placeholder: train_label}
                )

                train_accuracy, labels_pred = sess.run(
                    [accuracy, logits],
                    feed_dict={squares_placeholder: train_square, labels_placeholder: train_label}
                )

                test_accuracy = sess.run(
                    accuracy,
                    feed_dict={squares_placeholder: test_square, labels_placeholder: test_label}
                )

                summary = sess.run(
                    summary_step,
                    feed_dict={squares_placeholder: train_square, labels_placeholder: train_label}
                )

                summary_writer.add_summary(summary, step)

                print "Iteration: {0} Loss: {1} Train Accuracy: {2} Test Accuracy{3}".format(
                    step, train_loss, train_accuracy, test_accuracy
                )


        # Save the trained parameters to a checkpoint file
        save_path = saver.save(sess, 'tictactoe.ckpt')

Learning execution
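
The script was run directly (the data path /workspace/tictactoe/data.csv and the log directory are hard-coded, so adjust them to your own layout if needed):

python ticktacktoo.py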

2017-07-04 14:19:57.084696: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-04 14:19:57.084722: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-07-04 14:19:57.084728: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-04 14:19:57.084733: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
Iteration: 0 Loss: 1.08315229416 Train Accuracy: 0.425999999046 Test Accuracy0.426455020905
Iteration: 100 Loss: 0.748883843422 Train Accuracy: 0.814249992371 Test Accuracy0.785185158253
Iteration: 200 Loss: 0.629662871361 Train Accuracy: 0.934000015259 Test Accuracy0.894179880619
Iteration: 300 Loss: 0.600810408592 Train Accuracy: 0.960250020027 Test Accuracy0.914285719395
Iteration: 400 Loss: 0.594145357609 Train Accuracy: 0.964749991894 Test Accuracy0.913227498531
Iteration: 500 Loss: 0.582351207733 Train Accuracy: 0.975499987602 Test Accuracy0.925396800041
Iteration: 600 Loss: 0.575868725777 Train Accuracy: 0.981500029564 Test Accuracy0.920634925365
Iteration: 700 Loss: 0.571496605873 Train Accuracy: 0.98425000906 Test Accuracy0.924867749214
Iteration: 800 Loss: 0.571447372437 Train Accuracy: 0.98474997282 Test Accuracy0.919576704502
Iteration: 900 Loss: 0.567611455917 Train Accuracy: 0.98575001955 Test Accuracy0.925396800041
Iteration: 1000 Loss: 0.567007541656 Train Accuracy: 0.98575001955 Test Accuracy0.925396800041
Iteration: 1100 Loss: 0.566512107849 Train Accuracy: 0.986000001431 Test Accuracy0.928571403027
Iteration: 1200 Loss: 0.566121637821 Train Accuracy: 0.986000001431 Test Accuracy0.925925910473
Iteration: 1300 Loss: 0.565603733063 Train Accuracy: 0.986500024796 Test Accuracy0.924338638783
Iteration: 1400 Loss: 0.56520396471 Train Accuracy: 0.986750006676 Test Accuracy0.925396800041
Iteration: 1500 Loss: 0.564830541611 Train Accuracy: 0.986999988556 Test Accuracy0.926455020905
Iteration: 1600 Loss: 0.564735352993 Train Accuracy: 0.986999988556 Test Accuracy0.926455020905
Iteration: 1700 Loss: 0.564707398415 Train Accuracy: 0.986999988556 Test Accuracy0.926984131336
Iteration: 1800 Loss: 0.56460750103 Train Accuracy: 0.986999988556 Test Accuracy0.9280423522
Iteration: 1900 Loss: 0.564545154572 Train Accuracy: 0.986999988556 Test Accuracy0.926455020905
Iteration: 2000 Loss: 0.564533174038 Train Accuracy: 0.986999988556 Test Accuracy0.928571403027
Iteration: 2100 Loss: 0.564481317997 Train Accuracy: 0.986999988556 Test Accuracy0.927513241768
Iteration: 2200 Loss: 0.564553022385 Train Accuracy: 0.986999988556 Test Accuracy0.92962962389
Iteration: 2300 Loss: 0.583365738392 Train Accuracy: 0.96850001812 Test Accuracy0.919576704502
Iteration: 2400 Loss: 0.566257119179 Train Accuracy: 0.986249983311 Test Accuracy0.926455020905
Iteration: 2500 Loss: 0.563695311546 Train Accuracy: 0.987999975681 Test Accuracy0.926455020905
Iteration: 2600 Loss: 0.563434004784 Train Accuracy: 0.987999975681 Test Accuracy0.92962962389
Iteration: 2700 Loss: 0.563206732273 Train Accuracy: 0.988250017166 Test Accuracy0.9280423522
Iteration: 2800 Loss: 0.563172519207 Train Accuracy: 0.988250017166 Test Accuracy0.931746006012
Iteration: 2900 Loss: 0.563154757023 Train Accuracy: 0.988250017166 Test Accuracy0.931746006012
Iteration: 3000 Loss: 0.563151359558 Train Accuracy: 0.988250017166 Test Accuracy0.930687844753
Iteration: 3100 Loss: 0.563149094582 Train Accuracy: 0.988250017166 Test Accuracy0.930687844753
Iteration: 3200 Loss: 0.563141226768 Train Accuracy: 0.988250017166 Test Accuracy0.930687844753
Iteration: 3300 Loss: 0.563139140606 Train Accuracy: 0.988250017166 Test Accuracy0.931216955185
Iteration: 3400 Loss: 0.563138246536 Train Accuracy: 0.988250017166 Test Accuracy0.931216955185
Iteration: 3500 Loss: 0.563154280186 Train Accuracy: 0.988250017166 Test Accuracy0.931746006012
Iteration: 3600 Loss: 0.563149809837 Train Accuracy: 0.988250017166 Test Accuracy0.931216955185
Iteration: 3700 Loss: 0.563176214695 Train Accuracy: 0.988250017166 Test Accuracy0.929100513458
Iteration: 3800 Loss: 0.563181519508 Train Accuracy: 0.988250017166 Test Accuracy0.931216955185
Iteration: 3900 Loss: 0.563153684139 Train Accuracy: 0.988250017166 Test Accuracy0.929100513458
Iteration: 4000 Loss: 0.563127815723 Train Accuracy: 0.988250017166 Test Accuracy0.930687844753
Iteration: 4100 Loss: 0.563163101673 Train Accuracy: 0.988250017166 Test Accuracy0.9280423522
Iteration: 4200 Loss: 0.563137412071 Train Accuracy: 0.988250017166 Test Accuracy0.930687844753
Iteration: 4300 Loss: 0.563160598278 Train Accuracy: 0.988250017166 Test Accuracy0.930687844753
Iteration: 4400 Loss: 0.563147187233 Train Accuracy: 0.988250017166 Test Accuracy0.926984131336
Iteration: 4500 Loss: 0.563141047955 Train Accuracy: 0.988250017166 Test Accuracy0.9280423522
Iteration: 4600 Loss: 0.563162863255 Train Accuracy: 0.988250017166 Test Accuracy0.930158734322
Iteration: 4700 Loss: 0.563196718693 Train Accuracy: 0.988250017166 Test Accuracy0.929100513458
Iteration: 4800 Loss: 0.563158690929 Train Accuracy: 0.988250017166 Test Accuracy0.92962962389
Iteration: 4900 Loss: 0.563124537468 Train Accuracy: 0.988250017166 Test Accuracy0.926984131336
Iteration: 5000 Loss: 0.563167691231 Train Accuracy: 0.988250017166 Test Accuracy0.930158734322
Iteration: 5100 Loss: 0.563187777996 Train Accuracy: 0.988250017166 Test Accuracy0.930687844753
Iteration: 5200 Loss: 0.56315112114 Train Accuracy: 0.988250017166 Test Accuracy0.9280423522
Iteration: 5300 Loss: 0.570619702339 Train Accuracy: 0.981000006199 Test Accuracy0.924867749214
Iteration: 5400 Loss: 0.576466858387 Train Accuracy: 0.976249992847 Test Accuracy0.927513241768
Iteration: 5500 Loss: 0.563514411449 Train Accuracy: 0.988250017166 Test Accuracy0.928571403027
Iteration: 5600 Loss: 0.562488973141 Train Accuracy: 0.989000022411 Test Accuracy0.93439155817
Iteration: 5700 Loss: 0.56244301796 Train Accuracy: 0.989000022411 Test Accuracy0.933333337307
Iteration: 5800 Loss: 0.562356352806 Train Accuracy: 0.989000022411 Test Accuracy0.934920608997
Iteration: 5900 Loss: 0.562411308289 Train Accuracy: 0.989000022411 Test Accuracy0.935978829861
Iteration: 6000 Loss: 0.562405765057 Train Accuracy: 0.989000022411 Test Accuracy0.933862447739
Iteration: 6100 Loss: 0.562349438667 Train Accuracy: 0.989000022411 Test Accuracy0.935449719429
Iteration: 6200 Loss: 0.562380075455 Train Accuracy: 0.989000022411 Test Accuracy0.934920608997
Iteration: 6300 Loss: 0.562388420105 Train Accuracy: 0.989000022411 Test Accuracy0.935449719429
Iteration: 6400 Loss: 0.562395453453 Train Accuracy: 0.989000022411 Test Accuracy0.935449719429
Iteration: 6500 Loss: 0.562419772148 Train Accuracy: 0.989000022411 Test Accuracy0.933862447739
Iteration: 6600 Loss: 0.562360167503 Train Accuracy: 0.989000022411 Test Accuracy0.934920608997
Iteration: 6700 Loss: 0.562407493591 Train Accuracy: 0.989000022411 Test Accuracy0.93439155817
Iteration: 6800 Loss: 0.562382221222 Train Accuracy: 0.989000022411 Test Accuracy0.934920608997
Iteration: 6900 Loss: 0.562420666218 Train Accuracy: 0.989000022411 Test Accuracy0.932804226875
Iteration: 7000 Loss: 0.562407851219 Train Accuracy: 0.989000022411 Test Accuracy0.933862447739
Iteration: 7100 Loss: 0.562392890453 Train Accuracy: 0.989000022411 Test Accuracy0.932804226875
Iteration: 7200 Loss: 0.562432050705 Train Accuracy: 0.989000022411 Test Accuracy0.932804226875
Iteration: 7300 Loss: 0.562389314175 Train Accuracy: 0.989000022411 Test Accuracy0.934920608997
Iteration: 7400 Loss: 0.562418997288 Train Accuracy: 0.989000022411 Test Accuracy0.933333337307
Iteration: 7500 Loss: 0.562441766262 Train Accuracy: 0.989000022411 Test Accuracy0.935449719429
Iteration: 7600 Loss: 0.562380254269 Train Accuracy: 0.989000022411 Test Accuracy0.934920608997
Iteration: 7700 Loss: 0.562415003777 Train Accuracy: 0.989000022411 Test Accuracy0.935449719429
Iteration: 7800 Loss: 0.562358081341 Train Accuracy: 0.989000022411 Test Accuracy0.93439155817
Iteration: 7900 Loss: 0.562432765961 Train Accuracy: 0.989000022411 Test Accuracy0.935978829861
Iteration: 8000 Loss: 0.562436521053 Train Accuracy: 0.989000022411 Test Accuracy0.936507940292
Iteration: 8100 Loss: 0.562419176102 Train Accuracy: 0.989000022411 Test Accuracy0.93439155817
Iteration: 8200 Loss: 0.562465846539 Train Accuracy: 0.989000022411 Test Accuracy0.931746006012
Iteration: 8300 Loss: 0.562432646751 Train Accuracy: 0.989000022411 Test Accuracy0.934920608997
Iteration: 8400 Loss: 0.562426924706 Train Accuracy: 0.989000022411 Test Accuracy0.933333337307
Iteration: 8500 Loss: 0.562418758869 Train Accuracy: 0.989000022411 Test Accuracy0.933862447739
Iteration: 8600 Loss: 0.562417984009 Train Accuracy: 0.989000022411 Test Accuracy0.935978829861
Iteration: 8700 Loss: 0.562437176704 Train Accuracy: 0.989000022411 Test Accuracy0.935978829861
Iteration: 8800 Loss: 0.578755617142 Train Accuracy: 0.972500026226 Test Accuracy0.927513241768
Iteration: 8900 Loss: 0.565938591957 Train Accuracy: 0.986000001431 Test Accuracy0.935449719429
Iteration: 9000 Loss: 0.562196016312 Train Accuracy: 0.989250004292 Test Accuracy0.935978829861
Iteration: 9100 Loss: 0.561437726021 Train Accuracy: 0.990000009537 Test Accuracy0.938624322414
Iteration: 9200 Loss: 0.561364352703 Train Accuracy: 0.990000009537 Test Accuracy0.938624322414
Iteration: 9300 Loss: 0.561371803284 Train Accuracy: 0.990000009537 Test Accuracy0.938624322414
Iteration: 9400 Loss: 0.561358273029 Train Accuracy: 0.990000009537 Test Accuracy0.939682543278
Iteration: 9500 Loss: 0.561344504356 Train Accuracy: 0.990000009537 Test Accuracy0.939153432846
Iteration: 9600 Loss: 0.561368823051 Train Accuracy: 0.990000009537 Test Accuracy0.939682543278
Iteration: 9700 Loss: 0.56139343977 Train Accuracy: 0.990000009537 Test Accuracy0.937566161156
Iteration: 9800 Loss: 0.561371207237 Train Accuracy: 0.990000009537 Test Accuracy0.939682543278
Iteration: 9900 Loss: 0.561351060867 Train Accuracy: 0.990000009537 Test Accuracy0.939682543278

About the generated file

http://qiita.com/neriai/items/791e6f4dd8d08775542b
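
A side note that is not part of the article's script: the tictactoe.ckpt checkpoint generated above can be restored later to score an arbitrary board. The sketch below reuses inference() from ticktacktoo.py, assumes it is run from the directory containing the checkpoint files, and guesses that the squares use the same encoding as the result column (1 = x, -1 = o, 0 = empty), which the article does not spell out.

# Hypothetical helper, not part of the article: restore tictactoe.ckpt and score one board
import numpy as np
import tensorflow as tf

from ticktacktoo import inference  # reuse the model definition from the script above

with tf.Graph().as_default():
    squares_placeholder = tf.placeholder(tf.float32, [None, 9])
    logits = inference(squares_placeholder)

    saver = tf.train.Saver()

    with tf.Session() as sess:
        # Restore the parameters written by saver.save() at the end of training
        saver.restore(sess, 'tictactoe.ckpt')

        # Assumed square encoding (not confirmed in the article): 1 = x, -1 = o, 0 = empty
        board = np.array([[1, -1, 1,
                           0, -1, 0,
                           0,  1, 0]], dtype=np.float32)

        probabilities = sess.run(logits, feed_dict={squares_placeholder: board})
        print "P(x win), P(o win), P(draw):", probabilities[0]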

Learning results

Learning model

(Screenshot: TensorBoard graph view of the learning model, http://localhost:6006/#graphs)

About TensorBoard

http://qiita.com/neriai/items/a7b47127462ecf0fcc1d
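
To view these graphs yourself, point TensorBoard at the log directory the script writes to and open http://localhost:6006 (TensorBoard's default port) in a browser:

tensorboard --logdir=/workspace/tictactoe/data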

Learning graph

(Screenshot: TensorBoard scalar graphs of cross_entropy and accuracy, http://localhost:6006/#scalars)

Summary

- This is probably the easiest model structure to understand so far.
- Is the subtle ceiling in the graph due to overfitting?
- Preparing the training data seems to be the hard part.
- Next time, I would like to actually play against it.

All page links

- I made a ○ ✕ game using TensorFlow
- I tried playing a game using TensorFlow
