Let's make Othello AI with Chainer - Part 2 -

Introduction

This article is a sequel to "Let's make Othello AI with Chainer - Part 1 -". Please read Part 1 before reading this one.

Using the trained model created in Part 1, I implemented an MLP-based AI in the Othello game. For the game app itself, I use the one created in "Let's make Othello with wxPython". The source code and trained models are posted on GitHub, so please get them from there.

Othello game app description

This is an app created with wxPython, so wxPython must be installed for it to work. See here for the installation method. Start it as follows.

$ python reversi.py

(Screenshot: reversi_1.2.0.png)

Let me briefly explain the screen. In "MLP model setting" on the far right, specify the models used by the MLP AI: the model for the first player (black) in "for black" and the model for the second player (white) in "for white". Similarly, specify each computer's AI in "Computer AI setting" on the far right. Details of the AIs are described later. The game record is displayed in the blank area in the center as the game progresses. Set the game type in "Game mode" at the bottom center and start the game with the "START" button.

"SCORE" at the bottom center is the current number of stones on the play (black) and play (white). Enter the number of Loops in the text box at the bottom center and press the "Comp vs Comp Loop" button to play Computer A and Computer B a specified number of times in a row. (At this time, do not use the "START" button) At the end of the Loop, the number of wins for Computer A and Computer B will be displayed. DRAW is the number of draws.

This Othello app not only allows you to play regular matches, but also has the following features.

  1. You can select the Computer AI. (For more information about the AIs other than MLP, see [here](http://qiita.com/kanlkan/items/cf902964b02179d73639).)
    • MLP
    • Random
  2. If the MLP AI is selected and it chooses a move that breaks the rules, "* Illegal move! ... *" is output to standard error. The types are listed below. **When an Illegal move! occurs, the computer's move falls back to the first legal square found; if it must pass, it passes, and the game continues (fail safe).** A rough sketch of this fail-safe logic follows this list.
    • Cannot put stone but AI cannot select 'PASS'.
    • Cannot 'PASS' this turn but AI selected it.
    • Cannot put stone at AI selected position.
  3. You can count wins and losses by running multiple games between Computer AIs in a row.
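To make the fail safe concrete, here is a minimal sketch of the move-selection step, assuming the MLP maps a flat 64-cell board to 65 scores (the 64 squares plus index 64 for 'PASS') and that the caller supplies the legal squares; choose_move and its signature are hypothetical, not the actual code in "reversi.py".

import sys
import numpy as np

def choose_move(model, board, legal):
    # board: flat list of 64 cells (0: none, 1: black, 2: white).
    # legal: list of legal square indices 0-63; empty means a pass is required.
    x = np.asarray(board, dtype=np.float32).reshape(1, 64)
    y = model(x).data[0]  # scores for the 64 squares plus 'PASS'
    move = int(np.argmax(y))
    if move == 64:  # the AI chose 'PASS'
        if not legal:
            return 'PASS'  # a pass really is required
        print("* Illegal move! Cannot 'PASS' this turn but AI selected it. *", file=sys.stderr)
    elif not legal:
        print("* Illegal move! Cannot put stone but AI cannot select 'PASS'. *", file=sys.stderr)
        return 'PASS'  # fail safe: pass and continue the game
    elif move in legal:
        return move  # a legal square: play it as-is
    else:
        print("* Illegal move! Cannot put stone at AI selected position. *", file=sys.stderr)
    return legal[0]  # fail safe: the first legal square found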

Verify AI using the MLP model

Now let's move on to verifying whether it can actually play. I want to make a strong AI if possible, so let's take the model trained only on records of games the first player (black) won ("model_black_win.npz") and the model trained only on records of games the second player (white) won ("model_white_win.npz"), and have them play against a human (me). Hmmmm... **Mumu**. **An Illegal move! came out...** This wasn't going to be as easy as tic-tac-toe. Still, it was within expectations. First, let's examine which types of Illegal move appear and how often. Playing against it myself until the sun goes down is impractical, so let's have the computers play each other: MLP vs Random. Illegal move! occurred 7027 times in 1000 games. The breakdown by type is as follows.

| Detail | Number of occurrences |
| --- | --- |
| Cannot put stone but AI cannot select 'PASS'. | 93 |
| Cannot 'PASS' this turn but AI selected it. | 51 |
| Cannot put stone at AI selected position. | 6883 |
| total | 7027 |

Let's do a rough calculation. Assuming no passes occur and the game is not settled until the board is full, a game takes 60 moves, so each player makes 30 moves, or 30,000 moves over 1000 games. 7027/30000 means an estimated 23.4% of moves were illegal.

Only about 80% of moves follow the rules, then. I'm reluctant to change the MLP configuration just yet... so first I'll try everything I can think of without touching the MLP configuration.

The try & error festival has started.

Retry 1: Use the game records of all games, not just the winning games

I'll set aside making a strong AI for now and give priority to following the rules. By learning not only from winning games but also from losing games, I'll try to increase the variety of patterns. Set "model_black.npz" for the first player (black) and "model_white.npz" for the second player (white) as the MLP AI. These models are created with the following commands (a sketch of the filtering this implies is shown after them).

$ python build_mlp.py Othello.01e4.ggf black
$ mv reversi_model.npz model_black.npz
$ python build_mlp.py Othello.01e4.ggf white
$ mv reversi_model.npz model_white.npz
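For reference, a minimal sketch of the winner filtering that "build_mlp.py" is assumed to do internally; the record structure and function name here are hypothetical.

def select_training_games(games, color, winners_only):
    # games: parsed records, e.g. [{'moves': [...], 'winner': 'black'}, ...].
    # Part 1 trained only on games the given color won; Retry 1 uses every
    # game, win or lose, to broaden the variety of board patterns.
    if winners_only:
        return [g for g in games if g['winner'] == color]
    return list(games)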

Now, let's play the computers against each other as before: an MLP vs Random rematch. Illegal move! occurred 5720 times in 1000 games. The breakdown is as follows.

| Detail | Number of occurrences |
| --- | --- |
| Cannot put stone but AI cannot select 'PASS'. | 65 |
| Cannot 'PASS' this turn but AI selected it. | 123 |
| Cannot put stone at AI selected position. | 5532 |
| total | 5720 |

Since it is 5720/30000, 19.1% of moves are illegal. There is still a long way to go, but it is fewer than before. A good trend.

Retry 2: Increase the training epoch count from 1000 to 3000

This one is pretty much brute force: the idea is that simply training longer might work. I modified "build_mlp.py" so that the batch size and maximum epoch count can be specified as arguments (a sketch of the argument handling follows the commands), and recreated the models with batch_size = 100 and max_epoch_count = 3000.

$ python build_mlp.py Othello.01e4.ggf black 100 3000
$ mv reversi_model.npz model_epoch-3000_black.npz
$ python build_mlp.py Othello.01e4.ggf white 100 3000
$ mv reversi_model.npz model_epoch-3000_white.npz
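Since the modified "build_mlp.py" is not shown here, this is a minimal sketch of argument handling matching the commands above; the positional order is taken from the commands, everything else is an assumption.

import argparse

# python build_mlp.py <ggf_file> <color> [batch_size] [max_epoch]
parser = argparse.ArgumentParser()
parser.add_argument('ggf_file')
parser.add_argument('color', choices=['black', 'white'])
parser.add_argument('batch_size', nargs='?', type=int, default=100)
parser.add_argument('max_epoch', nargs='?', type=int, default=1000)
args = parser.parse_args()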

Incidentally, in my environment this training took 5 hours + 5 hours = 10 hours... Having only a Linux environment on VirtualBox hurts...

Have the computers compete with these models: MLP vs Random, 1000 games.

| Detail | Number of occurrences |
| --- | --- |
| Cannot put stone but AI cannot select 'PASS'. | 35 |
| Cannot 'PASS' this turn but AI selected it. | 257 |
| Cannot put stone at AI selected position. | 5677 |
| total | 5966 |

Since it is 5966/30000, 19.9% of moves are illegal. Hardly any change. It seems learning does not progress any further beyond 1000 epochs. In fact, there was no need to try 3000 epochs to find this out: if you hold out about 1000 test samples similar to the teacher data and plot the epoch-accuracy correlation (the learning curve) while running "build_mlp.py", you can see at which epoch learning stops progressing.

Retry 3: Play MLP vs Random 10000 times and use a model trained on the resulting game records

Devising the teacher data alone isn't working, so I'll try a different approach. Set up the MLP AI with "model_black.npz" and "model_white.npz", play MLP vs Random (10000 games this time), and train again on the resulting game records. The intent is that those records contain the correct answers supplied by the fail-safe function for exactly the patterns the MLP AI is bad at, so the model should be able to learn those patterns. First, play 10000 MLP vs Random games. A game record file called "record.log" is saved, so rename it to "mlp_vs_random_10000_01.log". Read this log file and recreate the models.

$ python build_mlp.py mlp_vs_random_10000_01.log black 100 1000
$ mv reversi_model.npz model_mlp_vs_random_black_01.npz
$ python build_mlp.py mlp_vs_random_10000_01.log white 100 1000
$ mv reversi_model.npz model_mlp_vs_random_white_01.npz

Rematch MLP vs Random using the new models ("model_mlp_vs_random_black_01.npz", "model_mlp_vs_random_white_01.npz"). Illegal move! occurred 2794 times in 1000 games. The breakdown is as follows.

| Detail | Number of occurrences |
| --- | --- |
| Cannot put stone but AI cannot select 'PASS'. | 84 |
| Cannot 'PASS' this turn but AI selected it. | 57 |
| Cannot put stone at AI selected position. | 2653 |
| total | 2794 |

Since it is 2794/30000, the illegal-move rate is 9.3%. Looking good! That's about half! Let's keep at it from here.

This method seems to work, so let's repeat the same steps. Use "model_mlp_vs_random_black_01.npz" and "model_mlp_vs_random_white_01.npz" to play 10000 MLP vs Random games, rename "record.log" to "mlp_vs_random_10000_02.log", and recreate the models from this log (a small driver script for this cycle is sketched after the commands).

$ python build_mlp.py mlp_vs_random_10000_02.log black 100 1000
$ mv reversi_model.npz model_mlp_vs_random_black_02.npz
$ python build_mlp.py mlp_vs_random_10000_02.log white 100 1000
$ mv reversi_model.npz model_mlp_vs_random_white_02.npz
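Since this generate-records-then-retrain cycle repeats, it can be scripted; a minimal sketch, assuming each iteration's "record.log" has already been produced by the game app and renamed as above.

import subprocess

# Retrain the black and white models from each iteration's self-play records.
for i in (1, 2):
    log = 'mlp_vs_random_10000_%02d.log' % i
    for color in ('black', 'white'):
        subprocess.run(['python', 'build_mlp.py', log, color, '100', '1000'], check=True)
        subprocess.run(['mv', 'reversi_model.npz', 'model_mlp_vs_random_%s_%02d.npz' % (color, i)], check=True)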

Rematch MLP vs Random using the new models ("model_mlp_vs_random_black_02.npz", "model_mlp_vs_random_white_02.npz"). Illegal move! occurred 2561 times in 1000 games. The breakdown is as follows.

| Detail | Number of occurrences |
| --- | --- |
| Cannot put stone but AI cannot select 'PASS'. | 121 |
| Cannot 'PASS' this turn but AI selected it. | 41 |
| Cannot put stone at AI selected position. | 2399 |
| total | 2561 |

Since it is 2561/30000, 8.5% of moves are illegal. Hmmm... should this be seen as only a slight drop? I'll stop this method here for now.

Retry 4: Play MLP vs Random 10000 times, concatenate the game records with Othello.01e4.ggf, and use a model trained on the result

$ cat Othello.01e4.ggf mlp_vs_random_10000_01.log > 01e4_mlp_randomA_10000_01.ggf
$ python build_mlp.py 01e4_mlp_randomA_10000_01.ggf black 100 1000
$ mv reversi_model.npz model_01e4_mlp_vs_random_black.npz
$ python build_mlp.py 01e4_mlp_randomA_10000_01.ggf white 100 1000
$ mv reversi_model.npz model_01e4_mlp_vs_random_white.npz

The idea is to train on both the existing game records ("Othello.01e4.ggf") and the records created in the previous retry ("mlp_vs_random_10000_01.log"). I thought performance would improve because the combined records are more diverse and include the moves that previously became Illegal moves.

Rematch MLP vs Random using the new models ("model_01e4_mlp_vs_random_black.npz", "model_01e4_mlp_vs_random_white.npz"). Illegal move! occurred 3325 times in 1000 games. The breakdown is as follows.

| Detail | Number of occurrences |
| --- | --- |
| Cannot put stone but AI cannot select 'PASS'. | 48 |
| Cannot 'PASS' this turn but AI selected it. | 90 |
| Cannot put stone at AI selected position. | 3187 |
| total | 3325 |

Since it is 3325/30000, 11.1% of moves are illegal. The result is not much different from Retry 3. Too bad.

Interim summary

That's it for this week; let me summarize here.

So that's roughly where things stand. At present, the probability of selecting an illegal move has been reduced to 8.5%, so the interim result is an AI backed by the fail safe. This may well end up being the final result, but I'm a little disappointed, so from next week onward I'll play around with the MLP model settings. I'll keep adding future tries to this article, so if you're interested in how it develops, please check back.

In this study, the big bottleneck in my environment is that training a model from the game records takes several hours every time... With a GPU it should be considerably easier, so rewriting "build_mlp.py" to support GPU and experimenting from there could be interesting.

**↓↓↓↓↓ 2016/08/14 update ↓↓↓↓↓**

Check the learning curve

Updated "* bulb_mlp.py *" to get a learning curve. The end of the data 1000 samples are reserved as test (Validation) data (this 1000 samples are not used for training), and the correct answer rate (main / accuracy) in the training data at each Epoch and the test data unknown to the model The correct answer rate (validation / main / accuracy) is displayed. Now you can draw a learning curve. Well, it's a story that you should finally do it from the beginning ...

First, the learning curves for the initial models are shown. (Figures: lc_black.png, lc_white.png)

At 1000 epochs, main/accuracy saturates at around 0.45. From this we can judge that Retry 2's change from 1000 to 3000 epochs was never likely to be effective. By the way, you might think an accuracy of 0.45 = 45% is low, but since actual game records are used, a given board position has multiple plausible answers, so the accuracy does not climb very high.

Next, the learning curves when the number of neurons in the hidden layers (h1, h2) is increased from 100 to 200 are shown. (Figures: lc_neuron-200_black.png, lc_neuron-200_white.png)

Again, it mostly converges by about 1000 epochs. Increasing the number of neurons raises the accuracy on the training data (main/accuracy), but validation/main/accuracy, the accuracy on unseen input, stays almost the same as with 100 neurons. Which means... going from 100 to 200 neurons probably won't improve rule compliance either... But for the time being, let's check it as Retry 5.

Retry 5: Increase h1 and h2 neurons from 100 to 200

I thought that increasing the number of neurons would strengthen the training effect of the game records and improve rule compliance. Change the definition of the MLP class in "build_mlp.py".

...
class MLP(Chain):
    def __init__(self):
        super(MLP, self).__init__(
            l1=L.Linear(64, 200),
            l2=L.Linear(200, 200),
            l3=L.Linear(200, 65),
        )
...

After making the changes, create a trained model.

$ python build_mlp.py Othello.01e4.ggf black 100 1000
$ mv reversi_model.npz model_neuron-200_black.npz
$ python build_mlp.py Othello.01e4.ggf white 100 1000
$ mv reversi_model.npz model_neuron-200_white.npz

Rematch MLP vs Random using the new models ("model_neuron-200_black.npz", "model_neuron-200_white.npz"). (The MLP class definition in "reversi.py" also needs the same change as above.) Illegal move! occurred 10997 times in 1000 games. The breakdown is as follows.

| Detail | Number of occurrences |
| --- | --- |
| Cannot put stone but AI cannot select 'PASS'. | 99 |
| Cannot 'PASS' this turn but AI selected it. | 120 |
| Cannot put stone at AI selected position. | 10778 |
| total | 10997 |

Since it is 10997/30000, 36.7% of moves are illegal. It got worse... it is probably overfitting the training data.

Retry 6: Increase the training samples about threefold

Concatenate not only Othello.01e4.ggf but also Othello.02e4.ggf and Othello.03e4.ggf as the game records, increasing the volume about threefold so that more varied patterns can be learned.

$ cat Othello.01e4.ggf Othello.02e4.ggf Othello.03e4.ggf > Othello.01-03e4.ggf
$ python build_mlp.py Othello.01-03e4.ggf black 100 1000
$ mv reversi_model.npz model_01-03e4_black.npz
$ python build_mlp.py Othello.01-03e4.ggf white 100 1000
$ mv reversi_model.npz model_01-03e4_white.npz

The learning curves are as follows. (Figures: lc_01-03e4_black.png, lc_01-03e4_white.png) The validation/main/accuracy has improved from 0.3 to 0.35 compared to using only Othello.01e4.ggf as the game records. Maybe we can expect a little?

Rematch MLP vs Random using the new models ("model_01-03e4_black.npz", "model_01-03e4_white.npz"). Illegal move! occurred 5284 times in 1000 games. The breakdown is as follows.

| Detail | Number of occurrences |
| --- | --- |
| Cannot put stone but AI cannot select 'PASS'. | 40 |
| Cannot 'PASS' this turn but AI selected it. | 228 |
| Cannot put stone at AI selected position. | 5016 |
| total | 5284 |

Since it is 5284/30000, 17.6% of moves are illegal. Not much different from using only Othello.01e4.ggf (19.1%). This is getting pretty grim...

Retry 7: Add "placeable square" information to the MLP input (board state)

This time, with a change of thinking, I'll try modifying the MLP input itself. Specifically, in addition to '0': none, '1': black, and '2': white, the board given as input gets '3': a square where a stone can be placed. For example, X1 below is the state X0 with the placeable squares added, when it is black's turn. (A sketch of how these marks can be computed follows the two boards.)

X0 = [[0,0,0,0,0,0,0,0],\
      [0,0,0,0,0,0,0,0],\
      [0,0,0,0,0,0,0,0],\
      [0,0,0,2,1,0,0,0],\
      [0,0,1,2,2,2,2,0],\
      [0,0,0,1,2,2,0,0],\
      [0,0,0,0,1,2,0,0],\
      [0,0,0,0,0,0,0,0]]

X1 = [[0,0,0,0,0,0,0,0],\
      [0,0,0,0,0,0,0,0],\
      [0,0,0,3,3,0,0,0],\
      [0,0,3,2,1,3,0,3],\
      [0,0,1,2,2,2,2,3],\
      [0,0,3,1,2,2,3,0],\
      [0,0,0,0,1,2,3,0],\
      [0,0,0,0,0,0,0,0]]
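As a minimal sketch of how the '3' marks can be computed (function names hypothetical, not necessarily those in "build_mlp.py"): for each empty square, walk each of the eight directions over a run of opponent stones and check whether the run is capped by one of our own stones.

# Cell encoding: 0 = none, 1 = black, 2 = white; 3 marks a placeable square.
DIRECTIONS = [(-1, -1), (-1, 0), (-1, 1),
              (0, -1),           (0, 1),
              (1, -1),  (1, 0),  (1, 1)]

def can_put(board, row, col, color):
    # True if `color` (1 or 2) can legally place a stone at (row, col).
    if board[row][col] != 0:
        return False
    opponent = 2 if color == 1 else 1
    for dr, dc in DIRECTIONS:
        r, c = row + dr, col + dc
        seen_opponent = False
        while 0 <= r < 8 and 0 <= c < 8 and board[r][c] == opponent:
            seen_opponent = True
            r, c = r + dr, c + dc
        if seen_opponent and 0 <= r < 8 and 0 <= c < 8 and board[r][c] == color:
            return True
    return False

def mark_puttable(board, color):
    # Return a copy of `board` with every placeable square for `color` set to 3.
    marked = [row[:] for row in board]
    for row in range(8):
        for col in range(8):
            if can_put(board, row, col, color):
                marked[row][col] = 3
    return marked

# mark_puttable(X0, 1) reproduces X1 above (black to move).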

By doing this, I thought an AI that follows the rules could be created. You could object, "You're practically telling it the rules!", but the premise that the board state is the input and the next move is the output has not changed. (A pretty lame excuse...) I changed "build_mlp.py" so that you can specify whether to add the placeable squares. If True is appended to the end of the command, the placeable squares are added to the input board as '3'.

$ python build_mlp.py Othello.01e4.ggf black 100 1000 True
$ mv reversi_model.npz model_black_puttable_mark.npz
$ python build_mlp.py Othello.01e4.ggf white 100 1000 True
$ mv reversi_model.npz model_white_puttable_mark.npz

The learning curves are as follows. (Figures: lc_puttable_mark_black.png, lc_puttable_mark_white.png)

It shows the highest validation/main/accuracy so far, converging at around 0.4.

Rematch MLP vs Random using the new models ("model_black_puttable_mark.npz", "model_white_puttable_mark.npz"). I fixed "reversi.py" to support these models: if the specified MLP model name includes "puttable_mark", the input represents placeable squares with '3'.

Illegal move! occurred 207 times in 1000 games. The breakdown is as follows.

| Detail | Number of occurrences |
| --- | --- |
| Cannot put stone but AI cannot select 'PASS'. | 16 |
| Cannot 'PASS' this turn but AI selected it. | 17 |
| Cannot put stone at AI selected position. | 174 |
| total | 207 |

Since it is 207/30000, only 0.7% of moves are illegal. Under 1%! A dramatic improvement.

Final summary

I will summarize the above.

As an AI that behaves properly, the pick is the model whose board-state input includes the '3' (placeable square) marks, combined with the fail safe. It may be a bit of a cheat, but I was able to make a decent AI.

At present, the strength of the AI depends entirely on the game records. After creating a basic AI by the method above, it could be strengthened with reinforcement learning and the like. I'm completely ignorant about reinforcement learning, so I'll try that after studying.

Thank you for reading.
