[PYTHON] Let's make Othello AI with Chainer-Part 1-

Introduction

Previously, I wrote an article "Let's make a tic-tac-toe AI with Pylearn 2". At this time, I made the problem simple and tic-tac-toe, but this time I will challenge the original goal of creating Othello's AI with a neural network. Please forgive it though it may be an article with many trials and errors. This time, I will use Chainer as a library for deep learning.

The article consists of two parts.

We will implement AI in the second part, but if it does not work, we will have to reconsider from the configuration of the neural network ... ** → Operation confirmed and conclusion **

Preparation of teacher data

I borrowed the game record data from here. A large amount of battle data stored in ggf format. It is readable in text format.

MLP design

First, decide the input and output. Since Othello's AI will be created, the input will be the state of the board and the next move will be the output. It is the following image when written in the figure. reversi-nn.png

Specifically, I will drop it into a neural network. The input is the state of the board, that is, an 8 x 8 vector. The elements of the vector are '0': none, '1': black, and '2': white. There are 64 types of output because it is classified according to where on the board the next move is taken, but there are cases where it can be passed, so there are 65 ways in total. The board position is indicated by 0 to 63, and Pass is 64. board_num.png

The rest is the design of the hidden layer. For the time being, the number of neurons in both layers is 100. How can I theoretically design here ... The figure is as follows.

reversi-nn_02.png

Implementation

Here is a python program that does the following:

build_mlp.py


#!/usr/bin/env python
# coding=utf-8
# 
# 0 : none
# 1 : black
# 2 : white
# 

import sys
import numpy as np
import chainer
from chainer import cuda, Function, gradient_check, report, training, utils, Variable
from chainer import datasets, iterators, optimizers, serializers
from chainer import Link, Chain, ChainList
import chainer.functions as F
import chainer.links as L
from chainer.training import extensions

gVec = [(-1,-1),(-1,0),(-1,1),(0,-1),(0,1),(1,-1),(1,0),(1,1)]
gCol = ('A','B','C','D','E','F','G','H')
gRow = ('1','2','3','4','5','6','7','8')

# (1)
class MLP(Chain):
    def __init__(self):
        super(MLP, self).__init__(
                l1=L.Linear(64, 100),
                l2=L.Linear(100, 100),
                l3=L.Linear(100, 65),
        )

    def __call__(self, x):
        h1 = F.relu(self.l1(x))
        h2 = F.relu(self.l2(h1))
        y = self.l3(h2)
        return y

class Classifier(Chain):
    def __init__(self, predictor):
        super(Classifier, self).__init__(predictor=predictor)

    def __call__(self, x, t):
        y = self.predictor(x)
        loss = F.softmax_cross_entropy(y, t)
        accuracy = F.accuracy(y, t)
        report({'loss': loss, 'accuracy': accuracy}, self)
        return loss


def print_board(board):
    for i in range(8):
        print board[i]

    print ""

def update_board(board, pos_str, clr):
    assert clr!=0, "stone color is not black or white."
    updated_board = [[0 for col in range(8)] for row in range(8)]
    rev_list = []
    pos = pos_str2pos_index(pos_str)
    for v in gVec:
        temp_list = []
        for i in range(1, 8):
            # out of board
            if pos[0]+v[0]*(i+1) > 7 or pos[1]+v[1]*(i+1) > 7 or\
               pos[0]+v[0]*(i+1) < 0 or pos[1]+v[1]*(i+1) < 0:
                continue

            if board[pos[0]+v[0]*i][pos[1]+v[1]*i] == (clr % 2 + 1):
                temp_list.append([pos[0]+v[0]*i, pos[1]+v[1]*i])

                if board[pos[0]+v[0]*(i+1)][pos[1]+v[1]*(i+1)] == clr:
                    for j in temp_list:
                        rev_list.append(j)

                    break
            else:
                break

    rev_list.append(pos)  # put stone at pos
    assert board[pos[0]][pos[1]] == 0, "put position is not empty."
    print "rev_list = " + str(rev_list)
    for i in range(0, 8):
        for j in range(0, 8):
            if [i, j] in rev_list:
                updated_board[i][j] = clr
            else:
                updated_board[i][j] = board[i][j]

    return updated_board

def who_is_winner(board):
    # ret : 0  draw
    #       1  black win
    #       2  white win
    ret = 0
    score_b = 0
    score_w = 0
    for i in range(0, 8):
        for j in range(0, 8):
            if board[i][j] == 1:
                score_b += 1
            elif board[i][j] == 2:
                score_w += 1

    if score_b > score_w:
        ret = 1
    elif score_b < score_w:
        ret = 2

    print "Black vs White : " + str(score_b) + " vs " + str(score_w)
    return ret

def pos_str2pos_index(pos_str):
    pos_index = []
    for i, c in enumerate(gRow):
        if pos_str[1] == c:
            pos_index.append(i)
    
    for i, c in enumerate(gCol):
        if pos_str[0] == c:
            pos_index.append(i)


    return pos_index

def pos_str2pos_index_flat(pos_str):
    pos_index = pos_str2pos_index(pos_str)
    index = pos_index[0] * 8 + pos_index[1]
    return index


#==== Main ====#
record_X = []    # MLP input (board list)
record_y = []    # MLP output(class(0-64) list)
temp_X = []
temp_y = []
temp2_X = []
temp2_y = []
board = []
row = []

argv = sys.argv
argc = len(argv)

if argc != 3:
    print 'Usage'
    print '    python ' + str(argv[0]) + ' <record_filename> <type>'
    print '        type : black'
    print '               black_win'
    print '               white'
    print '               white_win'
    quit()

# check type
build_type = ''
for t in ['black', 'black_win', 'white', 'white_win']:
    if argv[2] == t:
        build_type = t

if build_type == '':
    print 'record type is illegal.'
    quit()

#(2)-- load record --#
f = open(argv[1], "r")
line_cnt = 1
for line in f:
    print 'Line Count = ' + str(line_cnt)
    idx = line.find("BO[8")
    if idx == -1:
        continue

    idx += 5
    # make board initial state
    for i in range(idx, idx+9*8):
        if line[i] == '-':
            row.append(0)
        elif line[i] == 'O':
            row.append(2)
        elif line[i] == '*':
            row.append(1)

        if (i-idx)%9 == 8:
            board.append(row)
            row = []
            if len(board) == 8:
                break

    row = []
    print_board(board)
    # record progress of game
    i = idx+9*8+2
    while line[i] != ';':
        if (line[i] == 'B' or line[i] == 'W') and line[i+1] == '[':
            temp_X.append(board)
            pos_str = line[i+2] + line[i+3]
            if pos_str == "pa":    # pass
                temp_y.append(64)
                # board state is not change
                print_board(board)
            else:
                if line[i] == 'B':
                    clr = 1
                elif line[i] == 'W':
                    clr = 2
                else:
                    clr = 0
                    assert False, "Stone Color is illegal."

                pos_index_flat = pos_str2pos_index_flat(pos_str)
                temp_y.append(pos_index_flat)
                board = update_board(board, pos_str, clr)

            if (line[i] == 'B' and (build_type == 'black' or build_type == 'black_win')) or \
               (line[i] == 'W' and (build_type == 'white' or build_type == 'white_win')):
                temp2_X.append(temp_X[0])
                temp2_y.append(temp_y[0])
                print 'X = '
                print_board(temp_X[0])
                print 'y = ' + str(temp_y[0]) + ' (' + \
                               str(pos_str2pos_index(pos_str)) + ') ' + \
                               '(' + pos_str + ')'
                print ''

            temp_X = []
            temp_y = []
        
        i += 1

    print "End of game"
    print_board(board)

    winner = who_is_winner(board)
    if (winner == 1 and build_type == 'black_win') or \
       (winner == 2 and build_type == 'white_win') or \
       build_type == 'black' or build_type == 'white':
        record_X.extend(temp2_X)
        record_y.extend(temp2_y)

    board = []
    temp2_X = []
    temp2_y = []
    line_cnt += 1


#(3)-- MLP model and Training --#
X = np.array(record_X, dtype=np.float32)
y = np.array(record_y, dtype=np.int32)

train = datasets.TupleDataset(X, y)
train_iter = iterators.SerialIterator(train, batch_size=100)

model = Classifier(MLP())
optimizer = optimizers.SGD()
optimizer.setup(model)

updater = training.StandardUpdater(train_iter, optimizer)
trainer = training.Trainer(updater, (1000, 'epoch'), out='result')

trainer.extend(extensions.ProgressBar())
trainer.run()

#(4)-- save model --#
serializers.save_npz('reversi_model.npz', model)


#(5)-- prediction example --#
X1_ = [[[0,0,0,0,0,0,0,0],\
        [0,0,0,0,0,0,0,0],\
        [0,0,0,0,0,0,0,0],\
        [0,0,0,2,1,0,0,0],\
        [0,0,0,1,2,0,0,0],\
        [0,0,0,0,0,0,0,0],\
        [0,0,0,0,0,0,0,0],\
        [0,0,0,0,0,0,0,0]]]

X1 = np.array(X1_, dtype=np.float32)
y1 = F.softmax(model.predictor(X1))
print "X1 = "
print_board(X1[0])
print "y1 = " + str(y1.data.argmax(1)) + '\n' 

X2_ = [[[0,0,0,0,0,0,0,0],\
        [0,0,0,0,0,0,0,0],\
        [0,0,2,2,2,0,0,0],\
        [0,0,2,1,1,1,0,0],\
        [0,2,2,2,1,1,0,0],\
        [0,0,2,1,0,0,0,0],\
        [0,0,0,0,0,0,0,0],\
        [0,0,0,0,0,0,0,0]]]

X2 = np.array(X2_, dtype=np.float32)
y2 = F.softmax(model.predictor(X2))
print "X2 = "
print_board(X2[0])
print "y2 = " + str(y2.data.argmax(1)) + '\n'

I will explain briefly. Please read it in correspondence with (1) to (5) in the code. (1) Chainer MLP definition The MLP class defines the structure of the MLP. As mentioned above, it is defined as input = 8x8 = 64, output = 65, and 2 hidden layers (100 neurons). The activation function and layer structure are defined in \ __ call \ __. The Classifier class defines the classification by Softmax.

(2) Reading and converting game records It's a long loop, but it reads the game record and converts it to input (8x8) and output (0-64). What was hard about programming was the hardest here ...

(3) Training of MLP model Input the converted game record into the model for training. Both input and output can be set in Numpy array format with dataset.TupleDataset. At this time, pay attention to the type. The rest is like the source code

to hold. Training is executed with trainer.run.

(4) Saving the model Save the trained model. NPZ format. It will be used when implementing it in the Othello game in the second part.

(5) Simple test Try to derive the next move correctly with the trained model.

An example of the execution result is as follows. We use'Othello.01e4.ggf' as teacher data, and create AI only with the winning data of the first move (black).

$ python build_mlp.py Othello.01e4.ggf black_win
...

X1 = 
[ 0.  0.  0.  0.  0.  0.  0.  0.]
[ 0.  0.  0.  0.  0.  0.  0.  0.]
[ 0.  0.  0.  0.  0.  0.  0.  0.]
[ 0.  0.  0.  2.  1.  0.  0.  0.]
[ 0.  0.  0.  1.  2.  0.  0.  0.]
[ 0.  0.  0.  0.  0.  0.  0.  0.]
[ 0.  0.  0.  0.  0.  0.  0.  0.]
[ 0.  0.  0.  0.  0.  0.  0.  0.]

y1 = [37]

X2 = 
[ 0.  0.  0.  0.  0.  0.  0.  0.]
[ 0.  0.  0.  0.  0.  0.  0.  0.]
[ 0.  0.  2.  2.  2.  0.  0.  0.]
[ 0.  0.  2.  1.  1.  1.  0.  0.]
[ 0.  2.  2.  2.  1.  1.  0.  0.]
[ 0.  0.  2.  1.  0.  0.  0.  0.]
[ 0.  0.  0.  0.  0.  0.  0.  0.]
[ 0.  0.  0.  0.  0.  0.  0.  0.]

y2 = [25]

I write not to use GPU for training. If you run it on a poor PC, it will take about an hour. Since the Progress Bar is displayed by the function of Chainer, please refer to whether it will be completed in a decent time if you do it in your own environment. Also, in the example, Othello.01e4.ggf is used, but since there were two data (lines 85 and 2699) that deviated from the expected format, I deleted them and executed it.

If you look at the prediction example displayed in the results, you can see that the AI hits: prediction-example.png

There are only two ways, but it seems that a simple test has successfully derived the next move. In Part 2, we will actually implement this trained model as AI in the Othello game and check the operation of Othello. ** → Operation confirmed and conclusion was reached. **

reference

Recommended Posts

Let's make Othello AI with Chainer-Part 1-
Let's make Othello AI with Chainer-Part 2-
Let's make Othello with wxPython
Let's make a tic-tac-toe AI with Pylearn 2
Make Puyo Puyo AI with Python
Let's make Splatoon AI! part.1
Let's make dice with tkinter
Let's make a GUI with python.
Let's make a breakout with wxPython
Let's make a graph with python! !!
Let's make a supercomputer with xCAT
Let's make a shiritori game with Python
Let's make a voice slowly with Python
Let's make a simple language with PLY 1
[Python] Let's make matplotlib compatible with Japanese
Let's make a web framework with Python! (1)
Let's make a Twitter Bot with Python!
Let's make a web framework with Python! (2)
Let's replace UWSC with Python (5) Let's make a Robot
Make Lambda Layers with Lambda
Make Yubaba with Discord.py
[Let's play with Python] Make a household account book
Let's make a simple game with Python 3 and iPhone
Visualize claims with AI
Let's make dependency management with pip a little easier
Let's make a Mac app with Tkinter and py2app
Let's make a spherical grid with Rhinoceros / Grasshopper / GHPython
[Super easy] Let's make a LINE BOT with Python.
Make slides with iPython
I tried to make Othello AI that I learned 7.2 million hands by deep learning with Chainer
A story about competing with a friend in Othello AI Preparation
Make HTTPS for free with Amazon Linux2 + Freenom + Let's Encrypt
Let's make a diagram that can be clicked with IPython
Let's play with 4D 4th
Let's make a Discord Bot.
Let's play with Amedas data-Part 1
Make sci-fi-like buttons with Kivy
I tried to make Othello AI with tensorflow without understanding the theory of machine learning ~ Introduction ~
Let's make an Errbot plugin
Let's play with Amedas data-Part 4
Easy to make with syntax
Make a fortune with Python
Let's write python with cinema4d.
Let's do R-CNN with Sklearn-theano
Let's play with Amedas data-Part 3
Let's play with Amedas data-Part 2
Let's make a rock-paper-scissors game
Let's build git-cat with Python
Othello made with python (GUI-like)
Make a fire with kdeplot
Make Slack chatbot with Errbot
I tried to make Othello AI with tensorflow without understanding the theory of machine learning ~ Implementation ~
Let's make a WEB application for phone book with flask Part 1
Let's create a tic-tac-toe AI with Pylearn 2-Save and load models-
Let's make a cycle computer with Raspberry Pi Zero (W, WH)
Let's make a WEB application for phone book with flask Part 2
Let's make a WEB application for phone book with flask Part 3
Let's make a WEB application for phone book with flask Part 4
Let's make a web chat using WebSocket with AWS serverless (Python)!
Let's make an IoT shirt with Lambda, Kinesis, Raspberry Pi [Part 1]
I tried to make Othello AI with tensorflow without understanding the theory of machine learning ~ Battle Edition ~