It was a while ago now, but it was amazing to see AlphaGo beat the world champion three times in a row. Inspired by that, I made an AI for Othello, another board game.

The reason I chose Othello is simple: I don't have enough computing resources! AlphaGo's development kept 50 GPUs busy for a staggering 3 weeks, which is completely out of reach for an individual. So I chose Othello, which has a small board and simple rules.

I mentioned AlphaGo, but the technique used here is different. AlphaGo combines supervised deep learning, deep reinforcement learning, and Monte Carlo tree search, whereas this time I use only supervised deep learning. The resulting AI doesn't perform as well, but in exchange the algorithm is simple enough that you can follow it with knowledge of deep learning alone, even if you don't know the latter two techniques (I understand deep reinforcement learning myself, but not Monte Carlo tree search). The explanation below assumes some understanding of deep learning (roughly the level of having read "Deep Learning from Scratch").

Overview

For the Othello game records, I used the data from the French Othello Association site. The records come in an obscure format called WTHOR, so processing the data was quite difficult.
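To give a feel for that processing, here is a minimal sketch of reading WTHOR game records. It rests on my understanding of the layout, which should be checked against the format description before relying on it: a 16-byte file header, then 68-byte records made of 8 bytes of game metadata followed by 60 one-byte moves, each move coded as column + 10 * row (so f5 would be 56). The function names are hypothetical and are not taken from the actual source code.

```python
import struct

HEADER_SIZE = 16   # assumed size of the file header
RECORD_SIZE = 68   # assumed: 8 bytes of metadata + 60 one-byte moves


def decode_move(code):
    """Turn a move byte such as 56 into a 0-based (row, col) pair."""
    row, col = divmod(code, 10)
    return row - 1, col - 1


def parse_games(data):
    """Split raw file bytes into games, each with its list of moves."""
    games = []
    for start in range(HEADER_SIZE, len(data) - RECORD_SIZE + 1, RECORD_SIZE):
        record = data[start:start + RECORD_SIZE]
        # Assumed metadata: tournament id, black player id, white player id,
        # black disc count, theoretical score (little-endian).
        _, _, _, black_discs, _ = struct.unpack_from("<HHHBB", record, 0)
        # A zero byte is treated as "no move" (padding after the game ended).
        moves = [decode_move(c) for c in record[8:] if c != 0]
        games.append({"black_discs": black_discs, "moves": moves})
    return games
```

Each decoded move can then be paired with the board position just before it to form one training sample.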

The neural network takes the board state (0: empty, 1: own stone, 2: opponent's stone) as input and outputs the probability of each move. For the teacher data, I used the position of the move that was actually played. I think this is hard to follow in words alone, so here is a diagram.

(Figure: 説明1.png, a diagram of the input, output, and teacher data)
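The input and teacher data described above can be sketched in a few lines of plain Python. This is only an illustration: the text representation of the board ('.', 'B', 'W') and the function names are my own assumptions, not the format used in the actual source code.

```python
EMPTY, OWN, OPPONENT = 0, 1, 2


def encode_board(board, player):
    """Encode an 8x8 board from the point of view of the side to move.

    board: list of 8 strings made of '.', 'B', 'W'; player: 'B' or 'W'.
    """
    return [
        [EMPTY if c == "." else OWN if c == player else OPPONENT for c in row]
        for row in board
    ]


def encode_target(move):
    """One-hot 8x8 grid marking the move actually played, given as (row, col)."""
    target = [[0] * 8 for _ in range(8)]
    target[move[0]][move[1]] = 1
    return target


# Standard opening position, black to move.
start = [
    "........",
    "........",
    "........",
    "...WB...",
    "...BW...",
    "........",
    "........",
    "........",
]
x = encode_board(start, "B")   # network input
t = encode_target((4, 5))      # teacher data: black plays f5
```

Because the encoding is relative to the side to move, the same network can be trained on both black's and white's moves.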

Neural network structure

Like AlphaGo, I used only convolution layers and no fully connected layers. Because the output is two-dimensional data, it is better to keep the network convolutional all the way through. Softmax is used for the output layer, and Conv -> BN -> ReLU for the other layers. I recommend including Batch Normalization (BN) because it stabilizes learning.
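To make the layer layout concrete, here is a rough NumPy sketch of the forward pass: Conv -> BN -> ReLU blocks followed by a final convolution and a softmax over the 64 squares. It is not the Chainer model from the source code. The layer count, the 16 channels, the 3x3 kernels, and feeding the board as a single input channel are all assumptions made for illustration, and the batch normalization here omits the learned scale and shift.

```python
import numpy as np


def conv2d_same(x, w, b):
    """Zero-padded convolution. x: (N, C_in, H, W), w: (C_out, C_in, k, k)."""
    n, _, h, wd = x.shape
    c_out, _, k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)), mode="constant")
    out = np.empty((n, c_out, h, wd))
    for i in range(h):
        for j in range(wd):
            patch = xp[:, :, i:i + k, j:j + k]          # (N, C_in, k, k)
            out[:, :, i, j] = np.tensordot(patch, w, axes=([1, 2, 3], [1, 2, 3])) + b
    return out


def batch_norm(x, eps=1e-5):
    """Normalize each channel over the batch and spatial axes."""
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def softmax_over_board(logits):
    """(N, 1, 8, 8) logits -> (N, 8, 8) move probabilities summing to 1."""
    n = logits.shape[0]
    flat = logits.reshape(n, -1)
    flat = flat - flat.max(axis=1, keepdims=True)       # numerical stability
    e = np.exp(flat)
    return (e / e.sum(axis=1, keepdims=True)).reshape(n, 8, 8)


def forward(boards, params):
    """boards: (N, 8, 8) ints in {0, 1, 2}; params: list of (weight, bias)."""
    x = boards[:, None, :, :].astype(np.float64)
    for w, b in params[:-1]:
        x = np.maximum(batch_norm(conv2d_same(x, w, b)), 0.0)   # Conv -> BN -> ReLU
    w, b = params[-1]
    return softmax_over_board(conv2d_same(x, w, b))


rng = np.random.default_rng(0)
channels = 16  # hypothetical width
params = [
    (rng.normal(0, 0.1, (channels, 1, 3, 3)), np.zeros(channels)),
    (rng.normal(0, 0.1, (channels, channels, 3, 3)), np.zeros(channels)),
    (rng.normal(0, 0.1, (1, channels, 3, 3)), np.zeros(1)),
]
probs = forward(rng.integers(0, 3, size=(4, 8, 8)), params)
```

Since every layer keeps the 8x8 shape, the output lines up square for square with the board, which is the reason a fully connected layer is not needed.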

Neural network learning

For the loss function, I used the standard cross-entropy error. I used Adam as the optimization algorithm; learning was faster and the final result was better than with SGD.
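As a small worked example of these two pieces, the sketch below computes the cross-entropy error for one position and applies the standard Adam update rule. It is a toy written in plain Python, not the training loop from the source code: the 64 logits are treated directly as the trainable parameters, and the learning rate of 0.1 and the 50 steps are arbitrary choices for the demonstration.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, target_index):
    """Loss for a one-hot target: minus the log probability of the played move."""
    return -math.log(probs[target_index])


def adam_step(params, grads, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One in-place Adam update; t is the 1-based step count."""
    for i, g in enumerate(grads):
        m[i] = beta1 * m[i] + (1 - beta1) * g          # first-moment estimate
        v[i] = beta2 * v[i] + (1 - beta2) * g * g      # second-moment estimate
        m_hat = m[i] / (1 - beta1 ** t)                # bias correction
        v_hat = v[i] / (1 - beta2 ** t)
        params[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)


logits = [0.0] * 64          # one logit per square
target = 4 * 8 + 5           # the move actually played, as a flat index
m = [0.0] * 64
v = [0.0] * 64

initial_loss = cross_entropy(softmax(logits), target)
for t in range(1, 51):
    probs = softmax(logits)
    # Gradient of softmax + cross-entropy with respect to the logits.
    grads = [p - (1.0 if i == target else 0.0) for i, p in enumerate(probs)]
    adam_step(logits, grads, m, v, t, lr=0.1)
final_loss = cross_entropy(softmax(logits), target)
```

With uniform initial probabilities the loss starts at ln 64, and it falls as the played move gains probability mass.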

Source code

The long-awaited source code is on GitHub. See the source code for the detailed parameters.

Learning results

As for the AI's performance... it seems to have mostly learned the rules, but unfortunately it did not become very strong. The cause is probably that I used supervised deep learning alone.

However, due to time constraints, I trained it for only about two and a half hours on a CPU-only PC, so it may still get stronger.

So, I'm looking for someone who has a GPU!

It may become stronger if it is trained on a GPU for about a day.

[PYTHON] I tried making an Othello AI that learned 7.2 million moves by deep learning with Chainer
