Othello: From the Tic-Tac-Toe of "Implementation Deep Learning" (3)

This is a continuation of the following articles:
Othello: From the Tic-Tac-Toe of "Implementation Deep Learning" (1) http://qiita.com/Kumapapa2012/items/cb89d73782ddda618c99
Othello: From the Tic-Tac-Toe of "Implementation Deep Learning" (2) http://qiita.com/Kumapapa2012/items/f6c654d7c789a074c69b

Click here for the subsequent article:
Othello: From the Tic-Tac-Toe of "Implementation Deep Learning" (4) [End] http://qiita.com/Kumapapa2012/items/9cec4e6d2c935d11f108

I touched on activation functions in the first article. Given the potential for dying ReLU, this time I ran the Othello game using what is probably the easiest and quickest way around it: Leaky ReLU. The code is here. https://github.com/Kumapapa2012/Learning-Machine-Learning/tree/master/Reversi

**Leaky ReLU**

ReLU is an activation function that sets all values less than 0 to 0:

f(x) = \max(0, x)

The network here is fully connected, and as described previously, ReLU can cause a problem known as dying ReLU. One way around it is Leaky ReLU, which gives negative inputs a small slope (0.2 by default in Chainer):

f(x) = \begin{cases}
    x & (x > 0) \\
    0.2x & (x \leq 0)
  \end{cases}

This eliminates the zero gradient on the negative side. My personal interpretation is that dying ReLU stems from the gradient being 0 for negative inputs, so giving that region a slope addresses it. At the same time, we want to keep the property that makes ReLU easy to differentiate and fast to compute and train with backpropagation, namely a gradient of 1 on the positive side and (almost) 0 on the negative side, which is why the slope is kept small. I think that is the idea behind Leaky ReLU.
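To make the difference concrete, here is a minimal NumPy sketch (illustration only, not code from the repository) comparing the two functions. Negative inputs come out of ReLU as exactly 0, so their gradient is 0 as well; under Leaky ReLU they come out scaled by the slope, so the gradient there is 0.2 instead of 0.

```python
import numpy as np

def relu(x):
    # ReLU: all negative inputs become 0, so their gradient is also 0.
    return np.maximum(0.0, x)

def leaky_relu(x, slope=0.2):
    # Leaky ReLU: negative inputs keep a small slope (0.2 is Chainer's default),
    # so the gradient on the negative side is `slope` rather than 0.
    return np.where(x > 0, x, slope * x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))        # [ 0.   0.   0.   0.5  2. ]
print(leaky_relu(x))  # [-0.4 -0.1  0.   0.5  2. ]
```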

I tried using Leaky ReLU.

The only change is the activation function, from ReLU to Leaky ReLU: nine lines in agent.py.

```diff
$ diff ~/git/Learning-Machine-Learning/Reversi/agent.py agent.py
47,55c47,55
<         h = F.relu(self.l1(x))
<         h = F.relu(self.l20(h))
<         h = F.relu(self.l21(h))
<         h = F.relu(self.l22(h))
<         h = F.relu(self.l23(h))
<         h = F.relu(self.l24(h))
<         h = F.relu(self.l25(h))
<         h = F.relu(self.l26(h))
<         h = F.relu(self.l27(h))
---
>         h = F.leaky_relu(self.l1(x))   # slope=0.2 (default)
>         h = F.leaky_relu(self.l20(h))
>         h = F.leaky_relu(self.l21(h))
>         h = F.leaky_relu(self.l22(h))
>         h = F.leaky_relu(self.l23(h))
>         h = F.leaky_relu(self.l24(h))
>         h = F.leaky_relu(self.l25(h))
>         h = F.leaky_relu(self.l26(h))
>         h = F.leaky_relu(self.l27(h))
```
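For context, below is a rough sketch of the kind of model definition these calls live in. Only the layer names (l1, l20 ... l27) and the leaky_relu calls come from the diff above; the layer type (L.Linear, since the network is fully connected), the layer sizes, and the output layer l3 are my assumptions, not the actual contents of agent.py.

```python
import chainer
import chainer.functions as F
import chainer.links as L

class QNet(chainer.Chain):
    """Hypothetical fully connected Q-network in the style of agent.py."""

    def __init__(self, n_in, n_units, n_out):
        super(QNet, self).__init__(
            l1=L.Linear(n_in, n_units),
            l20=L.Linear(n_units, n_units),
            # ... l21 through l26 would be defined the same way ...
            l27=L.Linear(n_units, n_units),
            l3=L.Linear(n_units, n_out),   # assumed output layer
        )

    def __call__(self, x):
        h = F.leaky_relu(self.l1(x))    # slope=0.2 (Chainer's default)
        h = F.leaky_relu(self.l20(h))
        # ... the remaining hidden layers are applied the same way ...
        h = F.leaky_relu(self.l27(h))
        return self.l3(h)               # raw scores for each move
```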

With this change, the winning percentage on the 6x6 board rose steadily.

**When using Leaky ReLU (slope = 0.2)** (image)

This is quite different from the previous result. Was dying ReLU occurring after all?

**When using ReLU** (image)

Next, the 8x8 board... here the winning percentage did not stabilize \(^o^)/

**When using Leaky ReLU (slope = 0.2)** (image)

In the earlier result, by contrast, the winning percentage did seem to converge in the end.

**When using ReLU** (image)

Thinking about it very simply: the winning percentage seems to converge with ReLU, which is Leaky ReLU with slope = 0, and fails to converge with Leaky ReLU at slope = 0.2, so there may be an optimal value somewhere in between. I would like to try slope = 0.1 later.

The bigger problem, though, is the oscillation in the winning percentage. Oscillation suggests that learning does not settle where it should, and that seems related to the learning rate. According to Chapter 6 of "Deep Learning from Scratch", the learning rate is essentially a coefficient that controls how much the weights W are updated at each step: the larger it is, the bigger each update and the faster learning progresses, but it may diverge [^1]; if it is too small, learning becomes too slow. The lr (learning rate) argument of the RMSpropGraves optimizer used here is 0.00025, a bit larger than Chainer's default of 0.0001. This 0.00025 was presumably tuned for the sample's tic-tac-toe, and on the 8x8 Othello board the weights W do not stabilize, which I think is why the winning percentage in the graph above is so erratic. For that reason, I would also like to try a lower learning rate later. [^2]
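If I do try those two changes, the knobs live in different places: the slope is an argument to F.leaky_relu, and the learning rate is an argument to the optimizer. Below is a minimal sketch; the one-layer model is just a placeholder, and slope=0.1 / lr=0.0001 are simply the candidate values discussed above, not tested settings.

```python
import numpy as np
import chainer.functions as F
import chainer.links as L
from chainer import optimizers

# Placeholder one-layer model, only to show where the two settings go.
model = L.Linear(64, 64)
x = np.zeros((1, 64), dtype=np.float32)

# Knob 1: the Leaky ReLU slope (Chainer's default is 0.2; 0.1 is the value to try next).
h = F.leaky_relu(model(x), slope=0.1)

# Knob 2: the learning rate (the sample uses 0.00025; Chainer's RMSpropGraves
# defaults to 0.0001, so that is one candidate for a lower value).
optimizer = optimizers.RMSpropGraves(lr=0.0001)
optimizer.setup(model)
```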

References

Computer Othello https://ja.m.wikipedia.org/wiki/%E3%82%B3%E3%83%B3%E3%83%94%E3%83%A5%E3%83%BC%E3%82%BF%E3%82%AA%E3%82%BB%E3%83%AD
A story about failure experiences and anti-patterns with neural networks http://nonbiri-tereka.hatenablog.com/entry/2016/03/10/073633 (Others will be added at a later date.)

[^1]: Large swings in the weights caused by a high learning rate can also be a factor in triggering dying ReLU.
[^2]: Also, should the activation function of the output layer even be the same as that of the hidden layers in the first place, or should it be chosen separately? That is something else I am wondering about.
