[PYTHON] Learn Zundokokiyoshi using a simple RNN

Someone has already implemented Zundokokiyoshi using an LSTM, but I suspected that a problem of this scale could be learned with just a simple RNN, so I implemented it with Chainer to deepen my understanding of RNNs.

According to section 7.5 of the book "Deep Learning (Machine Learning Professional Series)" (ISBN-13: 978-4061529021), a simple RNN can remember roughly the past 10 time steps. Since we only need to remember the pattern in which "Zun" appears 4 times followed by one "Doko", this should be within that range.
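To make the target behavior concrete, here is a minimal non-learning sketch of what the RNN is supposed to acquire (the function name is my own, not from the original):

def is_kiyoshi(history):
    # True when the last five tokens are Zun, Zun, Zun, Zun, Doko
    # (0 = Zun, 1 = Doko, matching the encoding used later).
    return history[-5:] == [0, 0, 0, 0, 1]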

Approach

I designed the simplest structure I could think of. There are two inputs ($x_0, x_1$), one hidden layer with 10 units, and two outputs ($y_0, y_1$). The input is defined as follows.

Zun → $x_0 = 0, x_1 = 1$ \\
Doko → $x_0 = 1, x_1 = 0$

The output is defined as follows; we treat this as a simple two-class classification problem.

Kiyoshi failed → $y_0 = 1, y_1 = 0$ \\
Kiyoshi established → $y_0 = 0, y_1 = 1$
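As a sketch of this encoding (the helper name is my own, not from the original):

def encode(token):
    # 0 (Zun) -> [x_0, x_1] = [0, 1], 1 (Doko) -> [x_0, x_1] = [1, 0]
    return [0, 1] if token == 0 else [1, 0]

Note that in the training code below, the target is passed to L.Classifier as a class index (0 = failed, 1 = established) rather than as a one-hot vector.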

Model definition

The model definition in code is as follows.

import numpy as np
import chainer
from chainer import optimizers
import chainer.functions as F
import chainer.links as L

class RNN(chainer.Chain):
    def __init__(self):
        super(RNN, self).__init__(
            w1=L.Linear(2, 10),   # input -> hidden
            h1=L.Linear(10, 10),  # hidden -> hidden (recurrent connection)
            o=L.Linear(10, 2),    # hidden -> output
        )
    def reset_state(self):
        # Reset the hidden state to zeros (mini-batch size is fixed at 1).
        self.last_z = chainer.Variable(np.zeros((1, 10), dtype=np.float32))
    def __call__(self, x):
        # Combine the current input with the previous hidden state.
        z = F.relu(self.w1(x) + self.h1(self.last_z))
        self.last_z = z  # keep the hidden state for the next step
        y = F.relu(self.o(z))
        return y

rnn = RNN()
rnn.reset_state()
model = L.Classifier(rnn)  # softmax cross entropy on top of the RNN output
optimizer = optimizers.Adam()  # use Adam
optimizer.setup(model)
optimizer.add_hook(chainer.optimizer.GradientClipping(10.0))  # cap the gradient norm

The mini-batch size is fixed at 1, and the initial hidden state (last_z) is zero. The hidden-layer value computed at each step is kept in self.last_z, fed back in on the next call, and its result is passed to the output layer.
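As a usage sketch (the inputs here are illustrative), the hidden state persists across calls until reset_state() is called again:

rnn.reset_state()  # last_z <- zeros((1, 10))
x1 = chainer.Variable(np.asarray([[0, 1]], dtype=np.float32))  # Zun
y1 = rnn(x1)  # computed from the zero state; last_z is updated
x2 = chainer.Variable(np.asarray([[1, 0]], dtype=np.float32))  # Doko
y2 = rnn(x2)  # uses the state left behind by the previous call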

Training

Generate moderately random sequence data and train on it.

ans_zundoko = [0, 0, 0, 0, 1]  # the correct "Zun Zun Zun Zun Doko" sequence
src_x_ary = [0, 0, 0, 1]  # sampling pool: make 0 appear more often than 1

def zd_gen():  # generator yielding (input vector, label) pairs
    x_ary = [0, 0, 0, 0, 1]
    y_ary = [0, 0, 0, 0, 1]
    while True:
        x = x_ary.pop(0)
        y = y_ary.pop(0)
        x = [0, 1] if x == 0 else [1, 0]  # 0 -> Zun, 1 -> Doko
        yield x, y
        new_x = src_x_ary[np.random.randint(0, 4)]  # 0 with probability 3/4, 1 with probability 1/4
        x_ary.append(new_x)
        y_ary.append(1 if x_ary == ans_zundoko else 0)  # 1 only when x_ary == [0, 0, 0, 0, 1]
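# Sanity check (a sketch, not in the original): the first five yields replay
# the seed sequence Zun, Zun, Zun, Zun, Doko with labels 0, 0, 0, 0, 1:
#   g = zd_gen()
#   [next(g) for _ in range(5)]
#   # -> [([0, 1], 0), ([0, 1], 0), ([0, 1], 0), ([0, 1], 0), ([1, 0], 1)]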

bprop_len = 40  # BPTT truncation length
n_iter = 300 * 100 * 2  # number of training steps
loss = 0
i = 0
for xx, yy in zd_gen():
    x = chainer.Variable(np.asarray([xx], dtype=np.float32))
    t = chainer.Variable(np.asarray([yy], dtype=np.int32))
    loss += model(x, t)  # accumulate the loss over bprop_len steps
    i += 1
    if i % bprop_len == 0:
        model.zerograds()
        loss.backward()           # backpropagate through the last bprop_len steps
        loss.unchain_backward()   # truncate the computational graph (truncated BPTT)
        optimizer.update()
        print("iter %d, loss %f, x %d, y %d" % (i, loss.data, xx[0], yy))
        loss = 0
    if i > n_iter:
        break

Training takes time, and it may or may not succeed depending on the initial values. It does not work properly unless the loss eventually falls below about 0.1.

Changing the optimizer to SGD or changing bprop_len changes the results. The values set here are simply the ones that happened to work well in my environment.
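For example, to try SGD instead of Adam (a sketch; the learning rate here is an assumed value, not one from the original experiment):

rnn = RNN()
rnn.reset_state()
model = L.Classifier(rnn)
optimizer = optimizers.SGD(lr=0.1)  # lr is an assumed value; tune as needed
optimizer.setup(model)
optimizer.add_hook(chainer.optimizer.GradientClipping(10.0))  # same gradient cap as before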

Evaluation

Evaluate the trained model. The input sequence could be generated randomly, but for simplicity I prepared static evaluation data.

# Zun Zun Zun Zun Doko Doko Doko Zun
x_data = [[0,1], [0,1], [0,1], [0,1], [1,0], [1,0], [1,0], [0,1]]

rnn.reset_state()
for xx in x_data:
    print('Zun' if xx[1] == 1 else 'Doko')
    x = chainer.Variable(np.asarray([xx], dtype=np.float32))
    y = model.predictor(x)
    z = F.softmax(y, use_cudnn=False)
    if z.data[0].argmax() == 1:  # Kiyoshi is established when index 1 has the larger value
        print('Kiyoshi')

Reference output

For reference, here is the output from a run that went well.

iter 59520, loss 0.037670, x 1, y 0
iter 59560, loss 0.051628, x 0, y 0
iter 59600, loss 0.037519, x 0, y 0
iter 59640, loss 0.041894, x 0, y 0
iter 59680, loss 0.059143, x 0, y 0
iter 59720, loss 0.062305, x 0, y 0
iter 59760, loss 0.055293, x 0, y 0
iter 59800, loss 0.060964, x 1, y 1
iter 59840, loss 0.057446, x 1, y 0
iter 59880, loss 0.034730, x 1, y 0
iter 59920, loss 0.054435, x 0, y 0
iter 59960, loss 0.039648, x 0, y 0
iter 60000, loss 0.036578, x 0, y 0
Zun
Zun
Zun
Zun
Doko
Kiyoshi
Doko
Doko
Zun

Impressions

I feel that I finally understand RNNs, which I had never fully digested. At first the number of hidden units was too small and the BPTT truncation length too short, so it didn't work as expected, but after various adjustments it finally started working. There are many examples out there that use LSTMs and word embeddings, but I wanted to try a more minimal problem, and I'm glad it finally came together.

However, the fact that success still partly depends on luck remains a problem.
