I wanted to start studying Recurrent Neural Networks (RNNs), which I had not gotten around to until now. Roughly speaking, the idea is to make predictions by processing a sequence recursively with an LSTM. Parts of my code are still close to copying by rote and I do not fully understand them, so please point out any mistakes.

For an implementation example of RNN + LSTM in Chainer, I referred to a Chainer [sample](https://github.com/yusuketomoto/chainer-char-rnn). However, it does not fit as-is into the framework I have built so far, so I adapted it. A detailed explanation of that code is given here. I also got a rough understanding of RNNs and LSTMs from a recently published book.

Problem setting

Just running the sample as-is is not very interesting, so let's set up a problem.

It is often said among practitioners that the period of the random numbers produced by Excel's RAND() function is $2^{24}$, which is short compared with other random number generators. Therefore, if you use Excel random numbers for Monte Carlo simulation, the results will be biased. (It may be fine if the number of samples is small.)

On the other hand, R's random numbers are generated by an algorithm called [Mersenne Twister](https://ja.wikipedia.org/wiki/Mersenne Twister) and have a very long period ($2^{19937}-1$).

In other words, if a sequence has periodicity, it might be possible to predict the next random value by training an RNN sequentially on it. That is the motivation for this problem.
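To make the idea concrete, here is a minimal sketch with a toy linear congruential generator. This is not Excel's or R's actual algorithm; the function names `toy_lcg` and `find_period` and the constants are assumptions chosen only so that the period (16) is short enough to see. Once a generator wraps around its period, every next value is fully determined by values already seen, which is the kind of pattern a sequence model could pick up.

```python
def toy_lcg(seed, n, a=5, c=3, m=16):
    """Return n successive states of a toy LCG: x <- (a*x + c) mod m.

    The constants are illustrative assumptions, not those of any real library.
    """
    out = []
    x = seed
    for _ in range(n):
        x = (a * x + c) % m
        out.append(x)
    return out


def find_period(seq):
    """Smallest p > 0 such that seq[i] == seq[i + p] for all valid i, or None."""
    for p in range(1, len(seq)):
        if all(seq[i] == seq[i + p] for i in range(len(seq) - p)):
            return p
    return None


seq = toy_lcg(seed=1, n=64)
period = find_period(seq)
print(period)  # 16: the sequence repeats after visiting all 16 states
```

With a period of $2^{24}$, a sample of about 1000 values covers only a tiny fraction of one cycle, so any pattern an RNN finds in such a short sample would have to come from structure within the cycle rather than from the repetition itself.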

I'm not an expert on random numbers, so I'm not sure if the evaluation method is correct.

Data preparation

First, generate random numbers in Excel using the RAND function.


Prepare 1001 values. Each one is an integer from 0 to 9, so there are 10 possible values.

Next, generate uniform random numbers in R as follows.

x <- floor(runif(1001)*10);

(1001 values are generated, one more than 1000, so that every input value has a next value to serve as its label.)

The explanatory variable is the current random value (0-9), and the label (teacher data) is the next random value (0-9).
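As a sketch of this setup in Python (the article prepares the data in Excel and R; the helper name `make_next_value_pairs` and the use of NumPy's generator as stand-in data are assumptions for illustration), the input/label pairs can be built by shifting the sequence by one:

```python
import numpy as np


def make_next_value_pairs(values):
    """Split a sequence into inputs (current value) and labels (next value)."""
    values = np.asarray(values)
    x = values[:-1]   # current value at each step
    t = values[1:]    # the value that follows it
    return x, t


rng = np.random.default_rng(0)             # stand-in for the Excel / R data
digits = rng.integers(0, 10, size=1001)    # 1001 integers in 0-9
x, t = make_next_value_pairs(digits)

# Inputs scaled to [0, 1) as float32 column vectors,
# labels kept as int32 class indices.
x_arr = (x.astype(np.float32) / 10).reshape(-1, 1)
t_arr = t.astype(np.int32)
print(x_arr.shape, t_arr.shape)
```

Starting from 1001 values, this yields exactly 1000 input/label pairs.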

RNN implementation

The RNN was implemented with Chainer as follows. Because the processing is sequential, I deliberately set the batch size to 1 and process the samples one at a time, in order from the beginning.

First, the base class is as follows.


# -*- coding: utf-8 -*-
from chainer import FunctionSet, Variable, optimizers,serializers
from chainer import functions as F
from chainer import links as L
from sklearn import base
from sklearn.cross_validation import train_test_split
from abc import ABCMeta, abstractmethod
import numpy as np
import six
import math
import cPickle as pickle

class BaseChainerEstimator(base.BaseEstimator):
    __metaclass__= ABCMeta  # python 2.x
    def __init__(self, optimizer=optimizers.MomentumSGD(lr=0.01), n_iter=10000, eps=1e-5, report=100,
                 **params):
        self.report = report
        self.n_iter = n_iter
        self.batch_size = params["batch_size"] if params.has_key("batch_size") else 100
        self.network = self._setup_network(**params)
        self.decay = 1.
        self.optimizer = optimizer
        self.optimizer.setup(self.network)  # register the network's parameters with the optimizer
        self.eps = eps

    def _setup_network(self, **params):
        return FunctionSet(l1=F.Linear(1, 1))

    def forward(self, x,train=True,state=None):
        y = self.network.l1(x)
        return y

    def loss_func(self, y, t):
        return F.mean_squared_error(y, t)

    def output_func(self, h):
        return F.identity(h)
    def fit(self, x_data, y_data):
        batchsize = self.batch_size
        N = len(y_data)
        for loop in range(self.n_iter):
            perm = np.random.permutation(N)
            sum_accuracy = 0
            sum_loss = 0
            for i in six.moves.range(0, N, batchsize):
                x_batch = x_data[perm[i:i + batchsize]]
                y_batch = y_data[perm[i:i + batchsize]]
                x = Variable(x_batch)
                y = Variable(y_batch)
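                yp = self.forward(x)
                loss = self.loss_func(yp,y)
                # Standard Chainer v1 update step: clear gradients, backpropagate, update
                self.optimizer.zero_grads()
                loss.backward()
                self.optimizer.update()
                sum_loss += loss.data * len(y_batch)
                sum_accuracy += F.accuracy(yp,y).data * len(y_batch)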
            if self.report > 0 and (loop + 1) % self.report == 0:
                print('loop={:d}, train mean loss={:.6f} , train mean accuracy={:.6f}'.format(loop + 1, sum_loss / N,sum_accuracy / N))
            self.optimizer.lr *= self.decay

        return self

    def predict(self, x_data):
        x = Variable(x_data,volatile=True)
        y = self.forward(x,train=False)
        return self.output_func(y).data

    def predict_proba(self, x_data):
        x = Variable(x_data,volatile=True)
        y = self.forward(x,train=False)
        return self.output_func(y).data

    def save_model(self,name):
        with open(name,"wb") as o:
            pickle.dump(self,o)  # serialize the whole estimator with cPickle

class ChainerClassifier(BaseChainerEstimator, base.ClassifierMixin):
    def predict(self, x_data):
        return BaseChainerEstimator.predict(self, x_data).argmax(1) 

    def predict_proba(self,x_data):
        return BaseChainerEstimator.predict_proba(self, x_data)

The following RNN implementation looks like this:


class RNNTS(ChainerClassifier):
    """Recurrent Neural Network with LSTM by 1 step"""
    def _setup_network(self, **params):

        self.input_dim = params["input_dim"]
        self.hidden_dim = params["hidden_dim"]
        self.n_classes = params["n_classes"]
        self.optsize = params["optsize"] if params.has_key("optsize") else 30
        self.batch_size = 1 
        self.dropout_ratio = params["dropout_ratio"] if params.has_key("dropout_ratio") else 0.5

        network = FunctionSet(
            l0 = L.Linear(self.input_dim, self.hidden_dim),
            l1_x = L.Linear(self.hidden_dim, 4*self.hidden_dim),
            l1_h = L.Linear(self.hidden_dim, 4*self.hidden_dim),
            l2_h = L.Linear(self.hidden_dim, 4*self.hidden_dim),
            l2_x = L.Linear(self.hidden_dim, 4*self.hidden_dim),
            l3   = L.Linear(self.hidden_dim, self.n_classes),
        )
        return network

    def forward(self, x, train=True,state=None):
        if state is None:
            state = self.make_initial_state(train)
        h0 = self.network.l0(x)
        h1_in = self.network.l1_x(F.dropout(h0, ratio=self.dropout_ratio, train=train)) + self.network.l1_h(state['h1'])
        c1, h1 = F.lstm(state['c1'], h1_in)
        h2_in = self.network.l2_x(F.dropout(h1, ratio=self.dropout_ratio, train=train)) + self.network.l2_h(state['h2'])
        c2, h2 = F.lstm(state['c2'], h2_in)
        y = self.network.l3(F.dropout(h2, ratio=self.dropout_ratio, train=train))
        state = {'c1': c1, 'h1': h1, 'c2': c2, 'h2': h2}

        return y,state

    def make_initial_state(self,train=True):
        return {name: Variable(np.zeros((self.batch_size, self.hidden_dim), dtype=np.float32),
                volatile=not train)
                for name in ('c1', 'h1', 'c2', 'h2')}

    def fit(self, x_data, y_data):
        batchsize = self.batch_size
        N = len(y_data)
        for loop in range(self.n_iter):
            sum_accuracy = Variable(np.zeros((), dtype=np.float32))
            sum_loss = Variable(np.zeros((), dtype=np.float32))
            state = self.make_initial_state(train=True) #Generation of initial state
            for i in six.moves.range(0, N, batchsize):
                x_batch = x_data[i:i + batchsize]
                y_batch = y_data[i:i + batchsize]

                x = Variable(x_batch,volatile=False)
                y = Variable(y_batch,volatile=False)
                yp,state = self.forward(x,train=True,state=state)
                loss = self.loss_func(yp,y)
                accuracy = F.accuracy(yp,y)
                sum_loss += loss
                sum_accuracy += accuracy

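                if (i + 1) % self.optsize == 0:
                    # Truncated BPTT, as in the referenced chainer-char-rnn sample:
                    # backpropagate the accumulated loss, cut the graph, then update.
                    self.optimizer.zero_grads()
                    sum_loss.backward()
                    sum_loss.unchain_backward()
                    self.optimizer.update()
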
            if self.report > 0 and (loop + 1) % self.report == 0:
                print('loop={:d}, train mean loss={:.6f} , train mean accuracy={:.6f}'.format(loop + 1, sum_loss.data / N,sum_accuracy.data / N))
            self.optimizer.lr *= self.decay

        return self

    def output_func(self, h):
        return F.softmax(h)

    def loss_func(self, y, t):
        return F.softmax_cross_entropy(y, t)

    def predict_proba(self, x_data):
        N = len(x_data)
        state = self.make_initial_state(train=False)
        y_list = []
        for i in six.moves.range(0, N, self.batch_size):
            x = Variable(x_data[i:i+self.batch_size],volatile=True)
            y,state = self.forward(x,train=False,state=state)
            y_list.append(y.data[0]) #batch size =Only supports 1
        y = Variable(np.array(y_list),volatile=False)
        return self.output_func(y).data

    def predict(self, x_data):
        return self.predict_proba(x_data).argmax(1)

This is adopted almost unchanged from the referenced code.
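The factor of 4 in `4*self.hidden_dim` is there because the LSTM pre-activation is split into four parts: the cell input and the input, forget, and output gates. Below is a minimal NumPy sketch of one LSTM step under the standard formulation. The function name `lstm_step` is hypothetical, the weights are random placeholders, and the way the four blocks are laid out along the axis is an assumption that may differ from the internal layout of Chainer's `F.lstm`.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(c_prev, pre):
    """One LSTM step.

    c_prev: previous cell state, shape (batch, hidden)
    pre:    summed pre-activation, shape (batch, 4 * hidden)
    Returns (c, h), each of shape (batch, hidden).
    """
    # Assumed layout: four contiguous blocks (cell input, input, forget, output).
    a, i, f, o = np.split(pre, 4, axis=1)
    c = np.tanh(a) * sigmoid(i) + sigmoid(f) * c_prev
    h = sigmoid(o) * np.tanh(c)
    return c, h


hidden = 3
rng = np.random.default_rng(0)
c = np.zeros((1, hidden), dtype=np.float32)   # initial state, batch size 1
h = np.zeros((1, hidden), dtype=np.float32)
W_x = rng.standard_normal((hidden, 4 * hidden)).astype(np.float32)
W_h = rng.standard_normal((hidden, 4 * hidden)).astype(np.float32)

# Feed a short sequence one step at a time, carrying (c, h) forward,
# which mirrors how `state` is passed through `forward` above.
for x_t in rng.standard_normal((5, 1, hidden)).astype(np.float32):
    pre = x_t @ W_x + h @ W_h
    c, h = lstm_step(c, pre)
print(c.shape, h.shape)
```

The sum `x_t @ W_x + h @ W_h` plays the role of `l1_x(...) + l1_h(state['h1'])` in the class above: one linear map from the input and one from the previous hidden state, added before the gates are applied.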

With everything so far in place, the main process can be written simply:


# -*- coding: utf-8 -*-
import pandas as pd
import numpy as np
from DL_chainer import *
import warnings
from sklearn.metrics import classification_report


def main(file=""):
    """Learn Excel random numbers with RNN and predict"""
    df0 = pd.read_csv(file)
    N = len(df0)
    x_all = df0.iloc[:,0]
    # The label for each value is the value that follows it
    y_all = []
    y_all.extend(x_all)
    y_all = y_all[1:(N+1)]

    x_all_array = np.reshape(np.array(x_all[0:(N-1)],dtype=np.float32),(len(x_all)-1,1))/10
    y_all_array = np.reshape(np.array(y_all[0:(N-1)],dtype=np.int32),(len(x_all)-1))

    train_n = 2 * N/3

    x_train = x_all_array[0:train_n]
    y_train = y_all_array[0:train_n]

    x_test = x_all_array[train_n:]
    y_test = y_all_array[train_n:]

    params = {"input_dim":1,"hidden_dim":100,"n_classes":10,"dropout_ratio":0.5,"optsize":30}
    print params
    print len(x_train),len(x_test)
    rnn = RNNTS(n_iter=200,report=1,**params)
    rnn.fit(x_train,y_train)  # train before predicting
    pred = rnn.predict(x_train)
    print classification_report(y_train,pred)

    pred = rnn.predict(x_test)
    print classification_report(y_test,pred)

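if __name__ == '__main__':
    main(file="random_numbers.csv")  # hypothetical file name; point this at your own CSV of random digits
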
Basically, it is a 10-class classification problem.
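For reference when reading the training log, here is a worked example of the chance level for a 10-class problem. If the model assigns equal probability to every digit, the softmax cross-entropy is $\ln 10 \approx 2.303$ and the expected accuracy is 0.1. The sketch below checks the loss value with NumPy; the helper `softmax_cross_entropy` is written out for illustration and is not Chainer's implementation.

```python
import numpy as np


def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy of softmax(logits) against integer labels."""
    logits = np.asarray(logits, dtype=np.float64)
    shifted = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(len(labels)), labels].mean()


n_classes = 10
labels = np.arange(n_classes)                      # one sample of each class
uniform_logits = np.zeros((n_classes, n_classes))  # equal score for every class
chance_loss = softmax_cross_entropy(uniform_logits, labels)
print(chance_loss, np.log(n_classes))
```

A mean training loss clearly below this value, or an accuracy clearly above 0.1, indicates that the network has picked up something from the training sequence, whether a real pattern or memorized noise.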

Evaluation of results

Evaluating the results is difficult. With so few test samples, I do not think a reliable evaluation can be made on the test data.
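A rough way to quantify this is the standard error of an accuracy estimate, $\sqrt{p(1-p)/n}$. This assumes each test prediction is an independent trial, which ignores any dependence between consecutive samples. The function name `accuracy_standard_error` is hypothetical, and the test size of 333 is an assumption based on holding out about one third of 1000 samples.

```python
import math


def accuracy_standard_error(p, n):
    """Standard error of an accuracy estimate under a binomial model."""
    return math.sqrt(p * (1.0 - p) / n)


n_test = 333   # assumed: roughly one third of 1000 samples held out
se = accuracy_standard_error(0.1, n_test)
print(round(se, 4))
```

At a chance-level accuracy of $p = 0.1$, the standard error is between 1 and 2 percentage points, so differences in test accuracy of that size cannot be distinguished from noise.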

Training an RNN basically progresses no matter which random number sequence is used. So I decided to compare how fast learning proceeds over the same number of epochs.

In other words, for the same number of epochs, I interpret higher training accuracy as meaning that the data contains some pattern and the network has "learned" it.

Since a random number sequence should have no pattern, slower learning is naturally better.

The results are shown below.


You can see that learning is faster on the Excel data than on the R data. In other words, R seems to be better in terms of random number quality. Ideally, learning would not progress at all.


This time, as RNN practice, I implemented a model and trained it on random numbers from Excel and R. Setting aside whether the evaluation method is valid, under this method learning progressed faster on the Excel data, which suggests that R's random numbers are better.

Proper verification would probably need a large number of random sequences, but because of limited computing power I stopped at this level.

By the way, how can RNN computation be sped up?

