I would like to start studying recurrent neural networks (RNNs), which I had not gotten around to before. Roughly speaking, the idea is to make predictions by processing a sequence recursively with an LSTM. Parts of this are still close to copying the reference code by rote, so I do not fully understand everything yet; please point out any mistakes.
For the RNN + LSTM implementation in Chainer, I referred to this Chainer [sample](https://github.com/yusuketomoto/chainer-char-rnn). As it is, however, it did not fit into the framework I have built so far, so I reworked it somewhat. The sample code itself has been explained in detail elsewhere, and a recent book gave me a rough understanding of RNNs and LSTMs.
Simply running the sample as it is would not be very interesting, so let me set up my own problem.
As is often pointed out in practice, the period of the random numbers produced by Excel's RAND() function is said to be $2^{24}$, which is known to be short compared with other generators. If you use Excel's random numbers in a Monte Carlo simulation, the results can therefore be biased. (It may be fine if the number of samples is small.)
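For reference, $2^{24}$ is about 16.8 million, so a simulation that draws more values than that starts reusing the same cycle.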
The random numbers in R, on the other hand, are generated by the [Mersenne Twister](https://ja.wikipedia.org/wiki/Mersenne Twister) algorithm and have a very long period ($2^{19937} - 1$).
In other words, if the sequence has periodicity or some other structure, it might be possible to predict the next random value by sequential learning with an RNN. That is the motivation for this experiment.
I'm not an expert on random numbers, so I'm not sure if the evaluation method is correct.
First, generate random numbers in Excel with the RAND function:
=ROUNDDOWN(RAND()*10,0)
Prepare 1001 of these values; each is an integer from 0 to 9.
Next, generate uniform random numbers in R as follows:
x <- floor(runif(1001)*10);
(1001 values are generated, one extra, so that every input value has a "next value" to serve as its target.)
The explanatory variable is the current random value (0-9), and the teacher data is the next random value (0-9).
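To make this pairing concrete, here is a minimal sketch of the shift (the array name `series` is purely illustrative; the actual preprocessing is done later in main.py):

```python
import numpy as np

# Minimal sketch of the input/target pairing, assuming the 1001 generated
# digits are already in a 1-D array called `series` (name is illustrative).
series = np.random.randint(0, 10, size=1001)  # stand-in for the Excel/R digits

x = series[:-1].astype(np.float32).reshape(-1, 1) / 10.0  # current value, scaled as in main.py
t = series[1:].astype(np.int32)                           # next value = class label 0-9

print(x.shape, t.shape)  # -> (1000, 1) (1000,)
```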
The RNN was implemented with Chainer as follows. Since this is sequential processing, I deliberately set the batch size to 1 and process the values one by one from the beginning of the sequence.
First, the base class is as follows.
DL_chainer.py
# -*- coding: utf-8 -*-
from chainer import FunctionSet, Variable, optimizers, serializers
from chainer import functions as F
from chainer import links as L
from sklearn import base
from sklearn.cross_validation import train_test_split
from abc import ABCMeta, abstractmethod
import numpy as np
import six
import math
import cPickle as pickle


class BaseChainerEstimator(base.BaseEstimator):
    __metaclass__ = ABCMeta  # python 2.x

    def __init__(self, optimizer=optimizers.MomentumSGD(lr=0.01), n_iter=10000, eps=1e-5, report=100,
                 **params):
        self.report = report
        self.n_iter = n_iter
        self.batch_size = params["batch_size"] if params.has_key("batch_size") else 100
        self.network = self._setup_network(**params)
        self.decay = 1.
        self.optimizer = optimizer
        self.optimizer.setup(self.network.collect_parameters())
        self.eps = eps
        np.random.seed(123)

    @abstractmethod
    def _setup_network(self, **params):
        return FunctionSet(l1=F.Linear(1, 1))

    @abstractmethod
    def forward(self, x, train=True, state=None):
        y = self.network.l1(x)
        return y

    @abstractmethod
    def loss_func(self, y, t):
        return F.mean_squared_error(y, t)

    @abstractmethod
    def output_func(self, h):
        return F.identity(h)

    @abstractmethod
    def fit(self, x_data, y_data):
        batchsize = self.batch_size
        N = len(y_data)
        for loop in range(self.n_iter):
            perm = np.random.permutation(N)
            sum_accuracy = 0
            sum_loss = 0
            for i in six.moves.range(0, N, batchsize):
                x_batch = x_data[perm[i:i + batchsize]]
                y_batch = y_data[perm[i:i + batchsize]]
                x = Variable(x_batch)
                y = Variable(y_batch)
                self.optimizer.zero_grads()
                yp = self.forward(x)
                loss = self.loss_func(yp, y)
                loss.backward()
                self.optimizer.update()
                sum_loss += loss.data * len(y_batch)
                sum_accuracy += F.accuracy(yp, y).data * len(y_batch)
            if self.report > 0 and (loop + 1) % self.report == 0:
                print('loop={:d}, train mean loss={:.6f} , train mean accuracy={:.6f}'.format(loop + 1, sum_loss / N, sum_accuracy / N))
            self.optimizer.lr *= self.decay
        return self

    def predict(self, x_data):
        x = Variable(x_data, volatile=True)
        y = self.forward(x, train=False)
        return self.output_func(y).data

    def predict_proba(self, x_data):
        x = Variable(x_data, volatile=True)
        y = self.forward(x, train=False)
        return self.output_func(y).data

    def save_model(self, name):
        with open(name, "wb") as o:
            pickle.dump(self, o)


class ChainerClassifier(BaseChainerEstimator, base.ClassifierMixin):
    def predict(self, x_data):
        return BaseChainerEstimator.predict(self, x_data).argmax(1)

    def predict_proba(self, x_data):
        return BaseChainerEstimator.predict_proba(self, x_data)
On top of this base class, the RNN implementation looks like this:
DL_chainer.py
class RNNTS(ChainerClassifier):
    """
    Recurrent Neural Network with LSTM, processed one step at a time
    """
    def _setup_network(self, **params):
        self.input_dim = params["input_dim"]
        self.hidden_dim = params["hidden_dim"]
        self.n_classes = params["n_classes"]
        self.optsize = params["optsize"] if params.has_key("optsize") else 30
        self.batch_size = 1
        self.dropout_ratio = params["dropout_ratio"] if params.has_key("dropout_ratio") else 0.5
        network = FunctionSet(
            l0=L.Linear(self.input_dim, self.hidden_dim),
            l1_x=L.Linear(self.hidden_dim, 4 * self.hidden_dim),
            l1_h=L.Linear(self.hidden_dim, 4 * self.hidden_dim),
            l2_h=L.Linear(self.hidden_dim, 4 * self.hidden_dim),
            l2_x=L.Linear(self.hidden_dim, 4 * self.hidden_dim),
            l3=L.Linear(self.hidden_dim, self.n_classes),
        )
        return network

    def forward(self, x, train=True, state=None):
        if state is None:
            state = self.make_initial_state(train)
        h0 = self.network.l0(x)
        h1_in = self.network.l1_x(F.dropout(h0, ratio=self.dropout_ratio, train=train)) + self.network.l1_h(state['h1'])
        c1, h1 = F.lstm(state['c1'], h1_in)
        h2_in = self.network.l2_x(F.dropout(h1, ratio=self.dropout_ratio, train=train)) + self.network.l2_h(state['h2'])
        c2, h2 = F.lstm(state['c2'], h2_in)
        y = self.network.l3(F.dropout(h2, ratio=self.dropout_ratio, train=train))
        state = {'c1': c1, 'h1': h1, 'c2': c2, 'h2': h2}
        return y, state

    def make_initial_state(self, train=True):
        return {name: Variable(np.zeros((self.batch_size, self.hidden_dim), dtype=np.float32),
                               volatile=not train)
                for name in ('c1', 'h1', 'c2', 'h2')}

    def fit(self, x_data, y_data):
        batchsize = self.batch_size
        N = len(y_data)
        for loop in range(self.n_iter):
            sum_accuracy = Variable(np.zeros((), dtype=np.float32))
            sum_loss = Variable(np.zeros((), dtype=np.float32))
            state = self.make_initial_state(train=True)  # generate the initial LSTM state
            for i in six.moves.range(0, N, batchsize):
                x_batch = x_data[i:i + batchsize]
                y_batch = y_data[i:i + batchsize]
                x = Variable(x_batch, volatile=False)
                y = Variable(y_batch, volatile=False)
                yp, state = self.forward(x, train=True, state=state)
                loss = self.loss_func(yp, y)
                accuracy = F.accuracy(yp, y)
                sum_loss += loss
                sum_accuracy += accuracy
                if (i + 1) % self.optsize == 0:
                    # truncated BPTT: update every optsize steps, then cut the graph
                    self.optimizer.zero_grads()
                    sum_loss.backward()
                    sum_loss.unchain_backward()
                    self.optimizer.clip_grads(5)
                    self.optimizer.update()
            if self.report > 0 and (loop + 1) % self.report == 0:
                print('loop={:d}, train mean loss={:.6f} , train mean accuracy={:.6f}'.format(loop + 1, sum_loss.data / N, sum_accuracy.data / N))
            self.optimizer.lr *= self.decay
        return self

    def output_func(self, h):
        return F.softmax(h)

    def loss_func(self, y, t):
        return F.softmax_cross_entropy(y, t)

    def predict_proba(self, x_data):
        N = len(x_data)
        state = self.make_initial_state(train=False)
        y_list = []
        for i in six.moves.range(0, N, self.batch_size):
            x = Variable(x_data[i:i + self.batch_size], volatile=True)
            y, state = self.forward(x, train=False, state=state)
            y_list.append(y.data[0])  # only supports batch_size = 1
        y = Variable(np.array(y_list), volatile=False)
        return self.output_func(y).data

    def predict(self, x_data):
        return self.predict_proba(x_data).argmax(1)
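Two details carried over from the char-rnn sample may need a word of explanation. Each `l*_x`/`l*_h` layer outputs `4 * hidden_dim` values because `F.lstm` splits them into the input, forget, cell, and output gates, and `fit()` performs truncated backpropagation by calling `unchain_backward()` every `optsize` steps so the computation graph does not keep growing over the whole sequence. A minimal sketch of the `F.lstm` call in isolation (assuming the same Chainer 1.x API as above):

```python
import numpy as np
from chainer import Variable, functions as F

hidden_dim = 100
c_prev = Variable(np.zeros((1, hidden_dim), dtype=np.float32))        # previous cell state
a = Variable(np.random.randn(1, 4 * hidden_dim).astype(np.float32))   # plays the role of l1_x(...) + l1_h(...)
c, h = F.lstm(c_prev, a)  # new cell state and hidden state, each of shape (1, hidden_dim)
```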
Most of this is adopted almost verbatim from the referenced sample code.
With this in place, the main script can be written quite simply:
main.py
# -*- coding: utf-8 -*-
import pandas as pd
import numpy as np
from DL_chainer import *
import warnings
from sklearn.metrics import classification_report

warnings.filterwarnings("ignore")

I_FILE_EXCEL = "excelrand.csv"
I_FILE_R = "RRandom.csv"


def main(file=""):
    """
    Learn the given random-number sequence with the RNN and predict the next value
    :return:
    """
    df0 = pd.read_csv(file)
    N = len(df0)
    x_all = df0.iloc[:, 0]
    y_all = []
    y_all.extend(x_all)
    y_all.extend([np.nan])
    y_all = y_all[1:(N + 1)]  # shift by one step: the target is the next value
    x_all_array = np.reshape(np.array(x_all[0:(N - 1)], dtype=np.float32), (len(x_all) - 1, 1)) / 10
    y_all_array = np.reshape(np.array(y_all[0:(N - 1)], dtype=np.int32), (len(x_all) - 1))
    train_n = 2 * N / 3
    x_train = x_all_array[0:train_n]
    y_train = y_all_array[0:train_n]
    x_test = x_all_array[train_n:]
    y_test = y_all_array[train_n:]
    params = {"input_dim": 1, "hidden_dim": 100, "n_classes": 10, "dropout_ratio": 0.5, "optsize": 30}
    print params
    print len(x_train), len(x_test)
    rnn = RNNTS(n_iter=200, report=1, **params)
    rnn.fit(x_train, y_train)
    pred = rnn.predict(x_train)
    print classification_report(y_train, pred)
    pred = rnn.predict(x_test)
    print classification_report(y_test, pred)


if __name__ == '__main__':
    main(I_FILE_R)
    main(I_FILE_EXCEL)
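For reference, `main()` reads only the first column of the CSV via `iloc[:, 0]`, so any single-column file with a header row should work. A hedged sketch of producing a compatible file in Python (in the article the files were actually exported from Excel and R; the column name "x" is my own choice):

```python
import numpy as np
import pandas as pd

# Stand-in digits; in the article these come from Excel's RAND() or R's runif().
digits = np.random.randint(0, 10, size=1001)
pd.DataFrame({"x": digits}).to_csv("RRandom.csv", index=False)
```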
Basically, it is a 10-class classification problem.
Evaluating the results is difficult: with so few test samples, a proper evaluation on held-out data is not really possible.
Training the RNN makes some progress regardless of which random-number sequence is used, so I decided to compare how fast learning proceeds over the same number of epochs.
In other words, at the same epoch, the sequence with the higher training accuracy is interpreted as containing some pattern that the network has managed to "learn".
Since a good random-number sequence should contain no pattern, slower learning is naturally better.
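One way to make this comparison concrete is to collect the per-epoch "train mean accuracy" values printed by `fit()` for each sequence and plot the two curves together; a minimal sketch (the function and argument names are mine, not part of the code above):

```python
import matplotlib.pyplot as plt

def plot_learning_curves(acc_excel, acc_r):
    """Compare per-epoch training accuracy for the Excel and R sequences.

    Both arguments are lists of the 'train mean accuracy' values printed by
    fit(); collecting them (e.g. by parsing the log) is left to the caller.
    """
    plt.plot(acc_excel, label="Excel RAND()")
    plt.plot(acc_r, label="R runif() (Mersenne Twister)")
    plt.xlabel("epoch")
    plt.ylabel("train mean accuracy")
    plt.legend()
    plt.show()
```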
The results are shown below.
You can see that learning proceeds faster on the Excel sequence than on the R sequence. In other words, R seems to be the better random number generator in terms of quality; ideally, learning should not progress at all.
This time, as RNN practice, I implemented the model and trained it on random numbers from Excel and from R. Whether or not this evaluation method is strictly valid, under it learning progressed faster on the Excel sequence, suggesting that R's random numbers are the better ones.
A proper verification would probably require a very large number of random-number sequences, but due to limited computing power I stopped at this level.
That said, how can the RNN computation be sped up?