**2016/7/18: Corrected an error in the formula for the image size after filtering.**
In recent years, AI and deep learning have been getting attention everywhere. The number of libraries has grown, and I think it has become easier to just try things out. So I will try sentiment analysis of tweets using Chainer, doing something similar to an article on Theano that I use as a reference.
However, Chainer is a little hard to use if you are accustomed to machine learning libraries such as sklearn, so I would like to address that as well.
Much of this is my own understanding, so there may be mistakes. I would appreciate it if you could point them out.
Mac OS X Yosemite 10.10.15, Python 2.7, CPU: Intel Core i5 2.6 GHz, memory: 8 GB
(Is this hardware enough? I don't know.)
pip install chainer
With sklearn, it was as easy as this:
from sklearn.svm import SVC
model = SVC()  # or a random forest, etc.
model.fit(x_train, y_train)
y_p = model.predict(x_test)
Here x_train is a matrix of size $N \times M$ and y_train is a label (target) vector of length $N$ (with values such as 0 and 1). $N$ is the sample size and $M$ is the number of features. x_test is the test data, with the same number of columns (that is, the same number of features) as x_train.
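To make the shapes concrete, here is a minimal sketch of the fit/predict contract using only NumPy. `ToyCentroidClassifier` is a hypothetical stand-in written for this illustration (a nearest-centroid rule), not an sklearn class and not what the rest of this article uses.

```python
import numpy as np


class ToyCentroidClassifier(object):
    """Hypothetical toy classifier that only illustrates the fit/predict interface."""

    def fit(self, x_train, y_train):
        # x_train: (N, M) feature matrix, y_train: length-N label vector
        self.classes_ = np.unique(y_train)
        # One centroid (mean feature vector) per class -> shape (n_classes, M)
        self.centroids_ = np.stack(
            [x_train[y_train == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, x_test):
        # x_test: (N_test, M); squared distance of every sample to every centroid
        diff = x_test[:, None, :] - self.centroids_[None, :, :]
        dist = (diff ** 2).sum(axis=2)
        return self.classes_[dist.argmin(axis=1)]


x_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])  # N=4, M=2
y_train = np.array([0, 0, 1, 1])
x_test = np.array([[0.05, 0.1], [1.0, 0.9]])  # same number of columns as x_train

model = ToyCentroidClassifier().fit(x_train, y_train)
y_p = model.predict(x_test)
print(y_p)
```

The point is only the interface: `fit` takes an $N \times M$ matrix plus $N$ labels, and `predict` takes any matrix with $M$ columns.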
Chainer, on the other hand, has no methods like "fit" and "predict"; you have to write them yourself.
For example, a multilayer perceptron (MLP) can be implemented as follows.
First, the base class:
# -*- coding: utf-8 -*-
from chainer import FunctionSet, Variable, optimizers
from chainer import functions as F
from sklearn import base
from abc import ABCMeta, abstractmethod
import numpy as np
import six


class BaseChainerEstimator(base.BaseEstimator):
    __metaclass__ = ABCMeta  # python 2.x

    def __init__(self, optimizer=optimizers.SGD(), n_iter=10000, eps=1e-5, report=100,
                 **params):
        self.network = self._setup_network(**params)
        self.optimizer = optimizer
        self.optimizer.setup(self.network.collect_parameters())
        self.n_iter = n_iter
        self.eps = eps
        self.report = report

    @abstractmethod
    def _setup_network(self, **params):
        return FunctionSet(l1=F.Linear(1, 1))

    @abstractmethod
    def forward(self, x, train=True):
        y = self.network.l1(x)
        return y

    @abstractmethod
    def loss_func(self, y, t):
        return F.mean_squared_error(y, t)

    @abstractmethod
    def output_func(self, h):
        return F.identity(h)

    def fit(self, x_data, y_data):
        batchsize = 100
        N = len(y_data)
        for loop in range(self.n_iter):
            perm = np.random.permutation(N)
            sum_accuracy = 0
            sum_loss = 0
            for i in six.moves.range(0, N, batchsize):
                x_batch = x_data[perm[i:i + batchsize]]
                y_batch = y_data[perm[i:i + batchsize]]
                x = Variable(x_batch)
                y = Variable(y_batch)
                self.optimizer.zero_grads()
                yp = self.forward(x)
                loss = self.loss_func(yp, y)
                loss.backward()
                self.optimizer.update()
                sum_loss += loss.data * len(y_batch)
                sum_accuracy += F.accuracy(yp, y).data * len(y_batch)
            if self.report > 0 and loop % self.report == 0:
                print('loop={}, train mean loss={} , train mean accuracy={}'.format(loop, sum_loss / N, sum_accuracy / N))
        return self

    def predict(self, x_data):
        x = Variable(x_data)
        y = self.forward(x, train=False)
        return self.output_func(y).data
class ChainerClassifier(BaseChainerEstimator, base.ClassifierMixin):
    def predict(self, x_data):
        # argmax returns the index of the largest value in each row,
        # so the class labels must be 0, 1, 2, ...
        return BaseChainerEstimator.predict(self, x_data).argmax(1)

    def predict_proba(self, x_data):
        return BaseChainerEstimator.predict(self, x_data)
On top of that, the MLP class inherits from ChainerClassifier:
class MLP3L(ChainerClassifier):
    """
    3-Layer Perceptron
    """

    def _setup_network(self, **params):
        network = FunctionSet(
            l1=F.Linear(params["input_dim"], params["hidden_dim"]),
            l2=F.Linear(params["hidden_dim"], params["hidden_dim"]),
            l3=F.Linear(params["hidden_dim"], params["n_classes"]),
        )
        return network

    def forward(self, x, train=True):
        h1 = F.dropout(F.relu(self.network.l1(x)), train=train)
        h2 = F.dropout(F.relu(self.network.l2(h1)), train=train)
        y = self.network.l3(h2)
        return y

    def loss_func(self, y, t):
        return F.softmax_cross_entropy(y, t)

    def output_func(self, h):
        return F.softmax(h)
With this implementation, you can use "fit" and "predict" ("predict_proba") just like in sklearn.
It seems that x_data must be of type numpy.float32 and y_data of type numpy.int32. (They are wrapped in Chainer's Variable inside fit.)
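As a small sketch of that requirement, the arrays can be cast with NumPy before they are passed to fit. `to_chainer_dtypes` is a hypothetical helper name introduced here for illustration.

```python
import numpy as np


def to_chainer_dtypes(x_data, y_data):
    """Hypothetical helper: cast features to float32 and labels to int32."""
    x = np.asarray(x_data, dtype=np.float32)
    y = np.asarray(y_data, dtype=np.int32)
    return x, y


x_raw = np.array([[1, 2], [3, 4]])  # integer array by default
y_raw = [0, 1]                      # plain Python list of labels
x, y = to_chainer_dtypes(x_raw, y_raw)
print(x.dtype, y.dtype)
```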
For the MLP above, x_data can be a matrix of size $N \times M$, just as in sklearn. However, problems arise as soon as you try to extend this to, for example, a convolutional neural network (CNN).
Since CNNs are mainly used in image processing, each input is two-dimensional, and adding the batch (sample) dimension makes x_data three-dimensional. (There is also the concept of channels, so it is actually a 4D tensor.)
I used the code here as a sample.
The MNIST images used there are $28 \times 28$.
import chainer
import chainer.functions as F

model = chainer.FunctionSet(conv1=F.Convolution2D(1, 20, 5),
                            conv2=F.Convolution2D(20, 50, 5),
                            l1=F.Linear(800, 500),
                            l2=F.Linear(500, 10))


def forward(x_data, y_data, train=True):
    x, t = chainer.Variable(x_data), chainer.Variable(y_data)
    # convolution -> ReLU -> 2x2 max pooling, twice
    h = F.max_pooling_2d(F.relu(model.conv1(x)), 2)
    h = F.max_pooling_2d(F.relu(model.conv2(h)), 2)
    h = F.dropout(F.relu(model.l1(h)), train=train)
    y = model.l2(h)
    if train:
        return F.softmax_cross_entropy(y, t)
    else:
        return F.accuracy(y, t)
Looking at the reference for F.Convolution2D, the first argument is in_channels, the second is out_channels, and the third is ksize (the filter size). in_channels would be 3 for RGB images, but here I use 1. out_channels is the number of output channels; my own interpretation is that 20 different images are produced by 20 different filters. Since ksize is 5, the filter is $5 \times 5$.
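To check that interpretation, here is a naive NumPy sketch of what a layer with in_channels=1, out_channels=20 and ksize=5 does to one image. `conv2d_valid` is a hypothetical helper written for this illustration; it ignores the bias term and is not how Chainer implements convolution.

```python
import numpy as np


def conv2d_valid(image, filters):
    """Naive 2D filtering without padding (hypothetical helper).

    image:   (S, S) single-channel image
    filters: (out_channels, F, F), one filter per output channel
    returns: (out_channels, S - F + 1, S - F + 1)
    """
    out_channels, F, _ = filters.shape
    S = image.shape[0]
    S_f = S - F + 1
    out = np.zeros((out_channels, S_f, S_f), dtype=image.dtype)
    for k in range(out_channels):
        for i in range(S_f):
            for j in range(S_f):
                # Weighted sum of the F x F patch under filter k
                out[k, i, j] = (image[i:i + F, j:j + F] * filters[k]).sum()
    return out


rng = np.random.RandomState(0)
image = rng.rand(28, 28).astype(np.float32)       # one MNIST-sized image
filters = rng.randn(20, 5, 5).astype(np.float32)  # 20 random 5x5 filters
feature_maps = conv2d_valid(image, filters)
print(feature_maps.shape)
```

Each of the 20 filters yields its own filtered image, which matches the reading of out_channels as "20 kinds of images".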
(Corrected on July 18, 2016 from here)
~~In the convolution process, if the filter size is $F$ and the image size is $S \times S$, the image size after filtering, without padding, is $S_f \times S_f$. According to [this article](http://aidiary.hatenablog.com/entry/20151108/1446952402),~~
~~$$ S_f = S - 2 \times [F/2] $$~~
~~where $[\cdot]$ means truncating the fractional part.~~
**When I actually tried it, the result was different; in fact, the correct formula was written in Chainer's documentation:**
$$ S_f = S - F + 1 $$
This is the correct one. It is the same as for a moving average. The previous formula gives the right answer for odd filter sizes, but not for even ones.
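The difference between the two formulas can be checked with a short sketch. The helper names `conv_out_size` and `old_formula` are made up for this illustration; the moving average uses NumPy's `np.convolve` with `mode="valid"`, which applies no padding.

```python
import numpy as np


def conv_out_size(S, F):
    """Corrected formula: size after filtering without padding."""
    return S - F + 1


def old_formula(S, F):
    """Earlier formula, S - 2 * [F / 2], with [.] meaning truncation."""
    return S - 2 * (F // 2)


signal = np.arange(28, dtype=np.float32)
for F in (4, 5):
    window = np.ones(F, dtype=np.float32) / F
    # A moving average without padding has the same output length
    moving_avg = np.convolve(signal, window, mode="valid")
    print(F, len(moving_avg), conv_out_size(28, F), old_formula(28, F))
```

For $F = 5$ both formulas give 24, but for $F = 4$ the moving average has length 25, which only the corrected formula reproduces.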
Also, in the pooling process, the handling of the edges differs between max pooling and average pooling. When I tried it, average pooling could not be computed if the target size was not divisible by the pooling size, whereas max pooling could. You need to be careful about this.
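As a rough sketch of why the remainder matters, assume non-overlapping pooling (stride equal to the window size) and no padding. `pool_out_size` and its `keep_edge` flag are hypothetical names for this illustration, not Chainer's implementation: rounding up keeps the leftover edge, rounding down drops it.

```python
def pool_out_size(size, ksize, keep_edge):
    """Output length of non-overlapping pooling (hypothetical helper).

    keep_edge=True  rounds up, so a partial window at the edge is still pooled.
    keep_edge=False rounds down, so leftover pixels at the edge are ignored.
    """
    if keep_edge:
        return -(-size // ksize)  # ceiling division
    return size // ksize


for size in (24, 25):
    print(size, size % 2, pool_out_size(size, 2, True), pool_out_size(size, 2, False))
```

When the size is divisible by the window (24 here) both conventions agree, so the question only arises for sizes such as 25.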
(End of the 2016/7/18 correction)
In other words, in this example, after the first convolution
$$ S_{f1} = 28 - 5 + 1 = 24 $$
Max pooling with a window of 2 is then applied in the forward function, so the size after pooling, $S_{p1} \times S_{p1}$, is given by
$$ S_{p1} = 24 / 2 = 12 $$
After the second convolution
$$ S_{f2} = 12 - 5 + 1 = 8 $$
Max pooling is applied again in the forward function, so the size after pooling, $S_{p2} \times S_{p2}$, is given by
$$ S_{p2} = 8 / 2 = 4 $$
Therefore, since the number of output channels is 50, the dimension of the features that become the input of the first fully connected layer is
$$ M = 50 \times 4 \times 4 = 800 $$
which matches the first argument of
l1=F.Linear(800, 500)
(If you get this wrong, Chainer seems to tell you the correct value.)
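The same arithmetic can be written as a few lines of Python. This is only a sketch of the calculation above; the helper names and the `layers` list are introduced here for illustration, and the sizes divide evenly so plain integer division is enough for the pooling step.

```python
def conv_out_size(S, F):
    """Size after a convolution without padding."""
    return S - F + 1


def pool_out_size(S, P):
    """Size after P x P pooling (sizes here are divisible by P)."""
    return S // P


# (out_channels, ksize, pooling size) for conv1 and conv2 in the sample model
layers = [(20, 5, 2), (50, 5, 2)]

size = 28  # MNIST images are 28 x 28
out_channels = 1
for channels, ksize, pool in layers:
    size = pool_out_size(conv_out_size(size, ksize), pool)
    out_channels = channels
    print(channels, size)

M = out_channels * size * size
print(M)
```

The final value of `M` is what has to go into the first argument of the first F.Linear layer.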
Now that the model is defined, we pass x_data to the forward function, but there is one more issue: for convolution, the input must be a 4D tensor, according to the following reference. (See x under Parameters.)
The input has shape $(n, c_I, h, w)$, where $n$ is the batch size (sample size), $c_I$ is the number of input channels, and $h$ and $w$ are the height and width of the image.
In the sample code above, the data is converted to a 4D tensor with reshape, as shown below.
X_train = X_train.reshape((len(X_train), 1, 28, 28))
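A minimal NumPy sketch of this reshape, assuming each image arrives as a flattened row of 784 pixels. The array names `X_flat` and `X_4d` are made up for this example.

```python
import numpy as np

N = 3  # a tiny batch for illustration
# Each row is one flattened 28 x 28 image (784 values)
X_flat = np.arange(N * 784, dtype=np.float32).reshape(N, 784)

# (n, c_I, h, w) = (N, 1, 28, 28): one channel per image
X_4d = X_flat.reshape((len(X_flat), 1, 28, 28))
print(X_4d.shape)

# Row-major order is kept: pixel (row 1, column 0) is element 28 of the flat row
print(X_4d[0, 0, 1, 0] == X_flat[0, 28])
```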
This time, I wanted to reshape data that was already a Variable. When I looked it up, the same operation was also defined as a Chainer function, so I use that.