[PYTHON] I tried deep learning using Theano

Introduction

Having recently started using Theano as my third library after Chainer and Caffe, I will explain how to implement a convolutional neural network (CNN) with Theano. Other articles and the official tutorial cover the basic usage of Theano well, so please refer to them for that. This article follows the Deep Convolutional Network section of the deep learning tutorial. The code has been changed for the sake of explanation, but the overall flow is the same. I do not explain Deep Learning itself or the basics of Theano here, so please look at other articles for those topics. Also, although I call this an explanation of the implementation, I have only been using Theano for a few days, my implementation skills are not that high, and I am not good at English (the official tutorial is in English), so there are many points I cannot cover. (Think of this as a personal memorandum and don't expect too much.)

LeNet

We will perform mnist handwriting recognition based on LeNet, following the deep learning tutorial. LeNet is a basic CNN consisting of two convolution layers, two pooling layers, and a fully connected layer. In this article, details such as the activation function differ from the original, but the basic structure is the same. (Please read the original paper for a detailed explanation of LeNet.)

Implementation

We will implement a CNN based on the LeNet described above. The code below assumes the following imports.

import theano
import theano.tensor as T
import numpy as np

Convolution layer

First, we will implement the convolution layer. Theano provides T.nnet.conv.conv2d as the convolution symbol. The weights and biases have to be written by yourself using numpy and theano.shared. Weights and biases are values that are updated by learning, so we define them with theano.shared. Also, since it is tedious to write out the layer every time a convolution layer is added, we define it as a class. The following is an implementation example of the convolution layer class.

class Conv2d(object):
    def __init__(self, input, out_c, in_c, k_size):
        self._input = input # Input symbol
        self._out_c = out_c # Number of output channels
        self._in_c = in_c # Number of input channels
        w_shp = (out_c, in_c, k_size, k_size) # Weight shape
        w_bound = np.sqrt(6. / (in_c * k_size * k_size + \
                        out_c * k_size * k_size)) # Weight initialization range
        # Definition of weights
        self.W = theano.shared( np.asarray(
                        np.random.uniform( # Initialize with random numbers
                            low=-w_bound,
                            high=w_bound,
                            size=w_shp),
                        dtype=self._input.dtype), name='W', borrow=True)
        b_shp = (out_c,) # Bias shape
        # Definition of bias (initialized with zeros)
        self.b = theano.shared(np.zeros(b_shp,
                        dtype=self._input.dtype), name='b', borrow=True)
        # Definition of the convolution symbol
        self.output = T.nnet.conv.conv2d(self._input, self.W) \
                        + self.b.dimshuffle('x', 0, 'x', 'x')
        # Parameters to be updated
        self.params = [self.W, self.b]

dimshuffle adjusts the dimensions of the bias from a vector to a tensor4, which is the output type of T.nnet.conv.conv2d. It behaves like a combination of reshape and np.transpose. With the pattern ('x', 0, 'x', 'x'), the shape of self.b becomes (1, self.b.shape[0], 1, 1), so it can be broadcast over the convolution output.
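
As a quick check, the following is a minimal sketch of what this dimshuffle pattern does; the channel count of 20 is just an illustrative value and not fixed by the class above.

b = theano.shared(np.zeros(20, dtype=theano.config.floatX), name='b') # Hypothetical bias with 20 channels
b4 = b.dimshuffle('x', 0, 'x', 'x') # Insert broadcastable axes around axis 0
print(b4.eval().shape) # -> (1, 20, 1, 1), so it broadcasts against the tensor4 convolution output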

Activation function

Originally, the activation function of LeNet is tanh, but this time we will use relu. relu is an activation function given by the simple formula max(0, x). Theano does not provide a relu symbol, so you have to define it yourself. The implementation is written in a slightly special way, because a plain real value cannot be passed to Theano's T.max() (although there may be a way) and an if statement cannot be applied to a symbol; instead, T.switch is used. The following is an implementation example of relu.

class relu(object):
    def __init__(self, input):
        self._input = input
        self.output  = T.switch(self._input < 0, 0, self._input)

Pooling layer

For the pooling layer, Theano provides the symbol theano.tensor.signal.pool.pool_2d. The pooling layer is easy to write because, unlike convolution, there is no need to prepare symbols to be updated such as weights and biases. The following is an implementation example of the pooling layer.

from theano.tensor.signal import pool

class Pool2d(object):
    def __init__(self, input, k_size, st, pad=0, mode='max'):
        self._input = input
        #Definition of pooling layer symbols
        self.output = pool.pool_2d(self._input,
                            (k_size, k_size), # Kernel size
                            ignore_border=True, # Border handling (True is basically fine; see the official documentation for details)
                            st=(st, st), # Stride
                            padding=(pad, pad), # Padding
                            mode=mode) # Type of pooling ('max', 'sum', 'average_inc_pad', 'average_exc_pad')

Fully connected layer

The fully connected layer has to be written by yourself because no dedicated symbol is prepared in Theano, but it can be expressed as a matrix product, and the product symbol is given by T.dot(), so it is not particularly difficult. As with the convolution layer, there are weights and biases, so each is defined. The following is an implementation example of the fully connected layer.

class FullyConnect(object):
    def __init__(self, input, inunit, outunit):
        self._input = input
        #Definition of weights
        W = np.asarray(
            np.random.uniform(
            low=-np.sqrt(6. / (inunit + outunit)),
            high=np.sqrt(6. / (inunit + outunit)),
            size=(inunit, outunit)
            ),
            dtype=theano.config.floatX)
        self.W = theano.shared(value=W, name='W', borrow=True)
        #Definition of bias
        b = np.zeros((outunit,), dtype=theano.config.floatX) #Initialize with zero
        self.b = theano.shared(value=b, name='b', borrow=True)
        #Definition of fully connected layer symbols
        self.output = T.dot(self._input, self.W) + self.b
        #Save updated parameters
        self.params = [self.W, self.b]

Loss function

The loss function uses softmax cross entropy to solve the 10-class mnist classification. The softmax symbol is provided in Theano as T.nnet.softmax(), so we use it. The following is an implementation example.

class softmax(object):
    def __init__(self, input, y):
        self._input = input
        # Symbol definition of softmax
        self.output = T.nnet.softmax(self._input)
        # Symbol definition of cross entropy (mathematically it is a sum, but here we use the mean)
        self.cost = -T.mean(T.log(self.output)[T.arange(y.shape[0]), y])

y is the symbol for the teacher labels. The indexing [T.arange(y.shape[0]), y] picks, for each sample i from 0 to y.shape[0] - 1, the log probability of its correct class y[i], and T.mean then averages these values over the batch.
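
To see what this indexing does on its own, here is a small NumPy sketch with made-up softmax outputs for a batch of two samples (the numbers are only for illustration).

probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]]) # Hypothetical softmax output for 2 samples and 3 classes
labels = np.array([0, 1]) # Teacher labels y
picked = probs[np.arange(labels.shape[0]), labels] # -> [0.7, 0.8], probability of the correct class of each sample
cost = -np.mean(np.log(picked)) # Mean negative log-likelihood, as in self.cost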

LeNet

Dataset

Now that we have defined each layer, let's move on to the LeNet implementation. First, prepare the mnist data. The pkl data is available at http://www.iro.umontreal.ca/~lisa/deep/data/mnist/mnist.pkl.gz, so download it to an appropriate folder. The train, validation, and test data are extracted from it. Below is an example of loading the data.

import gzip
import cPickle

def shared_dataset(data_xy):
    data_x, data_y = data_xy
    set_x = theano.shared(np.asarray(data_x,
                  dtype=theano.config.floatX).reshape(-1,1,28,28),
                  borrow=True)
    set_y = T.cast(theano.shared(np.asarray(data_y,
                  dtype=theano.config.floatX), borrow=True), 'int32')
    return set_x, set_y

with gzip.open('/path/to/mnist.pkl.gz', 'rb') as f:
    train_set, valid_set, test_set = cPickle.load(f)

train_set_x, train_set_y = shared_dataset(train_set)
valid_set_x, valid_set_y = shared_dataset(valid_set)
test_set_x, test_set_y = shared_dataset(test_set)

Here, for convenience of implementation, the data is stored in theano.shared variables, but the input could also be a plain numpy array. Defining it with theano.shared just makes the implementation somewhat cleaner. Next, define the symbols for the input data and the teacher labels. Since the input data is 4-dimensional (batch size, number of channels, height, width), it is T.tensor4(), and the teacher labels are a 1-dimensional vector of integers, so it is T.ivector().

x = T.tensor4() # Input data symbol
y = T.ivector() # Teacher label symbol

Layer definition

From here, we define each layer. Since each class has already been created above, this part is fairly easy to write.

conv1 = Conv2d(x, 20, 1, 5) # x as input, 20 output channels, 1 input channel, kernel size 5
relu1 = relu(conv1.output) # Input is the output of conv1
pool1 = Pool2d(relu1.output, 2, 2) # Input is the output of relu1, kernel size 2, stride 2

conv2 = Conv2d(pool1.output, 50, 20, 5) # Input is the output of pool1, 50 output channels, 20 input channels, kernel size 5
relu2 = relu(conv2.output) # Input is the output of conv2
pool2 = Pool2d(relu2.output, 2, 2) # Input is the output of relu2, kernel size 2, stride 2

fc1_input = pool2.output.flatten(2) # Flatten the tensor4 output of pool2 to match the input of the fully connected layer
fc1 = FullyConnect(fc1_input, 50*4*4, 500) # 50*4*4 input units (channels * height * width), 500 output units
relu3 = relu(fc1.output)
fc2 = FullyConnect(relu3.output, 500, 10) # 500 input units, 10 output units (for 10-class classification)
loss = softmax(fc2.output, y)

Now the definition of LeNet is complete. By feeding the output symbol of each layer to the next layer as its input symbol, all the symbols are connected, so the gradient computation can be done at once. In other words, one long symbol such as ... T.nnet.conv.conv2d(pool.pool_2d(T.nnet.conv.conv2d())) ... has been defined. Therefore, by giving the final symbol (loss.cost this time) to T.grad(), the gradients of all layers can easily be computed.
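
The same idea can be seen on a toy symbol that has nothing to do with LeNet; w and x_s below are illustrative variables introduced only for this sketch.

w = theano.shared(np.asarray(2.0, dtype=theano.config.floatX), name='w')
x_s = T.scalar('x_s')
y_s = T.sqr(w * x_s) # Chained symbol: (w * x)^2
g = T.grad(y_s, w) # Symbolic gradient with respect to w: 2 * w * x^2
f = theano.function([x_s], g)
print(f(3.0)) # -> 36.0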

Learning

Finally, we define the functions for training and for evaluation with the validation and test data. Up to this point we have only defined symbols, so actual values cannot yet be fed in for training. Therefore, we compile the symbols with theano.function. By specifying how the parameters are updated at that time, training can be performed. This time, we will update the parameters with SGD. Below is an implementation example of the theano.function used for training.

#List all parameters to be learned
params = conv1.params + conv2.params + fc1.params + fc2.params 
#Calculate the derivative for each parameter
grads = T.grad(loss.cost, params)
#Definition of learning rate
learning_rate = 0.001
#Define update expression
updates = [(param_i, param_i - learning_rate * grad_i) for param_i, grad_i in zip(params, grads)]
#theano.function for training
index = T.lscalar()
batch_size = 128
train_model = theano.function(inputs=[index], # Input is an index into the training data
                       outputs=loss.cost, # Output is loss.cost
                       updates=updates, # Update expressions
                       givens={
                            x: train_set_x[index: index + batch_size], # Feed train_set_x to the input x
                            y: train_set_y[index: index + batch_size] # Feed train_set_y to the input y
                       })

First, params is a list of the weight and bias symbols of each layer (since it is a concatenation of the lists). When the variables are given as a list, T.grad() returns a list of symbols differentiated with respect to each variable, so grads is a list of the derivatives of loss.cost with respect to each parameter. updates is likewise a list of update expressions, one per parameter. Next is the definition of train_model. As mentioned above, conv1 through loss.cost form one symbol whose inputs are the x and y defined at the beginning. In train_model, x receives the values of train_set_x and y receives the values of train_set_y. train_set_x and train_set_y receive an index and refer to batch_size samples starting at that index. Therefore, by giving only index as an argument to train_model, the values of train_set_x and train_set_y from index to index + batch_size are given to x and y. After that, you can train by repeatedly calling train_model in a for loop or the like.

for i in range(0, train_set_x.get_value(borrow=True).shape[0], batch_size):
    train_model(i)
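
In practice you would loop over several epochs; the sketch below assumes an illustrative n_epochs of 10 and simply prints the mean cost of each epoch.

n_epochs = 10 # Illustrative value
n_train = train_set_x.get_value(borrow=True).shape[0]
for epoch in range(n_epochs):
    costs = [train_model(i) for i in range(0, n_train, batch_size)]
    print('epoch %d: mean cost %f' % (epoch, np.mean(costs)))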

Evaluation

Finally, define the theano.function that evaluates the accuracy of the trained model. Since the parameters are not updated during evaluation, the output is the error rate. The softmax symbol is available as loss.output, so we use it to define a symbol that computes the error rate, and then define the theano.functions that perform the evaluation with that symbol.

pred = T.argmax(loss.output, axis=1) #Returns the class with the highest predicted probability
error = T.mean(T.neq(pred,y)) #Compare the predicted class with the correct label
test_model = theano.function(inputs=[index],
                             outputs=error,
                             givens={
                             x: test_set_x[index: index + batch_size],
                             y: test_set_y[index: index + batch_size]
                             })

val_model = theano.function(inputs=[index],
                             outputs=error,
                             givens={
                             x: valid_set_x[index: index + batch_size],
                             y: valid_set_y[index: index + batch_size]
                             })

Now that the evaluation functions for the validation and test data have been defined, they can be run with a for loop in the same way as train_model.

test_errors = [test_model(i)
               for i in range(0, test_set_x.get_value(borrow=True).shape[0], batch_size)] # Store the error rate of each batch in a list
mean_test_error = np.mean(test_errors) # Compute the overall average error rate
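
The validation data can be evaluated the same way with val_model; this is just a usage sketch following the same pattern as above.

val_errors = [val_model(i)
              for i in range(0, valid_set_x.get_value(borrow=True).shape[0], batch_size)] # Error rate per batch
mean_val_error = np.mean(val_errors) # Overall validation error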

With the above, the CNN training and evaluation code using Theano is complete. Please connect all the code above and check the actual training results (I changed the code I originally wrote in some places and wrote this for Qiita without debugging, so it may not work lol). If you have any questions about the code or the explanations, please let me know in the comments.

Summary

I explained the implementation of a convolutional neural network using Theano. At first glance, the code may seem long and tedious, but once you have defined the layer classes for your own convenience, it becomes easy to write. The advantage of Theano is that you can define things in various ways (although it is a bit of a hassle). For models other than LeNet, you can create various architectures just by changing the layer-definition part. In addition, loss functions and parameter update rules can be described freely by defining symbols. I hope this will be useful to those who are going to start using Theano.
