[PYTHON] Introduction to Deep Learning (1) --Chainer is explained in an easy-to-understand manner for beginners-

Introduction

This is an introductory article about Deep Learning, which is popular these days. Deep Learning already has plenty of open-source libraries; this time we will use Chainer, a domestically developed (Japanese) library with a good reputation, whose GPU computation was reported in one article to be relatively fast at the time of writing.

There are many introductory articles on Chainer, but most of them end with running the MNIST handwriting-recognition sample. Looking at the MNIST sample certainly shows how Chainer is used, but that is different from being able to build something yourself. So this article's goal is to get you to the point where you can build a Deep Learning model with Chainer on your own.

Development environment

・OS: Mac OS X El Capitan (10.11.5)
・Python 2.7.12: Anaconda 4.1.1 (x86_64)
・Chainer 1.12.0

If you have not set up a Chainer environment yet, you can install it easily with:

$ pip install chainer

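To confirm that the installation succeeded, you can print the installed version (a quick check; the output will reflect your own environment):

$ python -c "import chainer; print(chainer.__version__)"
1.12.0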

Forward / Backward Computation

This section covers forward and backward computation with variables, one step before moving into Neural Networks.

First, load chainer and declare variables.

>>> import numpy as np
>>> import chainer
>>> x_data = np.array([5], dtype=np.float32)
>>> x_data
array([ 5.], dtype=float32)

Variables are basically declared as NumPy arrays of float type.

Use `chainer.Variable` as the variable type for use within Chainer.

>>> x = chainer.Variable(x_data)
>>> x
<variable at 0x10b796fd0>

You can check the value of x via `.data`.

>>> x.data
array([ 5.], dtype=float32)

Next, declare a function y of x. This time, we will use the following function: y = x^2 - 2x + 1

>>> y = x ** 2 - 2 * x + 1
>>> y
<variable at 0x10b693dd0>

You can check the value of y in the same way.

>>> y.data
array([ 16.], dtype=float32)

Calling the following method makes it possible to compute the derivative via backpropagation.

>>> y.backward()

The gradient computed by backpropagation is stored in `grad`.

>>> x.grad
array([ 8.], dtype=float32)

It may not be obvious which gradient this refers to: it is the gradient of y differentiated with respect to x, evaluated at x = 5.

y'(x) = 2x - 2\\
\rightarrow \ y'(5) = 8

This is where the value `8` in `x.grad` comes from.
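As a quick sanity check (my own addition, not in the original article), the same value can be approximated with a numerical derivative using a central difference:

>>> eps = 1e-3
>>> g = lambda t: t ** 2 - 2 * t + 1
>>> (g(5.0 + eps) - g(5.0 - eps)) / (2 * eps)  # evaluates to approximately 8.0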

Chainer's official reference says that if x is a multidimensional array, you must initialize `y.grad` before computing `x.grad`. If you do not initialize it, the gradient values are accumulated into the existing array, so remember: "initialize before gradient computation".

>>> x = chainer.Variable(np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float32))
>>> y = x**2 - 2*x + 1
>>> y.grad = np.ones((2, 3), dtype=np.float32)
>>> y.backward()
>>> x.grad
array([[  0.,   2.,   4.],
       [  6.,   8.,  10.]], dtype=float32)

Each element of `x.grad` equals 2x - 2, matching the analytical derivative y'(x).

Explicit structure of Neural Network model (Links)

When constructing a Neural Network, you must explicitly declare the structure of the model: concretely, how many nodes and how many layers. Deciding on this model still depends on experience and intuition. Neural Networks, including Deep Learning, automatically adjust their internal parameters, but the initial model must be decided in advance. As a slight digression, in Bayesian statistics, Markov chain Monte Carlo (MCMC) methods are designed to estimate posterior distributions based on Bayes' theorem, but even there the prior distributions must be chosen by hand. Whether in Neural Networks or Bayesian statistics, if a groundbreaking method is ever proposed to resolve this, I hope predictive models can be built that respond universally to any problem.

Getting back to the topic, let's set things up so Chainer's links module can be called by an abbreviated name in Python code.

>>> import chainer.links as L

As anyone who has studied Neural Networks knows, there are parameters called weights between the nodes. For now, let's try the simplest pattern: a linear combination.

>>> f = L.Linear(3, 2)
>>> f
<chainer.links.connection.linear.Linear object at 0x10b7b4290>

This represents a structure with three input nodes and two output nodes. To briefly explain the `Linear` part: the nodes are connected by the linear combination mentioned earlier, so it is expressed by the following relational expression.

f(x) = Wx + b\\
f(x) \in \mathbb{R}^{2 \times 1}, \quad
x \in \mathbb{R}^{3 \times 1}, \quad
W \in \mathbb{R}^{2 \times 3}, \quad b \in \mathbb{R}^{2 \times 1}

Therefore, although not explicitly declared, the `f` declared above already holds the weight matrix `W` and the bias vector `b` as parameters.

>>> f.W.data
array([[-0.02878495,  0.75096768, -0.10530342],
       [-0.26099312,  0.44820449, -0.06585278]], dtype=float32)
>>> f.b.data
array([ 0.,  0.], dtype=float32)

If you implement things without knowing these internal details, the behavior can be baffling. Incidentally, even though we never initialized the weight matrix `W`, it already holds values: by Chainer's design, `W` is randomly initialized when the Linear link is declared.

So, as shown in the official Chainer documentation, the following is the form you will use most often.

>>> f = L.Linear(3, 2)
>>> x = chainer.Variable(np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float32))
>>> y = f(x)
>>> y.data
array([[ 1.15724015,  0.43785751],
       [ 3.0078783 ,  0.80193317]], dtype=float32)

You can see that each x, a three-dimensional vector, has been converted into a two-dimensional y by the linear combination. At this time, initial values are automatically assigned to the weight matrix W internally, which is why this computation runs without throwing an error.

For confirmation, looking at the weight values, initial values have indeed been assigned.

>>> f.W.data
array([[-0.02878495,  0.75096768, -0.10530342],
       [-0.26099312,  0.44820449, -0.06585278]], dtype=float32)

>>> f.b.data
array([ 0.,  0.], dtype=float32)
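Using these values, you can reproduce the Linear link's result directly with NumPy (a check of my own, not in the original article). Note that for a batch of row vectors, the Linear link computes y = xWᵀ + b:

>>> np.allclose(x.data.dot(f.W.data.T) + f.b.data, y.data)
True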

Next, let's compute the gradients as we learned in the previous chapter. The official Chainer documentation emphasizes this point: the gradient values accumulate with every computation. Therefore, you normally need to reset the gradients to zero with the following method before computing each new gradient.

>>> f.zerograds()

Make sure the gradient values are initialized correctly.

>>> f.W.grad
array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.]], dtype=float32)

>>> f.b.grad
array([ 0.,  0.], dtype=float32)

Now let's calculate the value for each gradient.

>>> y.grad = np.ones((2, 2), dtype=np.float32)
>>> y.backward()
>>> f.W.grad
array([[ 5.,  7.,  9.],
       [ 5.,  7.,  9.]], dtype=float32)
>>> f.b.grad
array([ 2.,  2.], dtype=float32)

The gradients are computed as expected. Since `y.grad` is all ones, each row of `W.grad` is the column-wise sum of the inputs over the batch ([1+4, 2+5, 3+6] = [5, 7, 9]), and `b.grad` equals the batch size, 2.

Write a model as a chain

Let's extend the explicitly defined model from the previous chapter to multiple layers.

>>> l1 = L.Linear(4, 3)
>>> l2 = L.Linear(3, 2)

First, let's check the weights of each layer.

>>> l1.W.data
array([[-0.2187428 ,  0.51174778,  0.30037731, -1.08665013],
       [ 0.65367842,  0.23128517,  0.25591806, -1.0708735 ],
       [-0.85425782,  0.25255874,  0.23436508,  0.3276397 ]], dtype=float32)

>>> l1.b.data
array([ 0.,  0.,  0.], dtype=float32)

>>> l2.W.data
array([[-0.18273738, -0.64931035, -0.20702939],
       [ 0.26091203,  0.88469893, -0.76247424]], dtype=float32)

>>> l2.b.data
array([ 0.,  0.], dtype=float32)

The structure of each layer is defined above. Next, we make the overall structure explicit: how these layers are connected to each other.

>>> x = chainer.Variable(np.array([[1, 2, 3, 4], [5, 6, 7, 8]], dtype=np.float32))

>>> x.data
array([[ 1.,  2.,  3.,  4.],
       [ 5.,  6.,  7.,  8.]], dtype=float32)

>>> h = l1(x)

>>> y = l2(h)

>>> y.data
array([[ 1.69596863, -4.08097076],
       [ 1.90756595, -4.22696018]], dtype=float32)

To make them reusable, the official documentation recommends creating classes as follows:

MyChain.py


# -*- coding: utf-8 -*-
from chainer import Chain
import chainer.links as L

class MyChain(Chain):

    def __init__(self):
        # Register the links (layers) that hold the model's parameters
        super(MyChain, self).__init__(
            l1=L.Linear(4, 3),
            l2=L.Linear(3, 2),
        )

    def __call__(self, x):
        # Forward pass: chain the two linear layers
        h = self.l1(x)
        return self.l2(h)
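For instance, the class can be used in the same way as the individual links above (a minimal usage sketch; the actual output values depend on the random weight initialization):

>>> from MyChain import MyChain
>>> model = MyChain()
>>> x = chainer.Variable(np.array([[1, 2, 3, 4], [5, 6, 7, 8]], dtype=np.float32))
>>> model(x).data.shape
(2, 2)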

Optimizer

Next, we will optimize the weights of the Neural Network model. There are several methods for optimizing the weights, but honestly there seems to be no clear criterion for which one to use, so here we use Stochastic Gradient Descent (SGD). The difference in performance between optimization methods is explained in "Which optimization method shows the best performance for learning CNN" (reference 2).

>>> from chainer import optimizers
>>> model = MyChain()
>>> optimizer = optimizers.SGD()  # choose SGD as the optimization method
>>> optimizer.setup(model)
>>> optimizer
<chainer.optimizers.sgd.SGD object at 0x10b7b40d0>

Here, `optimizer.setup(model)` passes the model's parameter information to the optimizer.

The official documentation describes two ways to run the optimization. In the first, you compute the gradient values manually, which is quite difficult. So, except in special cases, use the other method: have the gradients computed automatically. For automatic computation, you need to define a loss function in advance.

Details will be introduced next time in "Introduction to Deep Learning (2) --Let's try non-linear regression with Chainer-", but you define the loss function yourself for each problem. When dealing with real numbers, the loss is often defined as minimizing the squared L2 norm, as in least squares; for classification, it is often defined as minimizing the cross entropy. Various types of loss functions are explained in "Notes on the error backpropagation method" (reference 3).

Loss function


def forward(x, y, model):
    loss = ...  # define your own loss function here
    return loss

Here we assume that the `forward` function computing the loss takes the arguments `x`, `y`, and `model`. Once such a loss function is defined, the parameters are optimized as follows.

optimizer.update(forward, x, y, model)
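As one concrete example (a sketch of my own, not the article's definition), for a regression problem the loss could be the mean squared error provided by `chainer.functions`:

import chainer.functions as F

def forward(x, y, model):
    # mean squared error between the model's prediction and the target y
    loss = F.mean_squared_error(model(x), y)
    return loss

# one update step: computes the gradients and updates the parameters
optimizer.update(forward, x, y, model)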

reference

1. Official Chainer Reference: various materials are written in Japanese, but I often ran into parts that no longer worked due to version changes, so the official English reference was the most stable.
2. Which optimization method shows the best performance for learning CNN
3. Notes on the error backpropagation method
4. Introduction to Deep Learning (2) --Try Nonlinear Regression with Chainer-

bonus

We are waiting for you to follow us!
Qiita: Carat Yoshizaki
Twitter: @carat_yoshizaki
Hatena Blog: Carat COO Blog
Home page: Carat

Tutor service "Kikagaku", where you can learn machine learning one-on-one. Please feel free to contact us if you are interested in "Kikagaku", where you can learn "Mathematics → Programming → Web Applications" in one go.
