Deep Learning Experience with Python, Chapter 2 (Study Session Materials)

This material was used in the Do2dle 9th study session (the 1st session on Deep Learning Experience with Python). It covers Chapter 2 of the book. The presenter's supplementary comments are included where they help understanding.


Python's development was begun by Guido van Rossum at the end of the 1980s, with the first release in 1991. Python is a simple and consistent programming language, and readability and consistency are valued in the Python community.

2.2 Python2 or Python3

The Python 2 and Python 3 series are not compatible. If you are starting Python fresh, use the Python 3 series. However, a great deal of Python 2 code remains, and as a legacy of the past the Python 2 series cannot be ignored. This book is written on the premise of the Python 2 series.

(Beginning of comments by the presenter)

Python 2 series

Python 2.7 is the last version of the Python 2 series; there will be no further major revisions. However, support for 2.7 has been extended until 2020, so the situation where the Python 2 and Python 3 series coexist seems likely to continue.

Python 3 series

The Python 3 series was first released in 2008 and is currently under active development. The major libraries now support the Python 3 series, so there is almost no problem in using it.

(End of comment by presenter)

2.4 Coding style

Maintaining a consistent coding style and documentation improves readability and ease of maintenance.

PEP 8 is the most famous coding style guide.

(Beginning of comments by the presenter)

PEP 8 has become the de facto standard. Let's read PEP 8.

PEP8 Japanese translation http://pep8-ja.readthedocs.io/ja/latest/

A PEP 8 check tool can be installed with pip install pep8. Let's check code automatically using this tool.
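
As a sketch of how the tool can also be driven from Python (the file name my_script.py is only an example; the pep8 package exposes a StyleGuide API for this):

import pep8

# Check one file against PEP 8 and report the number of violations.
style = pep8.StyleGuide()
result = style.check_files(["my_script.py"])
print(result.total_errors)  # 0 if the file is fully PEP 8 compliant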

(End of comment by presenter)

2.5 Python and NumPy in brief

2.5.2 Containers

Python provides containers such as lists, dictionaries, sets, and tuples as standard. When grouping multiple items into one, a list uses square brackets [], a dictionary and a set use curly braces {}, and a tuple uses parentheses (). The distinction between a dictionary and a set is that a dictionary is made up of key and value pairs joined by a colon :.
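
As a minimal illustration of the four container literals (the variable names are arbitrary):

>>> items = [1, 2, 3]          # list: square brackets
>>> ages = {"alice": 30}       # dictionary: braces with key: value pairs
>>> unique = {1, 2, 3}         # set: braces with values only
>>> point = (1.5, 2.5)         # tuple: parentheses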


2.5.5 NumPy NumPy is the core library for scientific computing in Python. It consists of a multidimensional array object and tools for processing arrays. As described later, flexible array manipulation is indispensable for processing convolutional neural networks.

(Beginning of comments by the presenter)

Why use NumPy?

  1. Numerical calculation involves a lot of processing, but execution time can be reduced considerably by using NumPy. (Looping over a Python list with a for loop is slow, so it is better to keep computation inside NumPy as much as possible; see the timing sketch after this list.)
  2. Calculation between arrays of different shapes is possible (broadcasting), which makes complicated calculations easy to write.
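
A rough timing sketch of point 1 (absolute numbers depend on the machine; the array size and repeat count are arbitrary):

import timeit
import numpy as np

xs = list(range(1000000))
arr = np.arange(1000000)

# Pure-Python loop: each element is handled by the interpreter
loop = timeit.timeit(lambda: sum(x * 2 for x in xs), number=10)
# Vectorized NumPy: the loop runs in compiled code inside NumPy
vec = timeit.timeit(lambda: (arr * 2).sum(), number=10)

print(loop, vec)  # vec is typically far smaller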

[5] Broadcast

Broadcast is a mechanism that enables calculations between arrays of different sizes.

For example, in Example 1 below, when 1 is added, it is automatically expanded so that the calculation is treated as np.array([1, 2]) + np.array([1, 1]).

#Example 1
>>> import numpy as np
>>> np.array([1, 2]) + 1
array([2, 3])

In Example 2 below, when 5 is added, it is automatically expanded so that the calculation is treated as np.array([[1, 2], [3, 4]]) + np.array([[5, 5], [5, 5]]).

#Example 2
>>> import numpy as np
>>> np.array([[1, 2], [3, 4]]) + 5
array([[6, 7],
       [8, 9]])

In Example 3 below, when np.array([2, 3]) is added, it is automatically expanded so that the calculation is treated as np.array([[1, 2], [3, 4]]) + np.array([[2, 3], [2, 3]]).

#Example 3
>>> import numpy as np
>>> np.array([[1, 2], [3, 4]]) + np.array([2, 3])
array([[3, 5],
       [5, 7]])

(End of comment by presenter)

2.6 Deep learning frameworks in Python

2.6.1 Caffe Caffe is a deep learning system written in C++ and CUDA. It can be used from Python, C++, and MATLAB.

Caffe provides trained models, and the Caffe community shares trained parameter files through the Model Zoo. Models such as AlexNet, GoogLeNet, VGG, and SPP are available. You can test the results of deep learning by using a trained Caffe model (the file extension of a Model Zoo model is .caffemodel).

[1] Operation procedure

  1. Collect data.
  2. Calculate and save the mean of the data.
  3. Classify the images.
  4. Watch the learning process as needed.

2.6.2 Theano This section outlines the key points for understanding Theano code. Packages built on Theano include Blocks, deepy, Keras, Lasagne, nolearn, and Pylearn2.

(Beginning of comments by the presenter)

Theano is a library for numerical computation in Python. It provides functions for matrix operations and the like, and can be thought of as an alternative to NumPy/SciPy.

Its main features are:

  1. Run-time C++ code generation and compilation
  2. GPU support
  3. Analytical differentiation support (e.g., the derivative of x^2 is obtained automatically as 2x)

As a result, in a benchmark running a multilayer perceptron, Theano was 1.8 times faster than NumPy and 1.6 times faster than MATLAB (of course, the speed difference is case by case). Quoted from http://d.hatena.ne.jp/saket/20121207/1354867911

The name is read "Teano".

(End of comment by presenter)

[1] Theano processing procedure

  1. Define the coupling coefficient matrices as Theano shared variables.
  2. Describe the transformation function of each layer.
  3. Define the gradients with the automatic differentiation function.
  4. Express the dependencies between variables with theano.function (see the sketch after this list).
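
A minimal sketch of these four steps for a single layer (the layer shape, the cost, and the learning rate 0.1 are illustrative; assumes Theano is installed):

import numpy as np
import theano
import theano.tensor as T

# 1. Coupling coefficients as shared variables
W = theano.shared(np.zeros((3, 4)), name="W")
b = theano.shared(np.zeros(3), name="b")

# 2. Transformation of the layer
x = T.dvector("x")
y = T.nnet.sigmoid(T.dot(W, x) + b)

# 3. Gradients of a scalar cost via automatic differentiation
cost = T.sum(y ** 2)
gW, gb = T.grad(cost, [W, b])

# 4. Compile the dependencies into a callable function
f = theano.function([x], cost,
                    updates=[(W, W - 0.1 * gW), (b, b - 0.1 * gb)])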

(Beginning of comments by the presenter)

[2] Tensor variable

A tensor variable's type is defined by the combination of its rank (number of dimensions) and its element type.

See the following page for details of the types:

http://deeplearning.net/software/theano/library/tensor/basic.html#libdoc-basic-tensor

(End of comment by presenter)

Tensor variable types are defined up to rank 4. Consider image processing: one image is a three-dimensional array (a two-dimensional grid of vertical and horizontal pixels, plus the color channel as a third dimension). Image recognition requires many training images, and taking the number of images as one more dimension gives a four-dimensional array. Interlayer processing in deep learning amounts to transforming this four-dimensional array in each layer to obtain the final recognition. Therefore, the flow through the layers in deep learning can be seen as processing a tensor variable layer by layer. This is presumably why Google named its machine learning package TensorFlow.
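
For example, the standard constructors combine an element type with a rank (the variable names are arbitrary; the d prefix means float64, f means float32):

>>> import theano.tensor as T
>>> s = T.dscalar()    # rank 0, float64
>>> v = T.fvector()    # rank 1, float32
>>> m = T.dmatrix()    # rank 2, float64
>>> t = T.dtensor4()   # rank 4, float64, e.g. (image, channel, height, width)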

[3] Function theano.function

theano.function is not a function in the sense of classical languages. Theano also provides the usual mathematical functions, but these are distinct from theano.function. theano.function is an operation that compiles the relationship between the input variables given as its first argument and the output variable given as its second argument, based on the defined variables and the connections of the computation graph.

When theano.function is evaluated, there is a slight delay before execution. This is because the computation described by theano.function is converted into C code and compiled.
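
For instance (a minimal sketch; the variable names are arbitrary):

>>> import theano
>>> import theano.tensor as T
>>> a = T.dscalar("a")
>>> b = T.dscalar("b")
>>> add = theano.function([a, b], a + b)  # compiled here, hence the slight delay
>>> add(2, 3)
array(5.0)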

[4] Automatic differentiation theano.grad

(Beginning of comments by the presenter)

One of Theano's main features is this differentiation capability, called automatic differentiation: Theano can analyze a formula and derive the formula of its derivative. For example:

import theano
import theano.tensor as T

x, y = T.dscalars("x", "y")  # declare several scalar variables at once
z = (x + 2*y)**2

Differentiating this expression with respect to x gives dz/dx = 2(x + 2y). The line

gx = T.grad(z, x)

performs this conversion. Similarly, the derivative with respect to y is dz/dy = 4(x + 2y), obtained with gy = T.grad(z, y). To actually evaluate the derivative, it must still be made into a function:

>>> fgy = theano.function([x, y], gy)
>>> fgy(1, 2)
array(20.0)

And so on. Quoted from http://qiita.com/mokemokechicken/items/3fbf6af714c1f66f99e9

(End of comment by presenter)

2.6.3 Chainer Chainer's source code is highly readable, and it is clear that its creators are skilled. The manuals and tutorials are also concise and clear.

(Beginning of comments by the presenter)

Why now a new framework?

Caffe, Theano/Pylearn2, and Torch7 are three popular deep learning frameworks. They were developed with the basic goal of writing feedforward networks. However, with the recent progress of deep learning, there is a growing need to write more complex networks flexibly. Therefore, many new frameworks have been built on Theano, which offers a particularly high degree of freedom (e.g., Blocks, Keras, Lasagne, deepy).

Most existing implementations, including these frameworks, take the approach of first expanding the structure of the entire neural network in memory, then walking through the processing in order, performing forward and back propagation as given. It is like implementing an interpreter for a private mini-language. For example, in Caffe, the schema defined in Protocol Buffers corresponds to the mini-language. In Torch7, special modules called containers act as the control structures. Theano allows more flexible definitions, but uses a special feature called scan to write loops. In this approach, supporting a more complicated computation flow basically requires extending the mini-language, which raises both the learning cost and the writing cost. Given that the structure of neural networks will only become more complicated in the future, this trend is not desirable.

Chainer takes a different approach. It is based on Python but does not use Theano. All control structures can be used as they are in Python. Chainer only records what processing was actually applied to the input arrays by the Python code, and uses that record to perform error backpropagation. We believe this approach is necessary to keep up the speed of research and development as deep learning grows more complex, and that is why we embarked on the development of a new framework.

Quoted from https://research.preferred.jp/2015/06/deep-learning-chainer/

(End of comment by presenter)

[1] Overview of Chainer processing

  1. Define a Chain using Links.
  2. Set the Chain in an Optimizer.
  3. Define the forward function.
  4. Read the dataset and split it into training and evaluation sets.
  5. Run the training loop.
  6. Run the evaluation loop at a reasonable frequency.

In other words, training a standard neural network is implemented in a straightforward way.

Chainer has its own variable type, chainer.Variable. Comparing it with Theano:

>>> import chainer
>>> import theano
>>> import numpy as np
>>> X = np.ndarray((3, 5), dtype=np.float32)  # Chainer expects float32
>>> x_chainer = chainer.Variable(X)
>>> x_theano = theano.tensor.dmatrix()

>>> print(x_chainer.label)
(3, 5), float32
>>> print(x_theano.type)
TensorType(float64, matrix)

The above example is only for comparison; it is not practical to use Chainer and Theano at the same time.

[3] Link and Chain

(Beginning of comments by the presenter)

Many neural network structures contain multiple Links. For example, a multilayer perceptron consists of multiple linear layers. Complex procedures with parameters can be described by combining multiple Links:

import chainer.links as L

l1 = L.Linear(4, 3)
l2 = L.Linear(3, 2)

def my_forward(x):
    h = l1(x)
    return l2(h)

Here L is the chainer.links module. When a processing procedure is defined with bare parameters like this, it is hard to reuse. A more Pythonic way is to group the Links and the procedure into a class:

class MyProc(object):
    def __init__(self):
        self.l1 = L.Linear(4, 3)
        self.l2 = L.Linear(3, 2)

    def forward(self, x):
        h = self.l1(x)
        return self.l2(h)

To make reuse easier, we want features such as parameter management, CPU/GPU migration support, and robust, flexible save/load. All of these features are supported by Chainer's Chain class. To take advantage of them, simply define the above class as a subclass of Chain:

from chainer import Chain

class MyChain(Chain):
    def __init__(self):
        super(MyChain, self).__init__(
            l1=L.Linear(4, 3),
            l2=L.Linear(3, 2),
        )

    def __call__(self, x):
        h = self.l1(x)
        return self.l2(h)

This shows how more complex connections can be built from simpler Links. Links such as l1 and l2 are called child Links of MyChain. Chain itself inherits from Link, which means you can define even more complex connections that use a MyChain object as a child Link.

The Optimizer class searches for good parameter values. An Optimizer runs a numerical optimization algorithm on a given Link. Many algorithms are implemented in the optimizers module; here let's use the simplest one, stochastic gradient descent (SGD):

from chainer import optimizers

model = MyChain()
optimizer = optimizers.SGD()
optimizer.setup(model)

Quoted from http://www.iandprogram.net/entry/chainer_japanese

(End of comment by presenter)
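
Putting the pieces above together, a minimal training loop might look like the following (a sketch only: the dummy data, the mean-squared-error loss, and the epoch count are arbitrary, and the gradient-clearing method is named zerograds in older Chainer versions):

import numpy as np
import chainer.functions as F

# Dummy float32 data: 5 samples, 4 inputs, 2 targets
x = np.random.rand(5, 4).astype(np.float32)
t = np.random.rand(5, 2).astype(np.float32)

for epoch in range(100):
    y = model(x)                       # forward pass through MyChain
    loss = F.mean_squared_error(y, t)  # scalar loss
    model.cleargrads()                 # reset accumulated gradients
    loss.backward()                    # error backpropagation
    optimizer.update()                 # apply the SGD step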

2.6.5 TensorFlow The key points are session management (checkpoints, etc.) and TensorBoard.

In TensorFlow, storage whose contents the neural network should estimate is called a variable (tensorflow.Variable), and storage that does not need to be estimated is called a placeholder (tensorflow.placeholder). The input image is therefore a placeholder, while the coupling coefficients (matrices) and biases (vectors) are variables. Theano's manual uses the two without distinction. Defining and managing variables in this way is common to Theano and Chainer as well.
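
A minimal sketch of the distinction (TensorFlow 1.x style; the shapes are illustrative, e.g. 28x28 input images flattened to 784, with 10 classes):

import tensorflow as tf

# Placeholder: input data fed in from outside, not estimated
x = tf.placeholder(tf.float32, shape=(None, 784))

# Variables: coupling coefficients and bias, estimated during training
W = tf.Variable(tf.zeros((784, 10)))
b = tf.Variable(tf.zeros((10,)))

y = tf.matmul(x, W) + b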

2.6.7 scikit-learn Unlike the other frameworks covered in this book, scikit-learn is a Python implementation of machine learning in general. scikit-learn also has a multilayer perceptron, but it was introduced in scikit-learn 0.18 and is not included in 0.17, the stable version as of February 2016.

(Beginning of comments by the presenter)

0.18.0 was released in September 2016 and includes a multi-layer perceptron.

http://scikit-learn.org/stable/modules/neural_networks_supervised.html#multilayer-perceptron

(End of comment by presenter)

scikit-learn makes it easy to compare your method with existing machine learning algorithms, not just neural networks.

(Beginning of comments by the presenter)

The point here is that scikit-learn provides a uniform interface regardless of the algorithm, as shown in the code below. Thanks to this, code like the following is easy to write, and algorithms are easy to compare.

for name, estimator in ESTIMATORS.items():
    estimator.fit(X_train, y_train)
    y_test_predict[name] = estimator.predict(X_test)

Quoted from http://scikit-learn.org/stable/auto_examples/plot_multioutput_face_completion.html

(End of comment by presenter)

To compare your own algorithm with the algorithms available in scikit-learn, you need to implement fit(X, y) and predict(X) for it yourself.
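
A minimal sketch of such an estimator (the class name and the trivial prediction rule are illustrative; it simply predicts the mean of the training targets):

import numpy as np

class MeanRegressor(object):
    """Toy estimator exposing the fit/predict interface scikit-learn uses."""

    def fit(self, X, y):
        self.mean_ = np.mean(y)  # remember the training mean
        return self

    def predict(self, X):
        # predict the same constant for every input row
        return np.full(len(X), self.mean_)

An instance of this class can then be placed in an ESTIMATORS-style dictionary alongside scikit-learn estimators and run through the same loop as above.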


Reference: Deep Learning Experience with Python, Chapter 1 (Study Session Materials) http://qiita.com/taki_tflare/items/c1bfd976155d89104e3d
