Yes, let's do deep learning. This is a commentary article to send to you. Even so, I am writing while studying myself, so there may be errors or misunderstandings. If you find one, please contact us.

What is Deep learning?

This slide is very well organized.

Deep learning

After all, I think that the characteristic of deep learning is that it "extracts the characteristics". For example, if you want to build a model to judge sumo wrestling, you usually define features such as "waist size", "presence or absence of mage", and "whether or not you are wearing kimono" and build a model based on that. It's just like deciding the argument of a function.

However, in Deep learning, this feature extraction is also done by the model. Or rather, that's why it's multi-layered, or Deep. Specifically, let's learn by extracting features in parallel and multi-layered, such as features around the head, features around the waist, and features of the upper body that combine them, and let's make a judgment with it. That is the basic idea (below, image).

As shown in the figure, Deep learning can divide the technical elements that make up it as follows.

Data split How to divide the data used for judgment. In the case of image judgment, how many pixels should the image be divided by how many pixels?
Feature extraction
Model used for feature extraction The model used to extract the features. Typical examples are Autoencoder and restricted Boltzmann machine. Each of these models is usually pre-trained (Pre-training).
Propagation method of extracted features To which node and how to convey the extracted features. For example, if all the extracted features are transmitted to the next node, the error will also propagate, so adjust that.
Judgment: Make a final judgment from the extracted features Actually, this is no longer a neural network, but a general classifier such as SVM may be used.

The following will explain each element.

Data split

The data to be analyzed, such as data and voice, contains various characteristics. The most ideal is to divide the data in the form of 1 feature, but it is difficult to do it without knowing what the feature is, so it is important to divide the data so as not to lose the feature as much as possible. Become.

For example, if the most important factor in sumo wrestling judgment is the presence or absence of "mage", it is very important that this "mage" fits in the divided image. This is because if the mage is on the border of the divided image, it will not be possible to judge.

A common approach to this is to cover (overlap) the data to some extent rather than simply splitting it. The unit that divides an image is called a window, and the image is that this window is gradually shifted. In terms of character strings, if the data is ABCDE, the window size is 3, and the movement width is 1, the data will be taken as "ABC", "BCD", and "CDE".

This is not limited to deep learning.

Extraction of features

The biggest feature of deep learning is that it can extract the features themselves. So how is that done?

Model used for feature extraction

The basic idea is an Autoencoder. The following is an easy-to-understand explanation of Autoencoder. It is an image that creates a model with the same input and output and trains it so that the input can be reproduced with limited features.

Hello Autoencoder

Along with this, the Restricted Boltzmann Machine is often used. Roughly speaking, this is a probability model that estimates the probability distribution that causes the input data to occur. The explanation is easy to understand here.

2012-12-12 Yurufuwa Restricted Boltzmann Machine

Both are essentially the same in that they "search for parameters that can explain the given data well" (Reference 1. -machine-different-from-an-Autoencoder-Arent-both-of-them-learning-high-level-features-in-an-unsupervised-manner-Similarly-how-is-a-Deep-Belief-Network-different -from-a-network-of-stacked-Autoencoders), Reference 2) ..

In Deep learning, these are individually trained (Pre training) and combined. The first layer is for each person, and the second layer is for learning after fixing the parameters for the first layer. The image is similar to a unit test using stubs. This pre-learning overcomes the weakness of multi-layer neural networks, where error propagation does not work well as the layers get deeper (it propagates from the upper layers, making it difficult to reach the lower layers).

Propagation method of extracted features

The next question is how to reach the next layer. Here, as mentioned above, there is a trade-off between information propagation and error propagation. (In the following, each learning model will be referred to as a node).

Output value (Activation function)

First, determine how to calculate the value to propagate to the next layer (node). Sigmoid functions are often used like general neural networks, but it seems that Rectifierd Linear Units are also used these days. In addition, maxout and Lp Pooling are also used. These determine the value by combining multiple outputs, maxout is the maximum value as its name suggests, Lp Pooling is ... It outputs a nice value that can not be said in a word. Please refer to Lp pooling for this, as it is very easy to understand. Thin edges of an image have fewer nodes that can detect them, making it difficult for information to propagate, but these are techniques to prevent such minute information from being leaked (roughly speaking).

Propagation method

How to set the propagation destination is also a consideration. The easiest way is to propagate it to all the nodes in the next layer, like a neural network. However, this naturally causes error propagation and overfitting may occur. As a method to deal with this, methods such as a local receptive field that limits the connection node and Dropout that determines the node that does not propagate randomly have been devised.

By deciding these and connecting each node, the "pre-learned feature extraction model" (node) is finally assembled in layers. The final judgment will be made using the features extracted by this.

Judgment

The final judgment will be made, and the learning for this final judgment is called Fine training. Naturally, this is a supervised learning. You can build it with a neural network until the end and adjust it by the error propagation method, or attach SVM on top and train it.

The above is the explanation of the mechanism of deep learning.

Use deep learning

As some of you may have noticed, deep learning isn't really a new technology, it's a good combination of existing neural networks. Therefore, if you want to implement it, you can do it by yourself ... but it's really hard, so I think it's better to have a library.

Library

The implementation is very well organized here. Implementation Deep Learning

The typical library is as follows.

TensorFlow
Chainer
Caffe
Based on Theano
Lasagne
Keras * Works with TensorFlow base
Blocks * As of December 2015, it is still under development.
pylearn2 * Development stopped

Google has announced TensorFlow, and I feel that it will be concentrated on this. Keras supports TensorFlow, and a wrapper that can be used like scikit-learn (skflow) has appeared, and peripheral development is also active. In the image, Caffe was the previous wisdom, but Chainer has released a model trained with Caffe's model (Model Zoo) The advantage of this area may disappear, such as Import function is implemented. Caffe may not be easy to install (many issues are related to installation) and peripheral conditions such as non-Python 3 support, so migration can proceed as the model import function and distribution of trained models in other libraries progress. I think the sex is high.

In addition, the Python3 support that you are interested in is as follows.

Supported from TensorFlow: 0.6.0 (Add support for Python 3.x). Please note that TensorFlow does not work on Windows (Currently supported).
Chainer: Supported from 1.1.0
Caffe: No plans to support it. However, I care, [It doesn't work](https://github.com/BVLC/ It looks like caffe / pull / 1966)
Theano: Support for single code base has been completed at least as of June 2015 (Document python3 installation)
pylearn2: [Merge Python3 compatible code] on 2014/11/15 (https://github.com/lisa-lab/pylearn2/issues/948)

I haven't looked into the others, but many libraries do. Since the issue that took the lead in TensorFlow was compatible with Python3, I think that Python2 only will not spread in the future.

If you want to implement it yourself instead of a library, many libraries are based Theano, and [Restricted Boltzmann machines] I think scikit-learn with (http://scikit-learn.org/stable/modules/neural_networks.html) is useful. yusugomori / DeepLearning is a repository that implements Deep learning in various languages, and I think this source code will also be helpful.

Training data

Where to bring the training data is also one of the problems in machine learning. It's hard to get tens of thousands of data on your own, and it's daunting to label it for supervised learning. That's why there are sites that provide learning data, so let's thank you for using it.

However, if you want to make use of it in an application, regardless of research, you often have to prepare the data yourself, and here, steady and muddy work occurs.
MNIST
This is sample data of handwritten characters.
CIFAR-10
A collection of images labeled in 10 classes (airplane, automobile, etc.). There is also a more detailed labeled CIFAR-100.
The Oxford-IIIT Pet Dataset
Labeled data, similar to CIFAR-10. As the name implies, it is an animal system.
kaggle
Data analysis competition site. You can compete for the accuracy of the model and the data is also provided. You can now search and publish your data at Kaggle Datasets.
Microsoft Azure Marketplace
Various data such as NFL match results and demographics are provided (some are paid, but many are free).
UC Irvine Machine Learning Repository
A site that collects datasets for machine learning.
Harvard Dataverse
A dataset published by Harvard University. It is published using software called Dataverse, and you can see that this site also publishes data using this software.
COCO
Data set of image recognition and captions for images published by Microsoft (this is used in Neuraltalk2 that adds captions to images)
nico-opendata
We provide data such as comments on Nico Nico still images and videos for researchers.
DATA GO JP
An open data catalog site published by the Japanese government. Here also collects the data released by each local government.
Data.gov
Original American open data catalog site
ShapeNet
Data set of 3D model. There are various types of models, from furniture to airplanes.
DCASE
Learning / evaluation data is available for tasks that classify natural sounds (park sounds, office sounds, etc.).

In addition, various data are introduced in Dataset of the Deep Learning information site. The image system is well summarized in Resource --List of Interesting Very Large Datasets of Images.

Natural language systems often use scraping, but keep ethical issues in mind (Web scraping precautions. c5e827e1827e7cb29011)).

Practice

This time I will use pylearn2 which is a major one. In the case of Windows, as is customary, the installation is hell gate, so please refer to here (I thought for a moment that I could not get out of this hell this time).

Now that pylearn2 is ready, let's move the tutorial without greed. In Quick-start is a model called grbm_smd, which is a Restricted Boltzmann machine that makes up Deep learning. It corresponds to Machine).

grbm stands for RBM with Gaussian units. In RBM, the value that a node can take is usually a binary value of 0 or 1, but this is not expressive, so it is a model extended to a linear function (Gaussian unit) with independent variance. See here for more information (A Practical Guide to Training Restricted Boltzmann Machines section 13.2). This document also provides a good summary of other techniques for working with RBM.
smd means denoising score matching, which is a method of intentionally missing (noise) input data and using how well it was restored as an index (= objective function) of the model. (Denoising autoencoder). By incorporating these layers of roles into deep learning, you can improve your responsiveness, for example, when handwritten characters are faint in the middle.

Below is a diagram of the actual model. In pylearn2, the normal model is set in a YAML format file (in this tutorial, cifar_grbm_smd.yaml cifar_grbm_smd.yaml)), please take a look there as well.

Let's actually try it. The procedure is described in README, so you can trace it.

Download CIFAR-10 data Set the environment variable PYLEARN2_DATA_PATH that represents the directory that stores the data used by pylearn2 in advance, and then from the Git shell pylearn2/scripts/datasets/download_cifar10.sh
To execute.
Run make_dataset.py Go to the pylearn2 / scripts / tutorials / grbm_smd folder and run the data creation script (don't forget to enable the virtual environment). python make_dataset.py
Run train.py Now that the data is ready, let's learn (actually, I put pylearn2 / scripts in the path, but it is troublesome, so I am executing it with a relative path). ../../train.py　cifar_grbm_smd.yaml
Confirmation of execution result Now that you have a trained model, let's take a look at its contents. Matplotlib is used, but it didn't work unless I installed it with conda (such as Tcl / Tck GUI display and around PIL). ../../show_weights.py cifar_grbm_smd.pkl

It's a mystery what this should be, but at least it reflects the size of the weight, and considering that the defects are randomly entered, the colors and darkness shown are difficult to restore / Does it mean that it is a place? (Because weights should not be needed unless completion is needed). In addition, you can easily check the situation where the error rate decreases due to learning with the following command (specify b, L, M as the plot target). ../../plot_monitor.py cifar_grbm_smd.pkl

That's all for Quick Start. It included some difficult content, but I think that at least the concept of the model and the execution environment have been obtained by this point. Deep learning is literally a deep world, but I hope this article will help you understand it.

Reference material

Deep learning Hello Autoencoder 2012-12-12 Yurufuwa Restricted Boltzmann Machine Lp pooling Neural network counterattack Deep Learning Technology Now Implementation Deep Learning Cat and image processing part 3 – Cat breed identification with deep learning Convolutional Neural Network Convolution Neural Network for speech recognition Building High-level Features Using Large Scale Unsupervised Learning

[PYTHON] Start Deep learning