[PYTHON] You can't go home until you understand CNN

Understand CNN (Convolutional Neural Network) and move on

Right now I'm trying to classify 7 kinds of fruit. Using a CNN improved the accuracy, but I wasn't satisfied with getting a result without really understanding why it worked. So I'm going to understand CNNs starting from what they are.

What is CNN?

A CNN (Convolutional Neural Network) differs from an ordinary feedforward neural network in that it is composed not only of fully connected layers but also of convolution layers and pooling layers.

Hmm, I don't get it. What is an ordinary feedforward neural network? When I looked it up, this article was very easy to understand: [Memo when reading deep learning (1. Forward propagation network)](https://qiita.com/ma-oshita/items/99b2cf313494adbb964d)

In the convolution and pooling layers, as shown in the figure below, each neuron in the next layer is connected only to a narrowed-down local region of the input neurons rather than to all of them. You can picture each layer as holding several detectors called filters.

image.png

In the image recognition example, edges are detected in the first layer, textures are detected in the next layer, and more abstract features are detected in the next layer. CNN automatically learns the parameters of the filter, which is a detector for extracting these features.

Practical examples of CNN

・ Face detection for photo tagging on Facebook
・ Google photo search and voice recognition

Features of CNN

CNNs have three notable features: convolution, translation invariance, and compositionality.

What is convolution?

Convolution is a technique often used in image processing. It takes a grid of numbers called a kernel (or filter) and a partial image (called a window) of the same size as the kernel, multiplies the corresponding elements, and sums the products, which converts the window into a single number. By shifting the window a little at a time (the stride) and repeating this conversion, the image is converted into a smaller grid of numbers. Because the filter slides across the image in this way, the approach is sometimes called a sliding window.

image.png Source: Stanford wiki / Convolution schematic.gif

The data calculated in this way is called a feature map.
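To check my own understanding of the sliding window, here is a minimal sketch in plain Python lists. The function name `convolve2d`, the 4x4 image, and the 2x2 edge kernel are all values I made up for illustration. It uses no padding and does not flip the kernel, which is the convention deep learning libraries generally follow.

```python
def convolve2d(image, kernel, stride=1):
    """Slide `kernel` over `image` and return the feature map.

    Each output value is the sum of the element-wise products between
    the kernel and the window it currently covers (no padding).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            top, left = i * stride, j * stride  # window position
            total = 0
            for a in range(kh):
                for b in range(kw):
                    total += image[top + a][left + b] * kernel[a][b]
            row.append(total)
        feature_map.append(row)
    return feature_map


# Hypothetical 4x4 image: dark left half (0), bright right half (1).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hypothetical 2x2 kernel that responds to a dark-to-bright vertical edge.
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, kernel)
print(feature_map)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The feature map is largest in the middle column, exactly where the window straddles the edge between the dark and bright halves.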

Compositionality

Once you understand each component of CNN, you will be able to combine them like a puzzle.

Each layer passes meaningful data on to the next layer in turn. As the layers get deeper, the network can learn higher-level features.

Taking fruit classification as an example, the first layer can detect color and shape, the second layer can combine them to detect noise, and as the network gets deeper it may be able to detect the characteristics of each fruit and even its variety.

Translation invariance

As seen in the convolution example above, features are detected from local regions through the filter, so detection is robust to shifts in the position of the object.

That is, a feature can be detected wherever it appears in the input data. This is called translation invariance.
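A small experiment makes the robustness to position concrete. This is only a sketch under my own assumptions: `place_blob`, `peak`, the 5x5 image size, and the all-ones kernel are hypothetical. The same filter is applied to a 2x2 bright block placed at two different positions.

```python
def convolve2d(image, kernel):
    """Stride-1 convolution without padding (kernel is not flipped)."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(len(image[0]) - kw + 1)
        ]
        for i in range(len(image) - kh + 1)
    ]


def place_blob(top, left, size=5):
    """Return a size x size image of zeros with a 2x2 block of ones."""
    img = [[0] * size for _ in range(size)]
    for a in range(2):
        for b in range(2):
            img[top + a][left + b] = 1
    return img


def peak(feature_map):
    """Return (value, row, col) of the strongest response."""
    return max(
        (value, i, j)
        for i, row in enumerate(feature_map)
        for j, value in enumerate(row)
    )


kernel = [[1, 1], [1, 1]]  # hypothetical detector for a 2x2 bright block

original_peak = peak(convolve2d(place_blob(0, 0), kernel))
shifted_peak = peak(convolve2d(place_blob(2, 3), kernel))

print(original_peak)  # (4, 0, 0)
print(shifted_peak)   # (4, 2, 3)
```

The filter fires with the same strength (4) in both cases; only the location of the response moves with the object. Taking the maximum over the feature map, as pooling does on a smaller scale, then gives the same value wherever the object is.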

Invariance to rotation and scaling does not come for free. It takes extra measures, such as using data augmentation to add rotated and scaled examples to the training data.

CNN components

CNN is made up of a combination of layers, activation functions and some parameters.

Zero padding

Zero padding fills the area around the input feature map with 0s. Padding means margin, and it has the following advantages.

・ Edge data is covered by more convolution windows, so features at the edges are also taken into account.
・ The number of convolution operations increases, so more parameter updates are performed.
・ The kernel size and the number of layers can be adjusted.
・ The output size gradually shrinks through the convolution and pooling layers, so restoring the size with zero padding makes it possible to use more layers.
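The effect of zero padding on size can be sketched as follows. The helper name `zero_pad` and the 3x3 input are my own choices for illustration, and the size calculation assumes a 3x3 kernel applied with stride 1.

```python
def zero_pad(image, pad):
    """Surround `image` with `pad` rows and columns of zeros on every side."""
    width = len(image[0]) + 2 * pad
    padded = [[0] * width for _ in range(pad)]           # top margin
    for row in image:
        padded.append([0] * pad + list(row) + [0] * pad)  # left/right margin
    padded.extend([0] * width for _ in range(pad))        # bottom margin
    return padded


image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
padded = zero_pad(image, 1)
for row in padded:
    print(row)

kernel_size = 3
# Number of window positions along one side with stride 1.
size_without_padding = len(image) - kernel_size + 1
size_with_padding = len(padded) - kernel_size + 1
print(size_without_padding, size_with_padding)  # 1 3
```

Without padding, a 3x3 kernel fits on the 3x3 input in only one position, so the output collapses to 1x1. With one pixel of padding the output stays 3x3, and the corner value 1 is covered by four windows instead of one.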

Stride

Stride literally means the length of a step. So far the filter has been applied at 1-pixel intervals, that is, with a stride of 1. If you apply it at 2-pixel intervals, the stride is 2.
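How stride and padding together decide the output size can be written as one small formula. The function name `output_size` and the 7-pixel input with a 3-pixel kernel are examples I picked; the formula itself is the usual floor((input + 2 * padding - kernel) / stride) + 1 along one dimension.

```python
def output_size(input_size, kernel_size, stride=1, padding=0):
    """Number of window positions along one dimension."""
    return (input_size + 2 * padding - kernel_size) // stride + 1


# Hypothetical 7-pixel-wide input and 3-pixel-wide kernel.
size_stride1 = output_size(7, 3, stride=1)
size_stride2 = output_size(7, 3, stride=2)
size_padded = output_size(7, 3, stride=1, padding=1)
print(size_stride1, size_stride2, size_padded)  # 5 3 7

# Left edge of each window when the stride is 2.
window_starts = list(range(0, 7 - 3 + 1, 2))
print(window_starts)  # [0, 2, 4]
```

Doubling the stride roughly halves the output size, while one pixel of padding with a 3-pixel kernel keeps the output the same size as the input.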

Fully Connected layer

This is the final layer of a CNN used for image classification.

The convolution and pooling layers used so far basically keep their data in the same array form as the input image. To classify the input image, that image-shaped data has to be converted into a form that can produce a one-dimensional output.

After several rounds of convolution and pooling, a CNN flattens the image data into a column vector. Once flattened, the data can be passed on to the hidden layers and the output layer.

The fully connected layer takes a one-dimensional vector as input and outputs a one-dimensional vector. In this way, a CNN extracts the dominant features of the input image before passing them to a feedforward neural network, so the image can then be classified using the softmax function.
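The flatten, fully connected, and softmax steps can be sketched like this. Everything concrete here is an assumption for illustration: the two 2x2 feature maps, the three classes, and the hand-written weights. In a real network the weights are learned during training.

```python
import math


def flatten(feature_maps):
    """Turn a list of 2D feature maps into one 1D vector."""
    return [v for fmap in feature_maps for row in fmap for v in row]


def fully_connected(inputs, weights, biases):
    """One output per weight row: weighted sum of all inputs plus a bias."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    largest = max(scores)  # subtracted for numerical stability
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Two hypothetical 2x2 feature maps coming out of the last pooling layer.
feature_maps = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [0.5, 0.5]],
]
flat = flatten(feature_maps)  # 8 values

# Hypothetical weights for 3 classes (8 inputs each), not learned values.
weights = [
    [1, 0, 0, 1, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0.5, 0.5, 0.5, 0.5],
]
biases = [0.0, 0.0, 0.0]

scores = fully_connected(flat, weights, biases)
probs = softmax(scores)
print(scores)  # [2.0, 0.0, 1.0]
print(probs)
```

The class with the highest score (index 0 here) also gets the highest probability, and the probabilities always add up to 1.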

Pooling layer

The pooling layer is typically used after the convolution layer. This layer compresses the input data.

The higher the resolution, the more noise there is. Pooling lowers the resolution by shrinking the data while retaining its features. It compresses and downsamples the information to transform the input data into a more manageable form.

Compression brings the following advantages, and the convolution and pooling layers work together to detect features.

・ Improved robustness (less susceptible to small positional changes)
・ Overfitting is suppressed to some extent
・ Computational cost is reduced
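As a concrete picture of the compression, here is a sketch of max pooling, one common form of pooling. The article above does not say which pooling method is meant, so choosing max pooling is my assumption, as are the function name `max_pool2d` and the 4x4 feature map.

```python
def max_pool2d(feature_map, size=2, stride=2):
    """Keep only the largest value in each size x size window."""
    out_h = (len(feature_map) - size) // stride + 1
    out_w = (len(feature_map[0]) - size) // stride + 1
    return [
        [
            max(feature_map[i * stride + a][j * stride + b]
                for a in range(size) for b in range(size))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Hypothetical 4x4 feature map from a convolution layer.
feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 1, 7, 8],
]

pooled = max_pool2d(feature_map)
print(pooled)  # [[4, 2], [2, 8]]
```

The 4x4 map shrinks to 2x2, so later layers handle a quarter of the values. A strong response also survives if it moves by a pixel inside the same 2x2 window, which is where the robustness to small positional changes comes from.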

Finally

By researching and summarizing all this, I was able to understand the CNN that I had been using without much thought. I still only roughly follow the hardcore math, but it looks like I can finally go home.

References

・ [Learn while moving with TensorFlow and Keras: Deep learning mechanism ~ Convolutional neural network thorough explanation ~ (Compass Books series)](https://www.amazon.co.jp/gp/product/4839970270/)
・ Source of the images: [Convolutional Neural Networks (CNNs / ConvNets)](https://cs231n.github.io/convolutional-networks/)
・ Understanding the classic Convolutional Neural Network from scratch
