[PYTHON] [Deep learning] Investigating how to use each function of the convolutional neural network [DW day 3]

What is a Convolutional Neural Network (CNN)?

A CNN is a neural network specialized for images. Whereas an ordinary multi-layer perceptron consists of an input layer, hidden layers, and an output layer, a CNN additionally has convolution layers, pooling layers, and local response normalization (LRN) layers.

Looking at AlexNet (the winner of ILSVRC 2012) in the Chainer examples, the model definition looks like this:

# … (Omitted)
class AlexBN(chainer.Chain):

   """Single-GPU AlexNet with LRN layers replaced by BatchNormalization."""

   insize = 227

   def __init__(self):
       super(AlexBN, self).__init__(
           conv1=L.Convolution2D(3,  96, 11, stride=4),
           bn1=L.BatchNormalization(96),
           conv2=L.Convolution2D(96, 256,  5, pad=2),
           bn2=L.BatchNormalization(256),
           conv3=L.Convolution2D(256, 384,  3, pad=1),
           conv4=L.Convolution2D(384, 384,  3, pad=1),
           conv5=L.Convolution2D(384, 256,  3, pad=1),
           fc6=L.Linear(9216, 4096),
           fc7=L.Linear(4096, 4096),
           fc8=L.Linear(4096, 1000),
       )
       self.train = True

# … (Omitted)

It consists of 5 convolution layers and 3 fully connected layers. The paper also states that the ReLU activation function, multi-GPU training, LRN, and pooling are important. The Chainer example above uses batch normalization instead of LRN. Batch normalization is explained in detail in the following article:

- Batch Normalization mechanism and its intuitive understanding

Convolution layer

L.Convolution2D(3,  96, 11, stride=4)

The formula for convolution is covered in many reference books, so I omit it here; the goal is simply to be able to use this function. Convolution transforms an image by applying filters to it. Like images, filters have a size in pixels and a number of channels. If the image has $N \times N$ pixels and $K$ channels, its size is written as $N \times N \times K$; in the same way, the size of a filter is written as $H \times H \times K$. The image and the filter have the same number of channels. Several different filters can be applied to the same image: with $M$ filters, the output image has $M$ channels.

A filter is applied to one part of the image at a time while being moved across it, and the distance it moves per step is called the stride width. Increasing the stride width makes it easier to miss image features, so a smaller stride width is desirable. Providing virtual pixels outside the edge of the image is called padding. Padding suppresses the shrinking of the image caused by convolution. If you want the output to be the same size as the input (with a stride of 1 and an odd $H$), set the padding to $H / 2$ rounded down, that is, $\lfloor H / 2 \rfloor$.

To summarize so far, the arguments of L.Convolution2D() are:

- First argument: number of input channels, that is, $K$.
- Second argument: number of output channels, that is, $M$.
- Third argument: filter size, that is, $H$.
- Keyword argument stride: stride width. (The smaller the better, though it probably has to be balanced against the input image size.)
- Keyword argument pad: padding width. (Often $H / 2$ rounded down.)

Note that the number of pixels in the input and output images is not specified anywhere in these arguments.
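Even though the image size is never passed as an argument, the output size follows from the input size, filter size, stride, and padding. The following is a minimal sketch in plain Python, assuming the commonly used formula $\lfloor (N + 2P - H) / S \rfloor + 1$ with padding $P$ and stride $S$. The helper name conv_output_size is hypothetical and is not part of Chainer.

```python
def conv_output_size(n, h, stride=1, pad=0):
    """Output width/height of a convolution over an n x n image.

    Hypothetical helper (not a Chainer function), assuming the usual
    formula floor((n + 2 * pad - h) / stride) + 1.
    """
    return (n + 2 * pad - h) // stride + 1


# conv1 of AlexBN: L.Convolution2D(3, 96, 11, stride=4) on a 227 x 227 input
conv1_out = conv_output_size(227, 11, stride=4)
print(conv1_out)  # 55

# Filter size 5 with pad=2 (= 5 // 2) and stride 1 keeps the size unchanged
same_out = conv_output_size(27, 5, stride=1, pad=2)
print(same_out)  # 27
```

The second call illustrates the rule of thumb above: with a stride of 1, padding of $H / 2$ rounded down leaves the image size as it was.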

Batch normalization

L.BatchNormalization(96)

The argument is the number of image channels to be normalized. It will be the same as the number of output channels of the previous convolution ($ M $).
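One way to see why the argument is a channel count: batch normalization computes one mean and one variance per channel, taken over all samples and pixels of that channel. The sketch below shows this in plain Python under simplifying assumptions: it covers only the normalization step used during training, omits the learned scale and shift parameters, and uses a hypothetical function name and eps value rather than Chainer's implementation.

```python
import math


def batch_norm_per_channel(batch, eps=1e-5):
    """Normalize a batch shaped [sample][channel][row][col] channel by channel.

    Simplified sketch: the learned scale and shift parameters are omitted.
    """
    n_channels = len(batch[0])
    out = [[None] * n_channels for _ in batch]
    for c in range(n_channels):
        # Gather every value of channel c across all samples and pixels.
        values = [v for sample in batch for row in sample[c] for v in row]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        scale = 1.0 / math.sqrt(var + eps)
        for i, sample in enumerate(batch):
            out[i][c] = [[(v - mean) * scale for v in row] for row in sample[c]]
    return out


# Two samples, two channels, 1 x 2 pixels each.
batch = [
    [[[1.0, 3.0]], [[10.0, 30.0]]],
    [[[5.0, 7.0]], [[50.0, 70.0]]],
]
normalized = batch_norm_per_channel(batch)
# Each channel now has mean 0 and variance close to 1, regardless of its
# original scale. One (mean, variance) pair exists per channel, which is
# why the layer needs the channel count, e.g. 96 after conv1.
```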

Pooling layer

 h = self.bn1(self.conv1(x), test=not self.train)
 h = F.max_pooling_2d(F.relu(h), 3, stride=2)

The result of the convolution layer is passed through ReLU, and then pooling is performed. Like a filter, pooling looks at one area of the image at a time and outputs a single representative value for that area according to a fixed rule. This makes the output less sensitive to small shifts in position (position invariance). There are many variations of pooling.

- Mean pooling: take the average of the values in the area
- Max pooling: take the maximum of the values in the area

AlexNet uses maximum pooling.
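The two pooling rules can be compared on a small example. This is a minimal sketch in plain Python; pool_2d and mean are hypothetical helpers written for illustration, and windows that would extend past the edge of the image are simply skipped, which may differ from how a library handles the border.

```python
def pool_2d(image, size, stride, reducer):
    """Slide a size x size window over a 2-D list and reduce each window.

    Hypothetical helper for illustration; windows that would extend past
    the edge are skipped.
    """
    n_rows, n_cols = len(image), len(image[0])
    out = []
    for top in range(0, n_rows - size + 1, stride):
        out_row = []
        for left in range(0, n_cols - size + 1, stride):
            window = [image[top + i][left + j]
                      for i in range(size) for j in range(size)]
            out_row.append(reducer(window))
        out.append(out_row)
    return out


def mean(values):
    return sum(values) / len(values)


image = [
    [1, 2, 0, 0],
    [3, 4, 0, 8],
    [0, 0, 5, 6],
    [0, 4, 7, 8],
]
max_pooled = pool_2d(image, size=2, stride=2, reducer=max)
mean_pooled = pool_2d(image, size=2, stride=2, reducer=mean)
print(max_pooled)   # [[4, 8], [4, 8]]
print(mean_pooled)  # [[2.5, 2.0], [1.0, 6.5]]
```

In the top-right and bottom-left windows the single large value ends up in a different position, yet max pooling returns it either way, which is the position invariance mentioned above.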

To summarize the arguments of F.max_pooling_2d():

- First argument: input image.
- Second argument: pooling area size. If the area is made too large, the accuracy will decrease.
- Third argument (stride): stride width. Usually 2 or more.

Again, the number of input / output pixels is not specified.
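The pixel counts still matter at one point: the first fully connected layer, fc6=L.Linear(9216, 4096), needs the flattened size of the last feature map. The sketch below traces the sizes through AlexBN in plain Python. It assumes that a 3 x 3 pooling with stride 2 follows conv1, conv2, and conv5, as in the usual AlexNet layout; the source only shows the pooling after conv1, so the other two are assumptions. The helper output_size is hypothetical and uses the round-down formula.

```python
def output_size(n, ksize, stride=1, pad=0):
    """Spatial size after a convolution or pooling window (hypothetical helper)."""
    return (n + 2 * pad - ksize) // stride + 1


size = 227                                  # insize of AlexBN
size = output_size(size, 11, stride=4)      # conv1 -> 55
size = output_size(size, 3, stride=2)       # pooling -> 27
size = output_size(size, 5, pad=2)          # conv2 -> 27
size = output_size(size, 3, stride=2)       # pooling (assumed) -> 13
for _ in range(3):                          # conv3, conv4, conv5
    size = output_size(size, 3, pad=1)      # stays 13
size = output_size(size, 3, stride=2)       # pooling (assumed) -> 6

channels = 256                              # output channels of conv5
flattened = channels * size * size
print(size, flattened)  # 6 9216
```

Under these assumptions the result matches the 9216 inputs of fc6, so changing insize or any stride or padding would also require changing that number.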

I tried using AlexNet

In the next article (currently being written), I will report the results of using AlexNet.
