from keras import layers, models
model = models.Sequential()
model.add(layers.Conv2D(32,(3,3),activation="relu",input_shape=(150,150,3)))
If you search for "keras Conv2D", you will find the term "2D convolutional layer". So what is a "two-dimensional convolutional layer"? There is also the term "one-dimensional convolutional neural network". To understand the difference between 1D and 2D, it is first necessary to understand "convolutional neural network" and "convolution".
CNN stands for Convolutional Neural Network: "convolutional" plus "neural network".
So a CNN is a "convolutional neural network".
According to https://www.atmarkit.co.jp/ait/articles/1804/23/news138.html:
Image files such as JPEG have a fixed number of pixels in width and in height. For example, suppose you have a photo that is 300px wide and 200px high. If one pixel is represented by ■ (a square), the photo is an array of 300 x 200 = 60,000 ■. Likewise, an image that is 5px wide and 5px high is a grid of 25 ■ in total, as shown in the figure below.
Furthermore, in a black-and-white photograph, each square is either black or white. An "x" drawn in black on a white background can then be represented as such a 5 x 5 grid, and so can a plus sign (+), a minus sign (-), or an equal sign (=).
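As a rough sketch of this idea, a 5 x 5 black-and-white image can be held as a nested list with one entry per pixel. The grid below assumes that the "x" is drawn along both diagonals; the exact pixel layout of the original figure is an assumption, and the names SIZE and x_image are only for illustration.

```python
# Hypothetical 5 x 5 image of an "x": black (■) on both diagonals, white (□) elsewhere.
SIZE = 5

x_image = [
    ["■" if col == row or col == SIZE - 1 - row else "□" for col in range(SIZE)]
    for row in range(SIZE)
]

# Print the grid one row per line.
for row in x_image:
    print("".join(row))

# The image is simply width x height pixels.
pixel_count = sum(len(row) for row in x_image)
print(pixel_count)  # 25
```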
An "x" drawn in black on a white background
What happens if you examine the features of the image data by focusing on small regions? For example, look at the part surrounded by the red frame and the part surrounded by the blue frame.
These two regions contain exactly the same pattern. In other words, the red-framed part and the blue-framed part have the same features. Data that represents such a feature (a feature detector) is called a kernel (it is sometimes called a filter; the meaning is the same). In other words, to understand the features of the 5 x 5 original image, subdivide the original image and compare each part with the 2 x 2 kernel. This is the idea behind "classifying an image" or "identifying the features of an image and how it differs from other images".
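The comparison of small regions with a kernel can be sketched as follows. The 5 x 5 "x" image and the 2 x 2 kernel below are illustrative assumptions (the pattern in the original figure is not reproduced here); the code simply lists every 2 x 2 region that matches the kernel exactly.

```python
# Illustrative assumption: black = "B", white = "W", "x" drawn on both diagonals.
SIZE = 5
image = [
    ["B" if c == r or c == SIZE - 1 - r else "W" for c in range(SIZE)]
    for r in range(SIZE)
]

# Hypothetical 2 x 2 kernel: a short stroke running from upper left to lower right.
kernel = [
    ["B", "W"],
    ["W", "B"],
]
K = len(kernel)

def region(img, top, left, size):
    """Return the size x size block of img whose upper-left corner is (top, left)."""
    return [row[left:left + size] for row in img[top:top + size]]

# Slide over every possible position and keep those identical to the kernel.
matches = [
    (top, left)
    for top in range(SIZE - K + 1)
    for left in range(SIZE - K + 1)
    if region(image, top, left, K) == kernel
]
print(matches)  # [(0, 0), (1, 1), (2, 2), (3, 3)]
```

Several different positions share the same feature, which is why a single small kernel is enough to detect it anywhere in the image.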
To understand Conv2D, it is necessary to understand the "two-dimensional convolutional layer", and for that we first need to understand the "convolutional layer". So what is "convolution"?
Roughly speaking, it is as follows.
Suppose you convolve a 5x5 original image with a 3x3 kernel, shifting the kernel by 1 square each time (this is called "a stride of 1"; the stride is the number of pixels to shift). A total of 9 calculations are then performed. Arranging the 9 results in order gives the output, that is, "the feature map has 9 squares".
The red frame is the region compared with the kernel, that is, the "area of interest" (called a window). The calculation is repeated while shifting the window by 1 square (1 pixel) at a time from the upper left to the lower right of the original image. Since the calculation is performed 9 times, the feature map has 9 squares (3 x 3). Shifting one pixel at a time is called "a stride of 1"; shifting 2 pixels at a time is called "a stride of 2".
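The sliding of the window can be sketched by listing the upper-left corner of every window position. The helper name window_positions is hypothetical; it assumes a square image and a square kernel with no padding.

```python
def window_positions(image_size, kernel_size, stride):
    """List the (row, col) upper-left corners visited by the window."""
    last_start = image_size - kernel_size  # furthest start that still fits
    starts = range(0, last_start + 1, stride)
    return [(row, col) for row in starts for col in starts]

positions = window_positions(image_size=5, kernel_size=3, stride=1)
print(len(positions))               # 9 windows, so the feature map is 3 x 3
print(positions[0], positions[-1])  # (0, 0) (2, 2)

# With a stride of 2 the window only fits at offsets 0 and 2.
print(len(window_positions(5, 3, 2)))  # 4 windows, so the feature map is 2 x 2
```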
Let's actually try the "first calculation" in the above figure. The calculation combines the red-framed part (the window) in the left figure with the right figure (the kernel): corresponding cells are multiplied and the products are added up.
By the way, the kernel shown here is just an example. In an actual convolution, the height and width of the kernel can be set to sizes other than 3x3. Also note that a convolution uses multiple kinds of kernels, not just one (details are described later).
Now, the calculation
produces an output value. For the sake of clarity, let's put in some numbers. Here, black is -1 and white is 1.
From the upper-left cell to the lower-right cell, the values are multiplied in order (9 multiplications in total), as shown below.
-1 x 1 = -1 (upper row, left)
1 x 1 = 1 (upper row, center)
1 x 1 = 1 (upper row, right)
1 x -1 = -1 (middle row, left)
-1 x -1 = 1 (middle row, center)
1 x -1 = -1 (middle row, right)
1 x 1 = 1 (bottom row, left)
1 x 1 = 1 (bottom row, center)
-1 x 1 = -1 (bottom row, right)
In each line, the left factor is the value of one cell in the window of the original image, and the right factor is the value of the corresponding cell in the kernel. Adding up all nine products,
SUM(-1, 1, 1, -1, 1, -1, 1, 1, -1)
the result is 1. This 1 is placed in the upper-left square of the feature map, so the feature map is as follows.
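The same arithmetic can be checked in a few lines of Python. The window and kernel values below are read off the nine multiplications above (left factors for the window, right factors for the kernel); the variable names are only for illustration.

```python
# Window of the original image (black = -1, white = 1), taken from the left factors.
window = [
    [-1,  1,  1],
    [ 1, -1,  1],
    [ 1,  1, -1],
]

# Kernel, taken from the right factors.
kernel = [
    [ 1,  1,  1],
    [-1, -1, -1],
    [ 1,  1,  1],
]

# Multiply cell by cell, then add everything up.
products = [
    window[r][c] * kernel[r][c]
    for r in range(3)
    for c in range(3)
]
print(products)       # [-1, 1, 1, -1, 1, -1, 1, 1, -1]
print(sum(products))  # 1
```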
Continuing in this way fills in the remaining 8 squares of the feature map. Performing this calculation is "convolution". In other words, convolution is the work of multiplying each window of the original image by the kernel cell by cell, summing the products, and writing each result into the feature map.
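A minimal sketch of the whole procedure, assuming the original image is a 5 x 5 "x" drawn on both diagonals (black = -1, white = 1) and reusing the 3 x 3 example kernel from the calculation above. The function name convolve2d is hypothetical, and no padding is applied.

```python
def convolve2d(image, kernel, stride=1):
    """Slide kernel over a square image and return the feature map (no padding)."""
    k = len(kernel)
    out_size = (len(image) - k) // stride + 1
    feature_map = []
    for out_r in range(out_size):
        row = []
        for out_c in range(out_size):
            top, left = out_r * stride, out_c * stride
            # Multiply the window by the kernel cell by cell and sum the products.
            total = sum(
                image[top + r][left + c] * kernel[r][c]
                for r in range(k)
                for c in range(k)
            )
            row.append(total)
        feature_map.append(row)
    return feature_map

# Assumed 5 x 5 "x" image: black (-1) on both diagonals, white (1) elsewhere.
image = [
    [-1 if c == r or c == 4 - r else 1 for c in range(5)]
    for r in range(5)
]
kernel = [
    [ 1,  1,  1],
    [-1, -1, -1],
    [ 1,  1,  1],
]

feature_map = convolve2d(image, kernel)
for row in feature_map:
    print(row)
# [1, 5, 1]
# [1, -3, 1]
# [1, 5, 1]
```

The upper-left value is the 1 obtained by hand; the other eight values come from the same multiply-and-sum at the remaining window positions.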
However, performing such a convolution by hand is tedious. Therefore, it is computed with a function such as Keras' Conv2D.
Let's return to the sample code at the beginning.
from keras import layers, models
model = models.Sequential()
model.add(layers.Conv2D(32,(3,3),activation="relu",input_shape=(150,150,3)))
Conv2D() is used here:
Conv2D(32,(3,3),activation="relu",input_shape=(150,150,3))
Let's investigate what its arguments mean. Four arguments are passed.
Conv2D(
    32,
    (3,3),
    activation="relu",
    input_shape=(150,150,3)
)
The official Keras documentation (https://keras.io/ja/layers/convolutional/#conv2d) describes it as follows.
keras.layers.Conv2D(
    filters,
    kernel_size,
    strides=(1, 1),
    padding='valid',
    data_format=None,
    dilation_rate=(1, 1),
    activation=None,
    use_bias=True,
    kernel_initializer='glorot_uniform',
    bias_initializer='zeros',
    kernel_regularizer=None,
    bias_regularizer=None,
    activity_regularizer=None,
    kernel_constraint=None,
    bias_constraint=None
)
Let's start with the first argument. The official documentation describes it as follows.
filters: An integer, the dimensionality of the output space (that is, the number of output filters in the convolution).
In this code, 32 is passed. In other words, "the number of output filters is 32" is specified. So what is an "output filter"?
The kernel used in a convolution was explained above. The important point here is that a "kernel" is sometimes called a "filter". In other words, the first argument, filters, is a setting related to the kernel.
https://qastack.jp/stats/154798/difference-between-kernel-and-filter-in-cnn contains the following question and answer.
Question: What is the difference between a "kernel" and a "filter" in a convolutional neural network?
Answer: They have the same meaning. The kernel is sometimes called a filter.
Therefore, the conclusion is that "filter" and "kernel" refer to the same thing.
If so, "the number of output filters is 32" means "the number of output kernels is 32".
Consider convolving a 5x5 input image with a 3x3 filter (also called a kernel). If you shift one square at a time as shown in the figure below, the calculation is performed 9 times in total, so the result (the feature map) has 9 squares (3x3).
(By the way, a convolution that slides one square at a time is described as "a stride of 1". The larger the stride, the fewer calculations are needed.)
The stride is the value that specifies how many squares to shift between calculations. With a stride of 1 the window moves one square at a time; with a stride of 2 it moves two squares at a time.
Then, what are the height and width of the feature map when a convolution is performed under the following conditions: a 25 x 25 input image, a 5 x 5 filter (kernel), and a stride of 2?
The answer is 11 x 11. You can confirm this by drawing a grid in a spreadsheet and counting while shifting the filter by hand. The 25 x 25 grid is the input image, and the overlapping pink frame (5x5) is the filter (kernel). Since the stride is 2, the filter is shifted by 2 squares each time, and it reaches the right edge at the 11th calculation. The vertical direction works the same way, so the feature map is 11 x 11.
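The counting can also be written as a small formula: with no padding, the number of window positions along one side is (input size - filter size) divided by the stride, rounded down, plus 1. The helper below is a sketch with a hypothetical name.

```python
def feature_map_size(input_size, filter_size, stride):
    """Side length of the feature map when no padding is used."""
    return (input_size - filter_size) // stride + 1

print(feature_map_size(25, 5, 2))  # 11
print(feature_map_size(5, 3, 1))   # 3
```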
Based on the above, consider the parameters required to perform a convolution. Specifically, questions such as the size of the filter, the number of filters, and the stride need to be answered.
There may be other questions, but answering them amounts to deciding the values of the arguments passed to the function.
The following is an excerpt from https://child-programmer.com/ai/keras/conv2d/.
Explanation of Conv2D(16, (3, 3)
: It means that 16 filters of size "3x3" are used (16 kinds of "3x3" filter).
Odd sizes, which have a well-defined center, such as "5x5" and "7x7", seem to be easy to use.
The number of filters tends to be one of "16, 32, 64, 128, 256, 512", etc.
It seems best to try a large number of filters for problems that look complicated and a small number of filters for problems that look easy.
Here, there are two values related to the filter: the vertical and horizontal size of the filter, and the number of filters. Be careful not to confuse them. The vertical and horizontal size is as explained so far. In the example below, the filter size is "5 x 5" (the pink area is a square of 5x5 = 25 pixels).
So what does "the number of filters" (how many filters are used) mean? A convolution uses more than one kind of filter, because one kind of filter represents only one feature. For example, a 3x3 filter can hold several different patterns.
The number of such kinds of filter is the "number of filters".
In summary,
Conv2D(16, (3, 3)
is an instruction to "convolve using 16 filters (16 kinds), each 3x3 pixels in height and width".
If you want to know more about what "convolution with multiple filters, for example 16 kinds (16 sheets)" means, see "Convolutional layer" at https://products.sint.co.jp/aisia/blog/vol1-16. The following is an excerpt.
"As many feature maps are output as there are filters" means that after convolution with 16 kinds (16 sheets) of filters, 16 feature maps are output.
Here, for the sake of simplicity, consider the case where the convolution is performed with three filters.
For example, in the figure below, the filter (pink area) is 2x2 and the feature map (green area) is 3x3.
If there is only one kind of filter (pink area), only one feature map (green area) is output.
However, if you prepare three kinds of filters, each one is applied separately and gives a different result, so three feature maps are output.
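This can be sketched in plain Python. The 4 x 4 input and the three 2 x 2 filters below are made-up values chosen only to show that one feature map comes out per filter; apply_filter is a hypothetical helper using a stride of 1 and no padding.

```python
def apply_filter(image, kernel):
    """Return the feature map for one filter (stride 1, no padding)."""
    k = len(kernel)
    size = len(image) - k + 1
    return [
        [
            sum(image[top + r][left + c] * kernel[r][c]
                for r in range(k) for c in range(k))
            for left in range(size)
        ]
        for top in range(size)
    ]

# Made-up 4 x 4 input image (black = -1, white = 1).
image = [
    [-1,  1,  1, -1],
    [ 1, -1, -1,  1],
    [ 1, -1, -1,  1],
    [-1,  1,  1, -1],
]

# Three made-up 2 x 2 filters, each looking for a different pattern.
filters = [
    [[-1, 1], [1, -1]],   # diagonal stroke
    [[-1, -1], [1, 1]],   # horizontal edge
    [[-1, 1], [-1, 1]],   # vertical edge
]

feature_maps = [apply_filter(image, f) for f in filters]
print(len(feature_maps))                               # 3 feature maps
print(len(feature_maps[0]), len(feature_maps[0][0]))   # each is 3 x 3
```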
Sample code at the beginning
from keras import layers, models
model = models.Sequential()
model.add(layers.Conv2D(32,(3,3),activation="relu",input_shape=(150,150,3)))
Here,
Conv2D(32,(3,3)
is written. This is an instruction to "convolve using 32 kinds (32 sheets) of 3x3 filters (kernels)".
With the above, it is clear how to decide the filter-related values (that is, how to pass those arguments).
Next, consider the input image.
The following is an excerpt from https://child-programmer.com/ai/keras/conv2d/.
Explanation of input_shape=(28, 28, 1)
: A grayscale (black-and-white) image of 28 pixels vertically and 28 pixels horizontally is input.
In other words, in the sample code at the beginning,
input_shape=(150,150,3)
means that the input image is 150 x 150 pixels vertically and horizontally. So what does the 3 mean?
The official documentation (https://keras.io/ja/layers/convolutional/#conv2d) says that for RGB images, input_shape=(128, 128, 3).
That is, the last value is 1 for black-and-white images and 3 for RGB images.
Therefore, it can be regarded as the number of color channels (3 for RGB: red, green, and blue). An ordinary photo (.jpg) is usually RGB, so setting 3 normally works.
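Combining this with the filter settings gives the shape of the layer's output. The sketch below assumes the defaults listed in the signature above (strides=(1, 1), padding='valid') and a layout in which the channel count comes last, as in input_shape=(150,150,3). The helper name conv2d_output_shape is hypothetical and is not part of Keras.

```python
def conv2d_output_shape(input_shape, filters, kernel_size, strides=(1, 1)):
    """Output (height, width, channels) of a 'valid' (no padding) 2D convolution."""
    height, width, _channels = input_shape
    out_h = (height - kernel_size[0]) // strides[0] + 1
    out_w = (width - kernel_size[1]) // strides[1] + 1
    # One feature map per filter, regardless of the number of input channels.
    return (out_h, out_w, filters)

output_shape = conv2d_output_shape((150, 150, 3), filters=32, kernel_size=(3, 3))
print(output_shape)  # (148, 148, 32)
```

The 3 input channels do not appear in the result: each of the 32 filters covers all input channels and produces a single feature map.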
The sample code
model.add(layers.Conv2D(32,(3,3),activation="relu",input_shape=(150,150,3)))
contains
activation="relu"
What does this mean?
https://child-programmer.com/ai/keras/conv2d/ explains it as follows.
Explanation of activation="relu"
: The activation function "ReLU (Rectified Linear Unit)", also called the ramp function.
It is applied to the filtered image. The output is 0 when the input is 0 or less. If the input is greater than 0, it is output as is.
https://keras.io/ja/layers/convolutional/#conv2d explains it as follows.
activation: Name of the activation function to use (see activations).
If nothing is specified, no activation is applied.
In other words, activation="relu" is the instruction "use ReLU as the activation function".
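The behaviour described above fits in one line. This is a plain-Python sketch for illustration, not the Keras implementation; the feature-map values are made up.

```python
def relu(x):
    """Return 0 for inputs of 0 or less, and the input itself otherwise."""
    return max(0, x)

# Made-up feature map containing a negative value.
feature_map = [
    [1,  5, 1],
    [1, -3, 1],
    [1,  5, 1],
]

activated = [[relu(value) for value in row] for row in feature_map]
for row in activated:
    print(row)
# [1, 5, 1]
# [1, 0, 1]
# [1, 5, 1]
```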
The function that performs activation is the "activation function". So what is "activation"? Below is a collection of passages for understanding activation.
In summary: "specifying an activation function increases the expressive power of the model (you can build a smarter AI), so specify one", and "ReLU is the standard choice".
The stride was explained above; in Conv2D it is specified as
strides = 1
For details, see https://keras.io/ja/layers/convolutional/#conv2d.
As described above, I now roughly understand what
model.add(layers.Conv2D(32,(3,3),activation="relu",input_shape=(150,150,3)))
does and what each argument means. Since the purpose of this chapter is "Understanding Keras Conv2D (2D convolutional layer)", we stop here. Sequential() and MaxPooling2D() will be investigated in a separate chapter.