[PYTHON] I investigated Keras's Conv2D (2D convolutional layer)

What I want to do

from keras import layers, models
model = models.Sequential()
model.add(layers.Conv2D(32,(3,3),activation="relu",input_shape=(150,150,3)))

What you will understand after reading this article

What is Conv2D?

If you search for "keras Conv2D", you will find that it is a "2D convolutional layer". So what is a "two-dimensional convolutional layer"? There is also the term "one-dimensional convolutional neural network". To understand the difference between 1D and 2D, it is first necessary to understand "convolutional neural networks" and "convolution".

What is CNN?

CNN stands for Convolutional Neural Network.

"Convolutional" means based on convolution, and "Neural Network" is a neural network.

So a CNN is a "convolutional neural network".

Reference information to deepen your understanding of CNN

The explanation below follows https://www.atmarkit.co.jp/ait/articles/1804/23/news138.html.

Basic way of thinking

What is an "image" in the first place?

Image files such as jpg have a fixed number of pixels in width and height. For example, suppose you have a photo that is 300px wide and 200px high. If one pixel is represented by ■ (a square), the photo is an array of 300 x 200 = 60,000 ■. So an image 5px wide and 5px high has 25 ■ in total, as shown in the figure below.

image.png
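As a rough sketch of this idea, an image can be modeled as a nested list with one entry per pixel. The variable names are made up for illustration, and each 0 simply stands in for one ■.

```python
# Model an image as rows of pixels; each 0 stands in for one square.
width, height = 300, 200
photo = [[0] * width for _ in range(height)]

# Total number of pixels = number of rows x number of columns.
pixel_count = sum(len(row) for row in photo)
print(pixel_count)  # 60000

# The 5 x 5 example has 25 pixels.
small = [[0] * 5 for _ in range(5)]
small_count = sum(len(row) for row in small)
print(small_count)  # 25
```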

Now consider a black-and-white image.

An "x" drawn in black on a white background looks like the figure below.

image.png

Similarly, a plus sign (+) looks like this:

image.png

A minus sign (-) looks like this:

image.png

And an equals sign (=) looks like this:

image.png

The idea of "focusing on small divisions and examining their characteristics"

Take the image of an X drawn in black on a white background:

image.png

What happens if we "focus on small regions and examine their features" in this image data? For example, look at the regions enclosed by the red frame and the blue frame.

image.png

Both regions look like this:

image.png

In other words, "the red-framed region and the blue-framed region have the same feature". "Data that represents a feature (a feature detector)" such as

image.png

is called a kernel (sometimes called a filter; the meaning is the same). So, to understand the features of the "5 x 5" original image, the original image is subdivided and each piece is compared with the "2 x 2" kernel. This is the idea behind "classifying an image" or "identifying the features of an image and how it differs from other images".

What is "convolution"?

In order to understand Conv2D, it is necessary to understand the "two-dimensional convolutional layer". To do so, we first need to understand the "convolutional layer". So what is "convolution"?

Roughly speaking, it is as follows.

The output (feature map) of a "convolution" of a 5 x 5 original image with a 3 x 3 kernel (filter) has 9 squares (3 x 3).

If you convolve a 5x5 original image with a 3x3 kernel while shifting by 1 square at a time (this is called "a stride (the number of pixels to shift) of 1"), a total of 9 calculations are performed. Arranging the 9 results in order gives a feature map of 9 squares.

image.png

The red frame is the region compared with the kernel, that is, the "area of interest" (called a window). The calculation is repeated while shifting by 1 square (1 pixel) from the upper left to the lower right of the original image. In this case the calculation is performed 9 times, so the feature map has 9 squares (3 x 3). Shifting one pixel at a time is called "a stride of 1"; shifting two pixels at a time is called "a stride of 2".
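The sliding of the window can be sketched in plain Python by listing the top-left corner of every window. The helper name window_positions is hypothetical (it is not part of Keras), and the sketch assumes a square image and kernel with no padding.

```python
def window_positions(image_size, kernel_size, stride):
    """Return the (row, col) of the top-left corner of every window."""
    last = image_size - kernel_size  # last valid top-left index
    steps = range(0, last + 1, stride)
    return [(r, c) for r in steps for c in steps]

positions_s1 = window_positions(5, 3, 1)
positions_s2 = window_positions(5, 3, 2)
print(len(positions_s1))  # 9 windows -> 3 x 3 feature map
print(len(positions_s2))  # 4 windows -> 2 x 2 feature map
```

Each position corresponds to one calculation, so the number of positions equals the number of squares in the feature map.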

Specific calculation example

Let's actually try the "first matrix operation" in the above figure. The procedure for matrix calculation is as follows. Matrix operation is performed on the red frame part (window) in the left figure and the right figure (kernel).

image.png

By the way, the kernel shown here is just an example. In an actual convolution, the height and width of the kernel can be set to sizes other than 3x3. Also note that convolution uses multiple kinds of kernels, not just one (details are described later).

Performing the calculation on the window and the kernel gives the output. To make this concrete, let's assign numbers to the cells: black is -1 and white is 1.

image.png

From the upper left cell to the lower right cell, the calculation is performed in order (9 times in total), as shown below.

-1 x  1 = -1 (upper row, left cells)
 1 x  1 =  1 (upper row, center cells)
 1 x  1 =  1 (upper row, right cells)
 1 x -1 = -1 (middle row, left cells)
-1 x -1 =  1 (middle row, center cells)
 1 x -1 = -1 (middle row, right cells)
 1 x  1 =  1 (bottom row, left cells)
 1 x  1 =  1 (bottom row, center cells)
-1 x  1 = -1 (bottom row, right cells)

The left-hand factor is the "value of one cell in a part of the original image", and the right-hand factor is the "value of one cell in the kernel". Then "add up all" the products:

SUM(-1, 1, 1, -1, 1, -1, 1, 1, -1)

The result is 1. This 1 is placed in the "upper left of the feature map", so the feature map is as follows.

image.png
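The same hand calculation can be reproduced in a few lines of Python. The window and kernel values below are read off the left-hand and right-hand factors of the nine multiplications above; the variable names are chosen only for illustration.

```python
# Window values (left-hand factors): black = -1, white = 1.
window = [
    [-1,  1,  1],
    [ 1, -1,  1],
    [ 1,  1, -1],
]
# Kernel values (right-hand factors).
kernel = [
    [ 1,  1,  1],
    [-1, -1, -1],
    [ 1,  1,  1],
]

# Multiply cell by cell, from upper left to lower right, then add everything up.
products = [
    w * k
    for w_row, k_row in zip(window, kernel)
    for w, k in zip(w_row, k_row)
]
result = sum(products)
print(products)  # [-1, 1, 1, -1, 1, -1, 1, 1, -1]
print(result)    # 1
```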

If you continue the calculation in this way, values are entered in the remaining 8 squares of the feature map. Performing such a calculation is "convolution". In other words, "convolution is the work of multiplying the original image and the kernel cell by cell, adding up the products, and writing the result to the feature map."
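A minimal sketch of the whole procedure in plain Python follows. The 5 x 5 image is an assumption: it encodes an X with black = -1 and white = 1, chosen so that its upper-left window matches the hand calculation. The function name conv2d_valid is hypothetical, and this is not how Keras implements the operation internally.

```python
def conv2d_valid(image, kernel, stride=1):
    """Slide a square kernel over a square image without padding."""
    k = len(kernel)
    steps = range(0, len(image) - k + 1, stride)
    return [
        [
            # Multiply the window and the kernel cell by cell, then sum.
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(k) for j in range(k))
            for c in steps
        ]
        for r in steps
    ]

# Assumed 5 x 5 "X" image: black = -1, white = 1.
image = [
    [-1,  1,  1,  1, -1],
    [ 1, -1,  1, -1,  1],
    [ 1,  1, -1,  1,  1],
    [ 1, -1,  1, -1,  1],
    [-1,  1,  1,  1, -1],
]
kernel = [
    [ 1,  1,  1],
    [-1, -1, -1],
    [ 1,  1,  1],
]

feature_map = conv2d_valid(image, kernel)
for row in feature_map:
    print(row)
# The result has 3 rows of 3 values, and its upper-left value is 1.
```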

However, it is tedious to perform such a convolution by hand. Therefore, it is computed using something like Keras's Conv2D.

The meaning of the arguments passed to Keras's Conv2D()

Let's return to the sample code at the beginning.

from keras import layers, models
model = models.Sequential()
model.add(layers.Conv2D(32,(3,3),activation="relu",input_shape=(150,150,3)))

The Conv2D() call used in it is

Conv2D(32,(3,3),activation="relu",input_shape=(150,150,3))

Let's investigate what each argument means. Four arguments are passed.

Conv2D(
  32,
  (3,3),
  activation="relu",
  input_shape=(150,150,3)
)

The signature given in the official Keras documentation (https://keras.io/ja/layers/convolutional/#conv2d) is as follows.

keras.layers.Conv2D(
  filters,
  kernel_size,
  strides=(1, 1),
  padding='valid',
  data_format=None,
  dilation_rate=(1, 1),
  activation=None,
  use_bias=True,
  kernel_initializer='glorot_uniform',
  bias_initializer='zeros',
  kernel_regularizer=None,
  bias_regularizer=None,
  activity_regularizer=None,
  kernel_constraint=None,
  bias_constraint=None
)

Let's start with the first argument. The description of the official document is as follows.

filters: An integer, the dimensionality of the output space (that is, the number of output filters in the convolution).

In this code, we are passing 32. In other words, "the number of output filters is 32" is specified. So what is an "output filter"?

What is a "filter" in the first place?

The kernel used in convolution was explained above. The important point here is that a "kernel" is sometimes called a "filter". In other words, the first argument, filters, refers to "filters", that is, "kernels", so it is a setting related to the kernel.

https://qastack.jp/stats/154798/difference-between-kernel-and-filter-in-cnn has a question and answer on the difference between a kernel and a filter in a CNN.

In conclusion, for the purposes of this article, "filter" and "kernel" can be treated as the same thing.

If so, "the number of output filters is 32" means "the number of output kernels is 32".

Review of convolution

Take a 5x5 input image:

image.png

and convolve it with a 3x3 filter (also called a kernel):

image.png

If you calculate by shifting one square at a time as shown in the figure below, the calculation is performed 9 times in total, so the result (feature map) has 9 squares (3x3).

(By the way, a convolution that slides one square at a time like this is described as "a stride of 1". The higher the stride value, the fewer calculations are needed.)

animated_convolution.gif

What is a stride?

The value that specifies how many squares to shift between calculations.

With a stride of 1:

stride1.gif

With a stride of 2:

stride2.gif

So, what are the height and width of the feature map when a 25 x 25 input image is convolved with a 5 x 5 filter at a stride of 2?

The answer is 11 x 11. You can confirm this by drawing a grid in a spreadsheet and counting while shifting the frame by hand. The 25 x 25 grid is the input image, and the overlapping pink frame (5x5) is the filter (kernel). Since the stride is 2, the frame is shifted by 2 squares at a time and reaches the right edge at the 11th calculation. The vertical direction is the same, so the feature map is 11 x 11.

image.png
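Instead of counting by hand, the side length of the feature map can also be computed directly. The sketch below assumes no padding (the frame never leaves the image); the function name feature_map_size is made up for this example.

```python
def feature_map_size(input_size, kernel_size, stride):
    """Side length of the feature map when no padding is used."""
    return (input_size - kernel_size) // stride + 1

print(feature_map_size(5, 3, 1))   # 3
print(feature_map_size(25, 5, 2))  # 11
```

The first call reproduces the 5x5 image with a 3x3 kernel, and the second reproduces the 25 x 25 grid with a 5x5 filter and a stride of 2.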

How to decide the arguments to pass to the Conv2D function

Based on the above knowledge, consider the parameters required to execute the convolution. Specifically, it is necessary to answer the following questions.

There may be other questions, but answering these is what it means to "determine the values of the arguments to pass to the function."

How to determine the vertical and horizontal size of the filter (kernel)

The following is an excerpt from https://child-programmer.com/ai/keras/conv2d/.

Commentary on Conv2D(16, (3, 3)
: It means using 16 filters of size "3x3" (16 kinds of "3x3" filters).
It seems that odd sizes for which a center can be determined, such as "5x5" and "7x7", are easy to use.
It seems that the number of filters tends to be "16, 32, 64, 128, 256, 512", and so on.
It seems that you should try a large number of filters for problems that look complicated, and a small number for problems that look easy.

Here, there are two values related to the filter: the height and width of the filter, and the number of filters. Be careful not to confuse them. The height and width are as explained so far. In the example below, the filter is "5 x 5" (the pink area is a square of 5x5 = 25 pixels).

image.png

So what does "the number of filters" (how many filters are used) mean? Convolution uses more than one kind of filter, because "one kind" of filter only represents "one feature". For example, 3x3 filters can come in kinds such as

image.png

and so on. The number of such "kinds of filter" is the "number of filters".

To summarize,

Conv2D(16, (3, 3)

is an instruction to "convolve using 16 filters (16 kinds), each 3x3 pixels in height and width."

Supplement on "number of filters"

If you want to know more about what "convolution using multiple filters, for example 16 kinds (16 sheets)" means, see "Convolutional layer" at https://products.sint.co.jp/aisia/blog/vol1-16.

According to that article, "as many feature maps are output as there are filters". This means that after convolution with 16 kinds (16 sheets) of filters, 16 "feature maps" are output.

For simplicity, consider the case where "convolution is performed with three filters".

For example, in the figure below, the filter (pink area) is 2x2. The feature map (green area) is 3x3.

stride1.gif

If there is only one kind of filter (pink area), only one feature map (green area) is output.

However, if you prepare three kinds of filters, the calculation is performed once per kind, and each produces a different result, so three feature maps are output.

image.png
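The following sketch illustrates "one feature map per filter" with a 4 x 4 image and three 2x2 filters, matching the sizes in the figure. The image and filter values are invented for illustration, and convolve is a hypothetical helper, not a Keras function.

```python
def convolve(image, kernel):
    """Slide a square kernel over a square image (stride 1, no padding)."""
    k = len(kernel)
    out = len(image) - k + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(k) for j in range(k))
            for c in range(out)
        ]
        for r in range(out)
    ]

# Made-up 4 x 4 image (black = -1, white = 1).
image = [
    [ 1,  1, -1, -1],
    [ 1,  1, -1, -1],
    [-1, -1,  1,  1],
    [-1, -1,  1,  1],
]

# Three made-up 2 x 2 filters, each looking for a different pattern.
filters = [
    [[1,  1], [ 1,  1]],
    [[1, -1], [ 1, -1]],
    [[1,  1], [-1, -1]],
]

# One feature map is produced per filter.
feature_maps = [convolve(image, f) for f in filters]
print(len(feature_maps))  # 3
for fm in feature_maps:
    print(fm)
```

With 16 or 32 filters the idea is the same: the list simply contains 16 or 32 feature maps.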

Take a look at the sample code at the beginning

Here is the sample code from the beginning again.

from keras import layers, models
model = models.Sequential()
model.add(layers.Conv2D(32,(3,3),activation="relu",input_shape=(150,150,3)))

In this code, the layer is written as

Conv2D(32,(3,3)

This is an instruction to "convolve using 32 kinds (32 sheets) of 3x3 filters (kernels)".
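Combining the filter count with the feature map size gives the shape of this layer's output. The sketch below is plain arithmetic, not a call into Keras; it assumes the defaults shown in the signature above (padding='valid', strides=(1, 1)), and conv2d_output_shape is a hypothetical helper name.

```python
def conv2d_output_shape(input_shape, filters, kernel_size, stride=1):
    """Output (height, width, channels) of a convolution without padding."""
    height, width, _channels = input_shape
    k_h, k_w = kernel_size
    out_h = (height - k_h) // stride + 1
    out_w = (width - k_w) // stride + 1
    # One feature map per filter becomes one output channel.
    return (out_h, out_w, filters)

output_shape = conv2d_output_shape((150, 150, 3), 32, (3, 3))
print(output_shape)  # (148, 148, 32)
```

In other words, under these assumptions the layer produces 32 feature maps of 148 x 148 each.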

That covers how to decide the filter size and the number of filters (that is, how to pass those arguments).

Next, consider the remaining arguments.

What is input_shape?

The following is an excerpt from https://child-programmer.com/ai/keras/conv2d/.

Commentary on input_shape=(28, 28, 1)
: A grayscale (black-and-white) image of 28 pixels vertically and 28 pixels horizontally is input.

In other words, in the sample code at the beginning,

input_shape=(150,150,3)

means that "the input image is 150 x 150 pixels in height and width". So what does the 3 mean?

The official documentation (https://keras.io/ja/layers/convolutional/#conv2d) says:

For RGB images, input_shape=(128, 128, 3).

That is:

Black-and-white image: 1
RGB: 3

So it can be considered the number of color channels (three for RGB: red, green, and blue). An ordinary photo (.jpg) is RGB, so setting 3 should cause no problem.

What is activation?

In the sample code

model.add(layers.Conv2D(32,(3,3),activation="relu",input_shape=(150,150,3)))

what does

activation="relu"

mean?

The explanation at https://child-programmer.com/ai/keras/conv2d/ is as follows.

Commentary on activation="relu"
: The activation function "ReLU (Rectified Linear Unit)", also known as the ramp function.
It is applied to the filtered image. The output is 0 when the input is 0 or less. If the input is larger than 0, it is output as it is.

The explanation at https://keras.io/ja/layers/convolutional/#conv2d is as follows.

activation: Name of the activation function to use (see activations).
If nothing is specified, no activation is applied.

In other words, activation="relu" is the instruction to "use ReLU as the activation function".
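The behavior described above (0 for inputs of 0 or less, the input itself otherwise) can be written as a small function. This is only an illustration of the rule, applied to made-up feature map values; Keras applies its own implementation when activation="relu" is given.

```python
def relu(x):
    """Return 0 for inputs of 0 or less, otherwise the input unchanged."""
    return x if x > 0 else 0

# Made-up feature map values for illustration.
feature_map = [
    [1, -3, 0],
    [5, -1, 2],
]

activated = [[relu(v) for v in row] for row in feature_map]
print(activated)  # [[1, 0, 0], [5, 0, 2]]
```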

What is activation?

The function for activating is the "activation function". So what is "activation"? Below is a collection of contexts for understanding activation.

To summarize: "specifying an activation function increases the expressive power of the model (you can build a smarter AI), so specify one", and "ReLU is the standard choice".

About stride designation

The stride is specified like this:

strides = 1

For details, see https://keras.io/ja/layers/convolutional/#conv2d.

Summary

As described above, I now roughly understand what

model.add(layers.Conv2D(32,(3,3),activation="relu",input_shape=(150,150,3)))

is doing and what each argument means. Since the purpose of this chapter is "understanding Keras's Conv2D (2D convolutional layer)", I will stop here for now. Sequential() and MaxPooling2D() will be investigated in a separate chapter.
