Python: Basics of image recognition using CNN

Deep learning image recognition

Image recognition

Image recognition is a technology that detects "things" and "features" such as characters and faces that appear in images and videos.

More specifically, there are various recognition tasks, such as image classification, estimating the position of objects (upper figure), and dividing an image into regions (lower figure).

In 2012, a team at the University of Toronto published a study on high-accuracy image recognition using deep learning. Since then, interest in deep learning has grown, and it is now being put to practical use in various fields such as character recognition, face recognition, autonomous driving, and domestic robots.

In this post, we will learn about a technique called CNN (Convolutional Neural Network).

image.png

image.png

CNN

CNN overview

What is CNN (Convolutional Neural Network)?

A CNN is a neural network that extracts features using layers called "convolution layers," whose structure resembles the visual cortex of the human brain.

Compared with neural networks consisting only of fully connected layers (covered in the deep learning basics course), CNNs achieve higher performance in fields such as image recognition.

Like the fully connected layer, the convolution layer extracts features, but unlike the fully connected layer, it can process image data while keeping it two-dimensional. This makes it excellent at extracting 2D features such as lines and corners.

In a CNN, a layer called the "pooling layer" is often used together with the convolution layer.

The pooling layer reduces the information obtained from the convolution layer, and finally the images are classified.

In the following sessions, we will learn about each layer, build a CNN model like the one shown in Fig. 2, and actually classify images.

image.png

image.png

Convolution layer

As shown in the figure, the convolution layer focuses on one part of the input data at a time and examines the features of that part of the image.

Which features to focus on does not have to be specified by hand; it is learned automatically, provided the training data and loss function are defined appropriately.

For example, in a CNN that recognizes faces, if training goes well, convolution layers close to the input layer focus on low-level features such as lines and points, while layers closer to the output layer focus on higher-level features such as eyes and noses.

(Strictly speaking, higher-level concepts such as eyes and noses are not detected directly from the original input image. They are detected from the spatial arrangement of the low-level features detected in the layers close to the input layer.)

Each feature to focus on is represented internally as a weight matrix called a filter (kernel), and one filter is used per feature.

image.png

image.png

image.png

image.png

The figure below shows a 9 x 9 x 3 image (height x width x number of channels, the 3 channels being R, G, B) being convolved with a 3 x 3 x 3 (height x width x number of channels) filter.

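A single 3x3x3 filter produces one new 4x4x1 feature map (like a grayscale image). (Without padding, a 4x4 output from a 9x9 input corresponds to moving the filter 2 pixels at a time, i.e. a stride of 2, which is covered later; with a stride of 1 the map would be 7x7.) Using N different filters produces N such 4x4x1 maps, so overall this convolution layer transforms the 9x9x3 image into a 4x4xN feature map.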
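To make the shape arithmetic above concrete, here is a minimal NumPy sketch of a multi-channel convolution layer. The function name conv_multichannel and the random data are our own illustration (not part of Keras), and we assume no padding. Each filter spans all input channels, so one filter yields one 2D map, and N filters give the N channels of the output.

```python
import numpy as np

def conv_multichannel(X, kernels, stride=1):
    """Convolve an H x W x C image with N kernels of shape kh x kw x C (no padding).

    Returns an out_h x out_w x N feature map. Like CNN libraries, the kernel is not flipped.
    """
    img_h, img_w, C = X.shape
    N, kh, kw, kc = kernels.shape
    assert C == kc, "each kernel must span all input channels"
    out_h = (img_h - kh) // stride + 1
    out_w = (img_w - kw) // stride + 1
    out = np.zeros((out_h, out_w, N))
    for n in range(N):
        for i in range(out_h):
            for j in range(out_w):
                patch = X[i*stride:i*stride+kh, j*stride:j*stride+kw, :]
                # Sum over height, width AND channels -> one number per position
                out[i, j, n] = np.sum(patch * kernels[n])
    return out

rng = np.random.default_rng(0)
image = rng.random((9, 9, 3))         # 9 x 9 x 3 (R, G, B)
kernels = rng.random((5, 3, 3, 3))    # N = 5 kernels, each 3 x 3 x 3

print(conv_multichannel(image, kernels, stride=2).shape)  # (4, 4, 5)
print(conv_multichannel(image, kernels, stride=1).shape)  # (7, 7, 5)
```

The channel dimension disappears inside each filter (it is summed over), and the output channel count is decided only by the number of filters.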

(In this session, including the exercises that follow, 2D filters are often used as examples to explain the convolution layer, but in practice 3D filters like the one in the figure below are common.)

image.png

image.png

Here is a simple implementation example. Let's implement it without using Keras + TensorFlow.

import numpy as np
import matplotlib.pyplot as plt

#Defines a very simple convolution layer
class Conv:
    #For a simple example, W is fixed at 3x3; strides and padding are covered in later sessions.
    def __init__(self, W):
        self.W = W
    def f_prop(self, X):
        out = np.zeros((X.shape[0]-2, X.shape[1]-2))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                x = X[i:i+3, j:j+3]
                #Take the sum of the element-wise products of the kernel and the patch
                out[i,j] = np.dot(self.W.flatten(), x.flatten())
        return out

X = np.load('./5100_cnn_data/circle.npy')

plt.imshow(X)
plt.title("base image", fontsize=12)
plt.show()

#Kernels that respond to vertical, horizontal, and the two diagonal lines
W1 = np.array([[0,1,0],
               [0,1,0],
               [0,1,0]])

W2 = np.array([[0,0,0],
               [1,1,1],
               [0,0,0]])
W3 = np.array([[1,0,0],
               [0,1,0],
               [0,0,1]])
W4 = np.array([[0,0,1],
               [0,1,0],
               [1,0,0]])

plt.subplot(1,4,1); plt.imshow(W1)
plt.subplot(1,4,2); plt.imshow(W2)
plt.subplot(1,4,3); plt.imshow(W3)
plt.subplot(1,4,4); plt.imshow(W4)
plt.suptitle("kernels", fontsize=12)
plt.show()

#Convolution
conv1 = Conv(W1); C1 = conv1.f_prop(X)
conv2 = Conv(W2); C2 = conv2.f_prop(X)
conv3 = Conv(W3); C3 = conv3.f_prop(X)
conv4 = Conv(W4); C4 = conv4.f_prop(X)

plt.subplot(1,4,1); plt.imshow(C1)
plt.subplot(1,4,2); plt.imshow(C2)
plt.subplot(1,4,3); plt.imshow(C3)
plt.subplot(1,4,4); plt.imshow(C4)
plt.suptitle("convolution results", fontsize=12)
plt.show()
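The double loop in Conv.f_prop is easy to read but slow. As a sketch, assuming NumPy 1.20 or later (which provides sliding_window_view), the same computation can be vectorized; the check at the end confirms that both versions agree. Note that, like the Conv class and CNN libraries, neither version flips the kernel, so strictly speaking this is a cross-correlation. The helper names are our own.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

def conv_loop(X, W):
    # Same computation as the Conv class above (stride 1, no padding)
    kh, kw = W.shape
    out = np.zeros((X.shape[0] - kh + 1, X.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(W * X[i:i+kh, j:j+kw])
    return out

def conv_vectorized(X, W):
    # windows has shape (out_h, out_w, kh, kw): every kh x kw patch of X
    windows = sliding_window_view(X, W.shape)
    # Multiply each patch by W element-wise and sum over the last two axes
    return np.einsum('ijkl,kl->ij', windows, W)

rng = np.random.default_rng(0)
X = rng.random((10, 12))
W = np.array([[0, 1, 0],
              [0, 1, 0],
              [0, 1, 0]])
print(np.allclose(conv_loop(X, W), conv_vectorized(X, W)))  # True
```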

image.png

Pooling layer

As shown in the figure, the pooling layer downsizes the output of the convolution layer to reduce the amount of data.

As shown in the figure, data can be compressed by operations such as the following:

Max pooling: take the maximum value of each subregion of the feature map
Average pooling: take the average value of each subregion of the feature map
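Here is a minimal sketch of both pooling operations on a small 4x4 feature map, assuming non-overlapping 2x2 windows. The pool2d helper and the input values are our own choices for illustration.

```python
import numpy as np

def pool2d(X, p, mode="max"):
    """Non-overlapping p x p pooling (stride = p). Assumes both sides of X are divisible by p."""
    h, w = X.shape
    # View X as a grid of (h//p) x (w//p) blocks, each of size p x p
    blocks = X.reshape(h // p, p, w // p, p)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    if mode == "average":
        return blocks.mean(axis=(1, 3))
    raise ValueError("mode must be 'max' or 'average'")

X = np.arange(16).reshape(4, 4)   # 0..15, row by row
print(pool2d(X, 2, "max").tolist())      # [[5, 7], [13, 15]]
print(pool2d(X, 2, "average").tolist())  # [[2.5, 4.5], [10.5, 12.5]]
```

Either way, the 4x4 map is reduced to 2x2, a quarter of the original data.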

image.png

By performing the convolution covered in the "Convolution layer" session, you can examine how features are distributed across the image. The same feature often appears clustered in neighboring locations, and there can also be wide areas where no feature is found. As a result, the feature map output from the convolution layer is wasteful relative to its data size.

Pooling reduces this waste and compresses the data while keeping the loss of information small.

On the other hand, pooling discards detailed position information. Conversely, this means that the features extracted by the pooling layer are less affected by small translations of the original image, which gives the model robustness to such shifts.

For example, when classifying handwritten digits in a photo, the exact position of the digit is not important. Because pooling removes such less important information, you can build a model that is robust to changes in the position of the target object within the input image.

The figures below show a 5x5 (height x width) feature map being pooled with a 3x3 (height x width) window.

image.png

Max pooling

image.png

Average pooling

Here is a simple implementation.

import numpy as np
import matplotlib.pyplot as plt

#Defines a very simple convolution layer.
class Conv:
    #For a simple example, W is fixed at 3x3; strides and padding are covered in later sessions.
    def __init__(self, W):
        self.W = W
    def f_prop(self, X):
        out = np.zeros((X.shape[0]-2, X.shape[1]-2))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                x = X[i:i+3, j:j+3]
                out[i,j] = np.dot(self.W.flatten(), x.flatten())
        return out

#Defines a very simple pooling layer.
class Pool:
    #For a simple example, strides and padding are not considered here; they are covered in later sessions.
    #l is the side length of the non-overlapping pooling window.
    def __init__(self, l):
        self.l = l
    def f_prop(self, X):
        l = self.l
        out = np.zeros((X.shape[0]//l, X.shape[1]//l))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                #Max pooling: take the maximum of each l x l block.
                out[i,j] = np.max(X[i*l:(i+1)*l, j*l:(j+1)*l])
        return out

X = np.load('./5100_cnn_data/circle.npy')

plt.imshow(X)
plt.title("base image", fontsize=12)
plt.show()

#kernel
W1 = np.array([[0,1,0],
               [0,1,0],
               [0,1,0]])
W2 = np.array([[0,0,0],
               [1,1,1],
               [0,0,0]])
W3 = np.array([[1,0,0],
               [0,1,0],
               [0,0,1]])
W4 = np.array([[0,0,1],
               [0,1,0],
               [1,0,0]])

#Convolution
conv1 = Conv(W1); C1 = conv1.f_prop(X)
conv2 = Conv(W2); C2 = conv2.f_prop(X)
conv3 = Conv(W3); C3 = conv3.f_prop(X)
conv4 = Conv(W4); C4 = conv4.f_prop(X)

plt.subplot(1,4,1); plt.imshow(C1)
plt.subplot(1,4,2); plt.imshow(C2)
plt.subplot(1,4,3); plt.imshow(C3)
plt.subplot(1,4,4); plt.imshow(C4)
plt.suptitle("convolution images", fontsize=12)
plt.show()

#Pooling
pool = Pool(2)
P1 = pool.f_prop(C1)
P2 = pool.f_prop(C2)
P3 = pool.f_prop(C3)
P4 = pool.f_prop(C4)

plt.subplot(1,4,1); plt.imshow(P1)
plt.subplot(1,4,2); plt.imshow(P2)
plt.subplot(1,4,3); plt.imshow(P3)
plt.subplot(1,4,4); plt.imshow(P4)
plt.suptitle("pooling results", fontsize=12)
plt.show()

image.png

The bottom figure is the result of max pooling.

CNN implementation

Implement CNN using Keras + TensorFlow.

In practice, models are usually implemented with these libraries. In Keras, you first create an instance that manages the model and then use its add method to define the layers one by one.

Create an instance.

model = Sequential()

Layers are added to the model one by one with the add method, as shown below. A fully connected layer, for example, was defined as follows.

model.add(Dense(128))

A convolution layer is added as follows. You will learn about its parameters in a later session.

model.add(Conv2D(filters=64, kernel_size=(3, 3)))

A pooling layer is added as follows. You will learn about its parameters in a later session.

model.add(MaxPooling2D(pool_size=(2, 2)))

Finally, compile the model to finish building the neural network.

model.compile(optimizer="sgd", loss="categorical_crossentropy", metrics=["accuracy"])

The following prints a table of the model structure like the one shown below.

model.summary()

image.png

Here is a simple example.

from keras.layers import Activation, Conv2D, Dense, Flatten, MaxPooling2D
from keras.models import Sequential, load_model
from keras.utils import to_categorical

#Model definition
model = Sequential()

#Implementation example
# --------------------------------------------------------------
model.add(Conv2D(input_shape=(28, 28, 1), filters=32, kernel_size=(2, 2), strides=(1, 1), padding="same"))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(1, 1), padding="same"))
model.add(Conv2D(filters=32, kernel_size=(2, 2), strides=(1, 1), padding="same"))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(1,1)))
# --------------------------------------------------------------
model.add(Flatten())
model.add(Dense(256))
model.add(Activation('sigmoid'))
model.add(Dense(128))
model.add(Activation('sigmoid'))
model.add(Dense(10))
model.add(Activation('softmax'))

model.summary()

image.png

Classification using CNN (MNIST)

MNIST is a dataset of handwritten digits, as shown in the figure below. Each image is 28 x 28 pixels in size with 1 channel (grayscale), and each has a class label from 0 to 9.

We will use CNN to classify the MNIST dataset.

image.png

Here is an implementation example.

from keras.datasets import mnist
from keras.layers import Dense, Dropout, Flatten, Activation
from keras.layers import Conv2D, MaxPooling2D
from keras.models import Sequential, load_model
from keras.utils import to_categorical
import numpy as np
import matplotlib.pyplot as plt

#Data loading
(X_train, y_train), (X_test, y_test) = mnist.load_data()

#This time, we use 300 samples for training and 100 for testing.
#The Conv layer receives a 4D array (batch size x height x width x number of channels).
#MNIST images are grayscale, not RGB, so the loaded arrays are 3D (samples x height x width); reshape them to 4D in advance.
#Pixel values are also scaled from 0-255 to 0-1.
X_train = X_train[:300].reshape(-1, 28, 28, 1).astype('float32') / 255
X_test = X_test[:100].reshape(-1, 28, 28, 1).astype('float32') / 255
y_train = to_categorical(y_train)[:300]
y_test = to_categorical(y_test)[:100]

#Model definition
model = Sequential()
model.add(Conv2D(filters=32, kernel_size=(3, 3),input_shape=(28,28,1)))
model.add(Activation('relu'))
model.add(Conv2D(filters=64, kernel_size=(3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(10))
model.add(Activation('softmax'))


model.compile(loss='categorical_crossentropy',
              optimizer='adadelta',
              metrics=['accuracy'])

model.fit(X_train, y_train,
          batch_size=128,
          epochs=1,
          verbose=1,
          validation_data=(X_test, y_test))

#Evaluation of accuracy
scores = model.evaluate(X_test, y_test, verbose=1)
print('Test loss:', scores[0])
print('Test accuracy:', scores[1])

#Data visualization (first 10 images of the test data)
for i in range(10):
    plt.subplot(2, 5, i+1)
    plt.imshow(X_test[i].reshape((28,28)), 'gray')
plt.suptitle("10 images of test data",fontsize=20)
plt.show()

#Prediction (first 10 images of the test data)
pred = np.argmax(model.predict(X_test[0:10]), axis=1)
print(pred)

model.summary()

image.png

image.png

Classification using CNN (CIFAR-10)

CIFAR-10 is a dataset of images showing 10 types of objects, as shown in the picture below.

Each image is 32 x 32 pixels in size with 3 channels (R, G, B), and each has a class label from 0 to 9. The objects corresponding to the class labels are as follows.

0: Airplane
1: Car
2: Bird
3: Cat
4: Deer
5: Dog
6: Frog
7: Horse
8: Ship
9: Truck

We will use a CNN to classify the CIFAR-10 dataset.

image.png

import keras
from keras.datasets import cifar10
from keras.layers import Activation, Conv2D, Dense, Dropout, Flatten, MaxPooling2D
from keras.models import Sequential, load_model
from keras.utils import to_categorical
import numpy as np
import matplotlib.pyplot as plt

#Data loading
(X_train, y_train), (X_test, y_test) = cifar10.load_data()

#This time, we use only 300 samples for training and 100 for testing.
#Pixel values are scaled from 0-255 to 0-1.
X_train = X_train[:300].astype('float32') / 255
X_test = X_test[:100].astype('float32') / 255
y_train = to_categorical(y_train)[:300]
y_test = to_categorical(y_test)[:100]


#Model definition
model = Sequential()
model.add(Conv2D(32, (3, 3), padding='same',
                 input_shape=X_train.shape[1:]))
model.add(Activation('relu'))
model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Conv2D(64, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Flatten())
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(10))
model.add(Activation('softmax'))


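#compile
#RMSprop with learning rate 0.0001, decayed as lr / (1 + 1e-6 * step) like the old decay=1e-6 argument
lr_schedule = keras.optimizers.schedules.InverseTimeDecay(initial_learning_rate=0.0001, decay_steps=1, decay_rate=1e-6)
opt = keras.optimizers.RMSprop(learning_rate=lr_schedule)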
model.compile(loss='categorical_crossentropy',
              optimizer=opt,
              metrics=['accuracy'])

#Training takes a few minutes, so weights trained in advance can be loaded instead (uncomment to use).
#model.load_weights('./cnn_data/param_cifar10.hdf5')

#Learning
model.fit(X_train, y_train, batch_size=32, epochs=1)

#Use the following to save the weights. (It cannot be run in this exercise environment.)
# model.save_weights('param_cifar10.hdf5')

#Evaluation of accuracy
scores = model.evaluate(X_test, y_test, verbose=1)
print('Test loss:', scores[0])
print('Test accuracy:', scores[1])

#Data visualization (first 10 images of the test data)
for i in range(10):
    plt.subplot(2, 5, i+1)
    plt.imshow(X_test[i])
plt.suptitle("10 images of test data",fontsize=20)
plt.show()

#Prediction (first 10 images of the test data)
pred = np.argmax(model.predict(X_test[0:10]), axis=1)
print(pred)

model.summary()

image.png

image.png
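The CIFAR-10 example above prints its predictions as label numbers. A small helper like the following (hypothetical, not part of Keras) maps them back to the class names listed earlier. The probabilities below are made up to show the mapping; they are not output from the trained model.

```python
import numpy as np

# Class names in label order 0-9, as listed for CIFAR-10
CIFAR10_CLASSES = ["airplane", "car", "bird", "cat", "deer",
                   "dog", "frog", "horse", "ship", "truck"]

def decode_predictions(probabilities):
    """Turn an (n_samples, 10) array of class probabilities into class names."""
    labels = np.argmax(probabilities, axis=1)
    return [CIFAR10_CLASSES[i] for i in labels]

# Made-up probabilities for two images (illustration only)
probs = np.zeros((2, 10))
probs[0, 3] = 0.9   # highest score for label 3
probs[1, 8] = 0.7   # highest score for label 8
print(decode_predictions(probs))  # ['cat', 'ship']
```

In the example above, the same helper could be applied to model.predict(X_test[0:10]).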

Hyperparameters

filters (Conv layer)

The filters parameter of the convolution layer specifies the number of feature maps to generate, that is, the number of kinds of features to extract.

In the figure below, filters is 20 in the first convolution layer and also 20 in the second convolution layer.

image.png

If filters is too small to extract the required features, training will not progress well. On the other hand, if it is too large, the model overfits more easily, so be careful.
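One reason a large filters value invites overfitting is that the number of trainable parameters grows in proportion to it. The helper below (our own, not a Keras function) counts the parameters of a 2D convolution layer with one bias per filter. For the first two Conv2D layers of the MNIST example above it gives 320 and 18,496, which should match what model.summary() reports; doubling filters from 64 to 128 exactly doubles the count.

```python
def conv2d_param_count(filters, kernel_size, in_channels, use_bias=True):
    """Trainable parameters of a 2D convolution layer:
    one kh x kw x in_channels kernel per filter, plus one bias per filter."""
    kh, kw = kernel_size
    weights = kh * kw * in_channels * filters
    biases = filters if use_bias else 0
    return weights + biases

print(conv2d_param_count(32, (3, 3), 1))    # 320
print(conv2d_param_count(64, (3, 3), 32))   # 18496
print(conv2d_param_count(128, (3, 3), 32))  # 36992
```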

Let's implement it without using Keras + TensorFlow.

import numpy as np
import matplotlib.pyplot as plt

#It defines a very simple convolution layer.
#Only 1-channel image convolution is assumed.
#For a simple example, the kernel is fixed at 3x3, and strides and padding are not considered.
class Conv:
    def __init__(self, filters):
        self.filters = filters
        self.W = np.random.rand(filters,3,3)
    def f_prop(self, X):
        out = np.zeros((self.filters, X.shape[0]-2, X.shape[1]-2))
        for k in range(self.filters):
            for i in range(out[0].shape[0]):
                for j in range(out[0].shape[1]):
                    x = X[i:i+3, j:j+3]
                    out[k, i, j] = np.dot(self.W[k].flatten(), x.flatten())
        return out

X = np.load('./5100_cnn_data/circle.npy')

filters=10

#Generation of convolutional layer
conv = Conv(filters=filters)

#Performing convolution
C = conv.f_prop(X)

# --------------------------------------------------------------
#Below is all the code for visualization.
# --------------------------------------------------------------

plt.imshow(X)
plt.title('base image', fontsize=12)
plt.show()

plt.figure(figsize=(5,2))
for i in range(filters):
    plt.subplot(2,filters//2,i+1)
    ax = plt.gca() # get current axis
    ax.tick_params(labelbottom=False, labelleft=False, bottom=False, left=False) #Delete axis
    plt.imshow(conv.W[i])
plt.suptitle('kernels', fontsize=12)
plt.show()

plt.figure(figsize=(5,2))
for i in range(filters):
    plt.subplot(2,filters//2,i+1)
    ax = plt.gca() # get current axis
    ax.tick_params(labelbottom=False, labelleft=False, bottom=False, left=False) #Delete axis
    plt.imshow(C[i])
plt.suptitle('convolution results', fontsize=12)
plt.show()

image.png

kernel_size (Conv layer)

The kernel_size parameter of the convolution layer specifies the size of the kernel (the weight matrix used for convolution).

As mentioned above, each value of the feature map is the sum of the element-wise products of the input data and the kernel. The figure below shows a 3x3 kernel. Each element holds a weight, and these weights are learned so that the convolution extracts useful features.
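As a worked example of this sum of products, take a 3x3 image patch containing a vertical line and apply a vertical-line kernel and a horizontal-line kernel (the same values as W1 and W2 in the convolution layer example, renamed here for readability).

```python
import numpy as np

patch = np.array([[0, 1, 0],     # a 3x3 region of an image containing a vertical line
                  [0, 1, 0],
                  [0, 1, 0]])
W_vertical = np.array([[0, 1, 0],
                       [0, 1, 0],
                       [0, 1, 0]])
W_horizontal = np.array([[0, 0, 0],
                         [1, 1, 1],
                         [0, 0, 0]])

# One feature-map value = sum of the element-wise products of the patch and the kernel
print(np.sum(patch * W_vertical))    # 3
print(np.sum(patch * W_horizontal))  # 1
```

The kernel whose pattern matches the patch produces the larger value, which is how a feature map highlights where a feature appears.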

image.png

Also, in the figure below, kernel_size is 5x5 for the first convolution layer.

image.png

If kernel_size is too small, each kernel looks at only a tiny region, so the layer may fail to detect the features it needs and training will not progress well.

Conversely, if it is too large, large features that should have been detected as combinations of small features are detected all at once. Such a model does not take advantage of the strength of neural networks, which are good at capturing hierarchical structure, and becomes inefficient.
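The hierarchical argument can be made concrete with a little arithmetic. Assuming stride 1, no pooling, and the same number of channels in every layer, stacking two 3x3 convolutions covers the same 5x5 region of the input as one 5x5 convolution, but with fewer weights (and with an extra nonlinearity in between). The helpers below are our own sketch.

```python
def receptive_field(kernel_sizes):
    """Receptive field (pixels along one side) of stacked stride-1 convolutions without pooling."""
    rf = 1
    for k in kernel_sizes:
        rf += k - 1
    return rf

def weights_per_channel_pair(kernel_sizes):
    """Kernel weights per input/output channel pair, summed over the stacked layers (biases ignored)."""
    return sum(k * k for k in kernel_sizes)

for stack in ([5], [3, 3], [7], [3, 3, 3]):
    print(stack, receptive_field(stack), weights_per_channel_pair(stack))
# [5] 5 25
# [3, 3] 5 18
# [7] 7 49
# [3, 3, 3] 7 27
```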

Here is an implementation example.

import numpy as np
import matplotlib.pyplot as plt

#It defines a very simple convolution layer.
#Only 1-channel image convolution is assumed.
#Strides and padding are not considered, to keep the example simple.
class Conv:
    def __init__(self, filters, kernel_size):
        self.filters = filters
        self.kernel_size = kernel_size
        self.W = np.random.rand(filters, kernel_size[0], kernel_size[1])
    def f_prop(self, X):
        k_h, k_w = self.kernel_size
        out = np.zeros((self.filters, X.shape[0]-k_h+1, X.shape[1]-k_w+1))
        for k in range(self.filters):
            for i in range(out[0].shape[0]):
                for j in range(out[0].shape[1]):
                    x = X[i:i+k_h, j:j+k_w]
                    out[k,i,j] = np.dot(self.W[k].flatten(), x.flatten())
        return out

X = np.load('./5100_cnn_data/circle.npy')

#Convolution 1
filters = 4
kernel_size = (3,3)

#Generation of convolutional layer
conv1 = Conv(filters=filters, kernel_size=kernel_size)

#Performing convolution
C1 = conv1.f_prop(X)

#Convolution 2
filters = 4
kernel_size = (6,6)

#Generation of convolutional layer
conv2 = Conv(filters=filters, kernel_size=kernel_size)

#Performing convolution
C2 = conv2.f_prop(X)

#Below is all the code for visualization

plt.imshow(X)
plt.title('base image', fontsize=12)
plt.show()

plt.figure(figsize=(10,1))
for i in range(filters):
    plt.subplot(1,filters,i+1)
    ax = plt.gca() # get current axis
    ax.tick_params(labelbottom=False, labelleft=False, bottom=False, left=False) #Delete axis
    plt.imshow(conv1.W[i])
plt.suptitle('kernel visualization', fontsize=12)
plt.show()

plt.figure(figsize=(10,1))
for i in range(filters):
    plt.subplot(1,filters,i+1)
    ax = plt.gca() # get current axis
    ax.tick_params(labelbottom=False, labelleft=False, bottom=False, left=False) #Delete axis
    plt.imshow(C1[i])
plt.suptitle('convolution results 1', fontsize=12)
plt.show()

plt.figure(figsize=(10,1))
for i in range(filters):
    plt.subplot(1,filters,i+1)
    ax = plt.gca() # get current axis
    ax.tick_params(labelbottom=False, labelleft=False, bottom=False, left=False) #Delete axis
    plt.imshow(conv2.W[i])
plt.suptitle('kernel visualization', fontsize=12)
plt.show()

plt.figure(figsize=(10,1))
for i in range(filters):
    plt.subplot(1,filters,i+1)
    ax = plt.gca() # get current axis
    ax.tick_params(labelbottom=False, labelleft=False, bottom=False, left=False) #Delete axis
    plt.imshow(C2[i])
plt.suptitle('convolution results 2', fontsize=12)
plt.show()

image.png

strides (Conv layer)

The strides parameter of the convolution layer specifies the interval at which features are extracted, in other words, the distance the kernel moves at each step.

strides=(1,1)

image.png

strides=(2,2)

image.png

The smaller the strides, the more finely features can be extracted, but the same feature at the same location in the image may be detected multiple times, which looks like a lot of wasted computation.

Even so, smaller strides are generally said to work better. In Keras' Conv2D layer, strides defaults to (1, 1).
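To see how strides changes the amount of computation, here is the output-size formula used by the Conv class below, (n - k) // s + 1, applied to a 9x9 input and a 3x3 kernel (sizes chosen for illustration; the helper name is our own).

```python
def conv_output_size(n, k, s):
    """Output length along one axis for a kernel of size k moved with stride s (no padding)."""
    return (n - k) // s + 1

n, k = 9, 3
for s in (1, 2, 3):
    size = conv_output_size(n, k, s)
    print(f"stride={s}: output {size}x{size}, kernel applied {size * size} times")
# stride=1: output 7x7, kernel applied 49 times
# stride=2: output 4x4, kernel applied 16 times
# stride=3: output 3x3, kernel applied 9 times
```

With stride 1 neighboring windows overlap heavily, which is where the repeated detections come from.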

import numpy as np
import matplotlib.pyplot as plt

#It defines a very simple convolution layer.
#Only 1-channel image convolution is assumed.
#Padding is not considered, to keep the example simple.
class Conv:
    def __init__(self, filters, kernel_size, strides):
        self.filters = filters
        self.kernel_size = kernel_size
        self.strides = strides
        self.W = np.random.rand(filters, kernel_size[0], kernel_size[1])
    def f_prop(self, X):
        k_h = self.kernel_size[0]
        k_w = self.kernel_size[1]
        s_h = self.strides[0]
        s_w = self.strides[1]
        out = np.zeros((self.filters, (X.shape[0]-k_h)//s_h+1, (X.shape[1]-k_w)//s_w+1))
        for k in range(self.filters):
            for i in range(out[0].shape[0]):
                for j in range(out[0].shape[1]):
                    x = X[i*s_h:i*s_h+k_h, j*s_w:j*s_w+k_w]
                    out[k,i,j] = np.dot(self.W[k].flatten(), x.flatten())
        return out

X = np.load('./5100_cnn_data/circle.npy')

#Convolution 1
filters = 4
kernel_size = (3,3)
strides = (1,1)

#Generation of convolutional layer
conv1 = Conv(filters=filters, kernel_size=kernel_size, strides=strides)

#Performing convolution
C1 = conv1.f_prop(X)

#Convolution 2
filters = 4
kernel_size = (3,3)
strides = (2,2)

#Generation of convolutional layer
conv2 = Conv(filters=filters, kernel_size=kernel_size, strides=strides)
conv2.W = conv1.W #Use the same kernels as convolution 1

#Performing convolution
C2 = conv2.f_prop(X)

#Below is all the code for visualization

plt.imshow(X)
plt.title('base image', fontsize=12)
plt.show()

plt.figure(figsize=(10,1))
for i in range(filters):
    plt.subplot(1,filters,i+1)
    ax = plt.gca() # get current axis
    ax.tick_params(labelbottom=False, labelleft=False, bottom=False, left=False) #Delete axis
    plt.imshow(conv1.W[i])
plt.suptitle('kernel visualization', fontsize=12)
plt.show()

plt.figure(figsize=(10,1))
for i in range(filters):
    plt.subplot(1,filters,i+1)
    ax = plt.gca() # get current axis
    ax.tick_params(labelbottom=False, labelleft=False, bottom=False, left=False) #Delete axis
    plt.imshow(C1[i])
plt.suptitle('convolution results 1', fontsize=12)
plt.show()

plt.figure(figsize=(10,1))
for i in range(filters):
    plt.subplot(1,filters,i+1)
    ax = plt.gca() # get current axis
    ax.tick_params(labelbottom=False, labelleft=False, bottom=False, left=False) #Delete axis
    plt.imshow(conv2.W[i])
plt.suptitle('kernel visualization', fontsize=12)
plt.show()

plt.figure(figsize=(10,1))
for i in range(filters):
    plt.subplot(1,filters,i+1)
    ax = plt.gca() # get current axis
    ax.tick_params(labelbottom=False, labelleft=False, bottom=False, left=False) #Delete axis
    plt.imshow(C2[i])
plt.suptitle('convolution results 2', fontsize=12)
plt.show()

image.png

padding (Conv layer)

Padding means adding pixels around the input image to prevent the image from shrinking when it is convolved.

In general, the added pixels are set to 0. Filling the border of the input image with zeros in this way is called zero padding.

Padding has several merits: features at the edges of the image are taken into account, the edge pixels are used in more convolution steps (so they contribute to more weight updates), and the number of input and output units in each layer can be adjusted.

The white frame around the orange panel in the figure below represents padding. In this figure, 1 pixel of padding is added at the top and bottom, and 1 pixel at the left and right.

image.png

In Keras' Conv2D layer, the padding method is specified as padding="valid" or padding="same".

padding="valid": no padding is applied
padding="same": the input is padded so that the output feature map has the same size as the input (when strides is 1)

In the code below, by contrast, the padding width itself is passed as an argument, as in padding=(1,1).
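For padding="same", the amount of zero padding follows from the target output size ceil(n / s). The sketch below uses the rule documented for TensorFlow's "SAME" padding, as we understand it: total padding = max((out - 1) * s + k - n, 0). The helper names are our own. With stride 1 and an odd kernel size this is k - 1 pixels in total, i.e. (k - 1) // 2 on each side, which is the padding you would pass to the Conv class below to keep the size unchanged. When the total is odd, it cannot be split evenly between the two sides.

```python
import math

def same_output_size(n, s):
    # With padding="same", the output length is ceil(n / s)
    return math.ceil(n / s)

def same_total_padding(n, k, s):
    """Total number of zero pixels added along one axis for "same" padding."""
    out = same_output_size(n, s)
    return max((out - 1) * s + k - n, 0)

def output_size(n, k, s, p):
    # General formula with p zeros on each side (as in the Conv class below)
    return (n + 2 * p - k) // s + 1

for k in (3, 5):
    print(k, same_total_padding(28, k, 1), output_size(28, k, 1, (k - 1) // 2))
# 3 2 28
# 5 4 28
print(same_output_size(28, 2), same_total_padding(28, 3, 2))  # 14 1
```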

import numpy as np
import matplotlib.pyplot as plt

#It defines a very simple convolution layer.
#Only 1-channel image convolution is assumed.
class Conv:
    def __init__(self, filters, kernel_size, strides, padding):
        self.filters = filters
        self.kernel_size = kernel_size
        self.strides = strides
        self.padding = padding
        self.W = np.random.rand(filters, kernel_size[0], kernel_size[1])
    def f_prop(self, X):
        k_h, k_w = self.kernel_size
        s_h, s_w = self.strides
        p_h, p_w = self.padding
        out = np.zeros((self.filters, (X.shape[0]+p_h*2-k_h)//s_h+1, (X.shape[1]+p_w*2-k_w)//s_w+1))
        #Padding
        X = np.pad(X, ((p_h, p_h), (p_w, p_w)), 'constant', constant_values=((0,0),(0,0)))
        self.X = X #Save it for later visualization of the padding results.
        for k in range(self.filters):
            for i in range(out[0].shape[0]):
                for j in range(out[0].shape[1]):
                    x = X[i*s_h:i*s_h+k_h, j*s_w:j*s_w+k_w]
                    out[k,i,j] = np.dot(self.W[k].flatten(), x.flatten())
        return out

X = np.load('./5100_cnn_data/circle.npy')

#Convolution 1
filters = 4
kernel_size = (3,3)
strides = (1,1)
padding = (0,0)

#Generation of convolutional layer
conv1 = Conv(filters=filters, kernel_size=kernel_size, strides=strides, padding=padding)

#Performing convolution
C1 = conv1.f_prop(X)

#Convolution 2
filters = 4
kernel_size = (3,3)
strides = (1,1)
padding = (2,2)

#Generation of convolutional layer
conv2 = Conv(filters=filters, kernel_size=kernel_size, strides=strides, padding=padding)
conv2.W = conv1.W #Use the same kernels as convolution 1

#Performing convolution
C2 = conv2.f_prop(X)

#Below is all the code for visualization.

plt.imshow(conv1.X)
plt.title('padding results of the convolution 1', fontsize=12)
plt.show()

plt.figure(figsize=(10,1))
for i in range(filters):
    plt.subplot(1,filters,i+1)
    ax = plt.gca() # get current axis
    ax.tick_params(labelbottom=False, labelleft=False, bottom=False, left=False) #Delete axis
    plt.imshow(conv1.W[i])
plt.suptitle('kernel visualization of the convolution 1', fontsize=12)
plt.show()

plt.figure(figsize=(10,1))
for i in range(filters):
    plt.subplot(1,filters,i+1)
    ax = plt.gca() # get current axis
    ax.tick_params(labelbottom=False, labelleft=False, bottom=False, left=False) #Delete axis
    plt.imshow(C1[i])
plt.suptitle('results of the convolution 1', fontsize=12)
plt.show()

plt.imshow(conv2.X)
plt.title('padding results of the convolution 2', fontsize=12)
plt.show()

plt.figure(figsize=(10,1))
for i in range(filters):
    plt.subplot(1,filters,i+1)
    ax = plt.gca() # get current axis
    ax.tick_params(labelbottom=False, labelleft=False, bottom=False, left=False) #Delete axis
    plt.imshow(conv2.W[i])
plt.suptitle('kernel visualization of the convolution 2', fontsize=12)
plt.show()

plt.figure(figsize=(10,1))
for i in range(filters):
    plt.subplot(1,filters,i+1)
    ax = plt.gca() # get current axis
    ax.tick_params(labelbottom=False, labelleft=False, bottom=False, left=False) #Delete axis
    plt.imshow(C2[i])
plt.suptitle('results of the convolution 2', fontsize=12)
plt.show()

image.png

pool_size (Pool layer)

The pool_size parameter of the pooling layer specifies the size of the area to which pooling is applied at one time (how coarse the pooling is).

In the figure below, the first pooling size is 2x2 and the second pooling size is also 2x2.

image.png

Increasing pool_size increases robustness to position (the output does not change even if the position at which an object appears in the image shifts slightly), at the cost of discarding more detail. In general, pool_size should be 2x2.
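The robustness claim can be checked with a toy example. The sketch below (our own helper names, non-overlapping windows assumed) places a single strong response in an 8x8 feature map and shifts it. With 2x2 pooling, a 1-pixel shift that stays inside the same window leaves the output unchanged, but a 3-pixel shift does not; with 4x4 pooling, both shifts are absorbed. A shift that crosses a window boundary changes the output even if it is small, so the robustness is only approximate.

```python
import numpy as np

def max_pool(X, p):
    # Non-overlapping p x p max pooling (stride = p); assumes both sides are divisible by p
    h, w = X.shape
    return X.reshape(h // p, p, w // p, p).max(axis=(1, 3))

def feature_map_with_peak(row, col, size=8):
    # A toy feature map that is zero except for one strong response
    X = np.zeros((size, size))
    X[row, col] = 1.0
    return X

original = feature_map_with_peak(0, 0)
shifted_1px = feature_map_with_peak(1, 1)   # object moved by 1 pixel
shifted_3px = feature_map_with_peak(3, 3)   # object moved by 3 pixels

for p in (2, 4):
    a = max_pool(original, p)
    print(p, np.array_equal(a, max_pool(shifted_1px, p)),
          np.array_equal(a, max_pool(shifted_3px, p)))
# 2 True False
# 4 True True
```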

import numpy as np
import matplotlib.pyplot as plt

#It defines a very simple convolution layer.
class Conv:
    def __init__(self, W, filters, kernel_size):
        self.filters = filters
        self.kernel_size = kernel_size
        self.W = W # np.random.rand(filters, kernel_size[0], kernel_size[1])
    def f_prop(self, X):
        k_h, k_w = self.kernel_size
        out = np.zeros((self.filters, X.shape[0]-k_h+1, X.shape[1]-k_w+1))
        for k in range(self.filters):
            for i in range(out[0].shape[0]):
                for j in range(out[0].shape[1]):
                    x = X[i:i+k_h, j:j+k_w]
                    out[k,i,j] = np.dot(self.W[k].flatten(), x.flatten())
        return out

#It defines a very simple pooling layer.
#Only 1-channel feature map pooling is assumed.
class Pool:
    def __init__(self, pool_size):
        self.pool_size = pool_size
    def f_prop(self, X):
        k_h, k_w = self.pool_size
        out = np.zeros((X.shape[0]-k_h+1, X.shape[1]-k_w+1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i,j] = np.max(X[i:i+k_h, j:j+k_w])
        return out

X = np.load('./5100_cnn_data/circle.npy')

W = np.load('./5100_cnn_data/weight.npy') 

#Convolution
filters = 4
kernel_size = (3,3)
conv = Conv(W=W, filters=filters, kernel_size=kernel_size)
C = conv.f_prop(X)

#Pooling 1
pool_size = (2,2)
pool1 = Pool(pool_size)
P1 = [pool1.f_prop(C[i]) for i in range(len(C))]

#Pooling 2
pool_size = (4,4)
pool2 = Pool(pool_size)
P2 = [pool2.f_prop(C[i]) for i in range(len(C))]

#Below is all the code for visualization.

plt.imshow(X)
plt.title('base image', fontsize=12)
plt.show()

plt.figure(figsize=(10,1))
for i in range(filters):
    plt.subplot(1,filters,i+1)
    ax = plt.gca() # get current axis
    ax.tick_params(labelbottom=False, labelleft=False, bottom=False, left=False) #Delete axis
    plt.imshow(C[i])
plt.suptitle('convolution results', fontsize=12)
plt.show()

plt.figure(figsize=(10,1))
for i in range(filters):
    plt.subplot(1,filters,i+1)
    ax = plt.gca() # get current axis
    ax.tick_params(labelbottom=False, labelleft=False, bottom=False, left=False) #Delete axis
    plt.imshow(P1[i])
plt.suptitle('pooling results (pool_size=2x2)', fontsize=12)
plt.show()

plt.figure(figsize=(10,1))
for i in range(filters):
    plt.subplot(1,filters,i+1)
    ax = plt.gca() # get current axis
    ax.tick_params(labelbottom="off", labelleft="off", bottom="off", left="off") #Delete axis
    plt.imshow(P2[i])
plt.suptitle('pooling results', fontsize=12)
plt.show()


strides (Pool layer)

Like the strides parameter of the convolution layer, the strides parameter of the pooling layer specifies the interval at which the pooling window moves over the feature map.

[Figure: pooling with strides=(1,1)]

[Figure: pooling with strides=(2,2)]

In Keras' pooling layers, strides defaults to pool_size when it is not specified.
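As a cross-check of the loop-based Pool class below, here is a minimal vectorized sketch of max pooling with strides. It assumes NumPy 1.20 or later (for `sliding_window_view`). The function name `max_pool2d` is hypothetical, and the `strides=None` fallback only mimics the Keras default described above; it is not Keras code.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def max_pool2d(X, pool_size, strides=None):
    """Max-pool a single-channel 2D array without padding.

    Hypothetical helper: if strides is None it falls back to pool_size,
    mirroring the Keras default described in the text.
    """
    if strides is None:
        strides = pool_size
    k_h, k_w = pool_size
    s_h, s_w = strides
    # Every k_h x k_w window at stride 1: shape (H-k_h+1, W-k_w+1, k_h, k_w)
    windows = sliding_window_view(X, (k_h, k_w))
    # Keep every s_h-th row and s_w-th column of windows, then take each window's max
    return windows[::s_h, ::s_w].max(axis=(-2, -1))


X = np.arange(16).reshape(4, 4)
print(max_pool2d(X, (2, 2), strides=(1, 1)))  # values: [[5, 6, 7], [9, 10, 11], [13, 14, 15]]
print(max_pool2d(X, (2, 2)))                  # strides fall back to (2, 2); values: [[5, 7], [13, 15]]
```

Slicing the stride-1 windows with `[::s_h, ::s_w]` yields (H - k_h) // s_h + 1 rows, the same output size the Pool class below computes.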

import numpy as np
import matplotlib.pyplot as plt

# A very simple convolution layer (single-channel input, stride 1, no padding).
class Conv:
    def __init__(self, W, filters, kernel_size):
        self.filters = filters
        self.kernel_size = kernel_size
        self.W = W # np.random.rand(filters, kernel_size[0], kernel_size[1])
    def f_prop(self, X):
        k_h, k_w = self.kernel_size
        out = np.zeros((self.filters, X.shape[0]-k_h+1, X.shape[1]-k_w+1))
        for k in range(self.filters):
            for i in range(out[0].shape[0]):
                for j in range(out[0].shape[1]):
                    x = X[i:i+k_h, j:j+k_w]
                    out[k,i,j] = np.dot(self.W[k].flatten(), x.flatten())
        return out

# A very simple max-pooling layer with configurable strides (no padding).
# Only single-channel (2D) feature maps are supported.
class Pool:
    def __init__(self, pool_size, strides):
        self.pool_size = pool_size
        self.strides = strides
    def f_prop(self, X):
        k_h, k_w = self.pool_size
        s_h, s_w = self.strides
        out = np.zeros(((X.shape[0]-k_h)//s_h+1, (X.shape[1]-k_w)//s_w+1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i,j] = np.max(X[i*s_h:i*s_h+k_h, j*s_w:j*s_w+k_w])
        return out

X = np.load('./5100_cnn_data/circle.npy')

W = np.load('./5100_cnn_data/weight.npy')

#Convolution
filters = 4
kernel_size = (3,3)
conv = Conv(W=W, filters=filters, kernel_size=kernel_size)
C = conv.f_prop(X)

#Pooling 1
pool_size = (2,2)
strides = (1,1)
pool1 = Pool(pool_size, strides)
P1 = [pool1.f_prop(C[i]) for i in range(len(C))]

#Pooling 2
pool_size = (3,3)
strides = (2,2)
pool2 = Pool(pool_size, strides)
P2 = [pool2.f_prop(C[i]) for i in range(len(C))]

# --------------------------------------------------------------
#Below is all the code for visualization.
# --------------------------------------------------------------

plt.imshow(X)
plt.title('base image', fontsize=12)
plt.show()

plt.figure(figsize=(10,1))
for i in range(filters):
    plt.subplot(1,filters,i+1)
    ax = plt.gca() # get current axis
    ax.tick_params(labelbottom=False, labelleft=False, bottom=False, left=False) # hide ticks and tick labels
    plt.imshow(C[i])
plt.suptitle('convolution results', fontsize=12)
plt.show()

plt.figure(figsize=(10,1))
for i in range(filters):
    plt.subplot(1,filters,i+1)
    ax = plt.gca() # get current axis
    ax.tick_params(labelbottom=False, labelleft=False, bottom=False, left=False) # hide ticks and tick labels
    plt.imshow(P1[i])
plt.suptitle('pooling results (pool_size=(2,2), strides=(1,1))', fontsize=12)
plt.show()

plt.figure(figsize=(10,1))
for i in range(filters):
    plt.subplot(1,filters,i+1)
    ax = plt.gca() # get current axis
    ax.tick_params(labelbottom=False, labelleft=False, bottom=False, left=False) # hide ticks and tick labels
    plt.imshow(P2[i])
plt.suptitle('pooling results (pool_size=(3,3), strides=(2,2))', fontsize=12)
plt.show()


padding (Pool layer)

Like the padding parameter of the convolution layer, the padding parameter of the pooling layer specifies how the input is padded.


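In Keras' MaxPooling2D layer, the padding method is specified with padding='valid' or padding='same'.

padding='valid': no padding is applied, so only windows that fit entirely inside the input are pooled.
padding='same': the input is padded so that the output size equals the input size divided by the stride, rounded up. With strides=(1,1), the output feature map therefore has the same size as the input.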
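The output sizes for these modes can be computed directly. This sketch assumes the standard per-axis formulas: 'valid' gives floor((n - k) / s) + 1 and 'same' gives ceil(n / s), where n is the input size, k the pool size and s the stride. For comparison it also includes the explicit-padding formula used by the Pool class below, floor((n + 2p - k) / s) + 1. Both function names are hypothetical.

```python
import math


def keras_pool_output_size(n, pool, stride, padding):
    """Output length along one axis for 'valid' or 'same' pooling (hypothetical helper)."""
    if padding == "valid":
        # Only windows that fit entirely inside the input produce an output
        return (n - pool) // stride + 1
    if padding == "same":
        # The input is padded just enough that every stride position produces an output
        return math.ceil(n / stride)
    raise ValueError(f"unknown padding: {padding!r}")


def explicit_pad_output_size(n, pool, stride, pad):
    """Output length when `pad` cells are added on each side (as in the Pool class)."""
    return (n + 2 * pad - pool) // stride + 1


# A 7-pixel-wide feature map pooled with pool_size=2, strides=2
print(keras_pool_output_size(7, 2, 2, "valid"))  # 3
print(keras_pool_output_size(7, 2, 2, "same"))   # 4
print(explicit_pad_output_size(7, 2, 2, 1))      # 4
# With strides=1, 'same' keeps the input size
print(keras_pool_output_size(7, 2, 1, "same"))   # 7
```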

The code below takes the padding width for each side as an argument, such as padding=(1,1), and pads the feature map with zeros.

import numpy as np
import matplotlib.pyplot as plt

# A very simple convolution layer (single-channel input, stride 1, no padding).
class Conv:
    def __init__(self, W, filters, kernel_size):
        self.filters = filters
        self.kernel_size = kernel_size
        self.W = W # np.random.rand(filters, kernel_size[0], kernel_size[1])
    def f_prop(self, X):
        k_h, k_w = self.kernel_size
        out = np.zeros((self.filters, X.shape[0]-k_h+1, X.shape[1]-k_w+1))
        for k in range(self.filters):
            for i in range(out[0].shape[0]):
                for j in range(out[0].shape[1]):
                    x = X[i:i+k_h, j:j+k_w]
                    out[k,i,j] = np.dot(self.W[k].flatten(), x.flatten())
        return out

# A very simple max-pooling layer with configurable strides and zero padding.
# Only single-channel (2D) feature maps are supported.
class Pool:
    def __init__(self, pool_size, strides, padding):
        self.pool_size = pool_size
        self.strides = strides
        self.padding = padding
    def f_prop(self, X):
        k_h, k_w = self.pool_size
        s_h, s_w = self.strides
        p_h, p_w = self.padding
        out = np.zeros(((X.shape[0]+p_h*2-k_h)//s_h+1, (X.shape[1]+p_w*2-k_w)//s_w+1))
        # Zero padding: if X contains negative values, the padded zeros can become the max at the borders
        X = np.pad(X, ((p_h,p_h),(p_w,p_w)), 'constant', constant_values=((0,0),(0,0)))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i,j] = np.max(X[i*s_h:i*s_h+k_h, j*s_w:j*s_w+k_w])
        return out

X = np.load('./5100_cnn_data/circle.npy')

W = np.load('./5100_cnn_data/weight.npy')

#Convolution
filters = 4
kernel_size = (3,3)
conv = Conv(W=W, filters=filters, kernel_size=kernel_size)
C = conv.f_prop(X)

#Pooling 1 (no padding)
pool_size = (2,2)
strides = (2,2)
padding = (0,0)
pool1 = Pool(pool_size=pool_size, strides=strides, padding=padding)
P1 = [pool1.f_prop(C[i]) for i in range(len(C))]

#Pooling 2 (padding=(1,1))
pool_size = (2,2)
strides = (2,2)
padding = (1,1)
pool2 = Pool(pool_size=pool_size, strides=strides, padding=padding)
P2 = [pool2.f_prop(C[i]) for i in range(len(C))]

# --------------------------------------------------------------
#Below is all the code for visualization.
# --------------------------------------------------------------

plt.imshow(X)
plt.title('base image', fontsize=12)
plt.show()

plt.figure(figsize=(10,1))
for i in range(filters):
    plt.subplot(1,filters,i+1)
    ax = plt.gca() # get current axis
    ax.tick_params(labelbottom=False, labelleft=False, bottom=False, left=False) # hide ticks and tick labels
    plt.imshow(C[i])
plt.suptitle('convolution results', fontsize=12)
plt.show()

plt.figure(figsize=(10,1))
for i in range(filters):
    plt.subplot(1,filters,i+1)
    ax = plt.gca() # get current axis
    ax.tick_params(labelbottom=False, labelleft=False, bottom=False, left=False) # hide ticks and tick labels
    plt.imshow(P1[i])
plt.suptitle('pooling results (padding=(0,0))', fontsize=12)
plt.show()

plt.figure(figsize=(10,1))
for i in range(filters):
    plt.subplot(1,filters,i+1)
    ax = plt.gca() # get current axis
    ax.tick_params(labelbottom=False, labelleft=False, bottom=False, left=False) # hide ticks and tick labels
    plt.imshow(P2[i])
plt.suptitle('pooling results (padding=(1,1))', fontsize=12)
plt.show()
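The Pool class above pads with zeros. When a feature map contains negative values (convolution outputs can, depending on the weights), those zeros can win the max at the borders. A common alternative is to pad with -inf so padded cells never affect the result; my understanding is that this matches how TensorFlow's max pooling treats 'same' padding, but treat that as an assumption. The minimal sketch below compares the two; `max_pool2d_padded` and its `pad_value` parameter are hypothetical names.

```python
import numpy as np


def max_pool2d_padded(X, pool_size, strides, padding, pad_value=-np.inf):
    """Max-pool a 2D array after adding padding=(p_h, p_w) cells on each side.

    pad_value=-np.inf: padded cells can never be the maximum.
    pad_value=0: reproduces the zero padding used by the Pool class.
    """
    k_h, k_w = pool_size
    s_h, s_w = strides
    p_h, p_w = padding
    # Cast to float so that -inf can be stored in the padded array
    Xp = np.pad(X.astype(float), ((p_h, p_h), (p_w, p_w)),
                mode="constant", constant_values=pad_value)
    out_h = (Xp.shape[0] - k_h) // s_h + 1
    out_w = (Xp.shape[1] - k_w) // s_w + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = Xp[i*s_h:i*s_h+k_h, j*s_w:j*s_w+k_w].max()
    return out


X = -np.ones((3, 3))  # an all-negative feature map
print(max_pool2d_padded(X, (2, 2), (2, 2), (1, 1), pad_value=0))  # border windows become 0
print(max_pool2d_padded(X, (2, 2), (2, 2), (1, 1)))               # every output stays -1
```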
