Introduction

I'm fumio, a beginner in machine learning. I enjoy machine learning programming and study it every day.

I am studying with "Learn from mosaic removal: cutting-edge deep learning" written by koshian2. To deepen my understanding of what I learned, I will summarize an example of applying a convolutional neural network (CNN) to image classification. https://qiita.com/koshian2/items/aefbe4b26a7a235b5a5e

The main points are as follows.

Understand the structure of convolutional neural networks.
Build a convolutional neural network model with 10 layers in total: 9 convolutional layers + 1 Dense layer.
Use the model to classify the CIFAR-10 dataset.

Convolutional neural network structure

A convolutional neural network (CNN) is a feedforward network that contains two types of layers, convolutional layers and pooling layers, and is widely used for image recognition.

A feedforward neural network has no loops in its connections, so the signal propagates in only one direction: input layer → hidden layers → output layer. Recurrent neural networks, by contrast, contain loops, so the output of a layer can be fed back into the network at the next step.
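To make the convolutional layer more concrete, here is a minimal NumPy sketch of what a single 3 × 3 filter does to a one-channel image (no padding, stride 1, no bias). A real Conv2D layer also sums over all input channels, adds a bias and learns many filters at once; the function name conv2d_single is hypothetical. Like deep learning libraries, it computes a cross-correlation, meaning the kernel is not flipped.

```python
import numpy as np

def conv2d_single(image, kernel):
    """Slide `kernel` over a 2-D `image` (no padding, stride 1, no bias).

    Computes cross-correlation, as deep learning conv layers do:
    the kernel is not flipped.
    """
    h, w = image.shape
    kh, kw = kernel.shape
    out = np.zeros((h - kh + 1, w - kw + 1), dtype=float)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Weighted sum of the image patch currently under the kernel
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A 5x5 image with a vertical edge: dark on the left, bright on the right
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)
# A simple vertical-edge detector
kernel = np.array([[-1, 0, 1]] * 3, dtype=float)

feature_map = conv2d_single(image, kernel)
print(feature_map)   # every row is [3, 3, 0]
```

The filter responds strongly where its window straddles the dark-to-bright edge and outputs 0 over the flat bright region. With padding="same", as in the model later in this article, the border is padded so the output keeps the input's height and width.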

Data set loading

`cifar10.ipynb`



import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf

cifar_classes = ["airplane", "automobile", "bird", "cat", "deer", "dog", "frog", "horse", "ship", "truck"]

(X_train, y_train),(X_test,y_test) = tf.keras.datasets.cifar10.load_data()
print(X_train.shape,y_train.shape)
print(X_test.shape,y_test.shape)

Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
170500096/170498071 [==============================] - 13s 0us/step
(50000, 32, 32, 3) (50000, 1)
(10000, 32, 32, 3) (10000, 1)

The data is loaded directly from the Keras datasets module. Checking the dimensions shows that the training data consists of 50,000 images of 32 × 32 × 3 (the test data has 10,000). Because these are color images with three channels (RGB), each image is a three-dimensional array.
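Each image is stored in (height, width, channels) order, and the labels come as an (N, 1) integer array, which is why y_train[i, 0] is used to look up a class name. The sketch below checks this layout and counts images per class. It uses random stand-in arrays so it runs without downloading anything, and the helper summarize is hypothetical. On the real y_train, every one of the 10 classes should show 5,000 images, since the CIFAR-10 training set is balanced.

```python
import numpy as np

cifar_classes = ["airplane", "automobile", "bird", "cat", "deer",
                 "dog", "frog", "horse", "ship", "truck"]

def summarize(X, y, classes):
    """Return (per-image shape, images per class) for CIFAR-10-style arrays.

    X: (N, height, width, channels) images; y: (N, 1) integer labels.
    """
    labels = y[:, 0]                                      # (N, 1) -> (N,)
    counts = np.bincount(labels, minlength=len(classes))  # images per class index
    return X.shape[1:], dict(zip(classes, counts.tolist()))

# Random stand-in with the same layout as X_train / y_train (no download needed)
rng = np.random.default_rng(0)
X_fake = rng.integers(0, 256, size=(20, 32, 32, 3), dtype=np.uint8)
y_fake = np.repeat(np.arange(10), 2).reshape(-1, 1)      # two images per class

shape, counts = summarize(X_fake, y_fake, cifar_classes)
print(shape)            # (32, 32, 3): height, width, RGB channels
print(counts["cat"])    # 2
```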

`cifar10.ipynb`



fig = plt.figure(figsize=(14,14))
for i in range(100):
  ax = plt.subplot(10,10,i+1)
  ax.imshow(X_train[i])
  ax.axis('off')
  ax.set_title(cifar_classes[y_train[i,0]])

The images look like this. They are blurry to begin with (only 32 × 32 pixels), but you can still roughly match each photo to its label. However, some classes are hard to tell apart (deer and horse, automobile and truck, etc.).

What is pooling

In a CNN, pooling compresses (downsamples) the information in the feature maps. It is usually applied as a pooling layer after the convolutional layers. Its main effects are as follows.

1. Robustness to small changes in position
2. Some suppression of overfitting
3. Lower computational cost

Regarding the first effect, the output of the pooling layer changes little, or not at all, even if the position of a feature shifts slightly. Taking handwritten digits as an example, a digit that is slightly misaligned can still be recognized as the same digit.
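The effect of pooling on small shifts can be checked with a toy example. The NumPy sketch below (the function pool2x2 is hypothetical) applies 2 × 2 pooling with stride 2, which matches the defaults of Keras's AveragePooling2D() used in the model, to a single-pixel feature and to the same feature shifted one pixel to the right. Both max and average pooling give identical outputs here, and the 4 × 4 input shrinks to 2 × 2, which is where the lower computational cost comes from. This invariance only holds while the shift stays inside the same pooling window; a shift across a window boundary does change the output.

```python
import numpy as np

def pool2x2(x, mode="max"):
    """2x2 pooling with stride 2 on a 2-D array whose sides are even."""
    h, w = x.shape
    # Group pixels into non-overlapping 2x2 windows: shape (h/2, 2, w/2, 2)
    windows = x.reshape(h // 2, 2, w // 2, 2)
    if mode == "max":
        return windows.max(axis=(1, 3))
    return windows.mean(axis=(1, 3))   # average pooling

# A single bright "feature" pixel, and the same feature shifted one pixel right
a = np.zeros((4, 4))
a[0, 0] = 1.0
b = np.zeros((4, 4))
b[0, 1] = 1.0

print(pool2x2(a).shape)                                       # (2, 2)
print(np.array_equal(pool2x2(a), pool2x2(b)))                 # True (max pooling)
print(np.array_equal(pool2x2(a, "avg"), pool2x2(b, "avg")))   # True (average pooling)
```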

Create a 10-layer neural network model

To classify CIFAR-10, we build a model with 9 convolutional layers + 1 fully connected layer, 10 layers in total.

(Convolution (Conv) 64ch -> Batch Normalization (BN) -> Activation (ReLU)) repeat x3 -> Pooling
(Conv 128ch -> BN -> ReLU) repeat x3 -> Pooling
(Conv 256ch -> BN -> ReLU) repeat x3
Global Average Pooling -> Dense 10 -> Softmax

The model is built in this order. ReLU is used as the activation function, and the pooling layers use average pooling.

`cifar10.ipynb`



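import tensorflow as tf
from tensorflow.keras import layers

inputs = layers.Input((32,32,3))
x = inputs

# Three blocks of (Conv -> BN -> ReLU) x 3, with 64, 128 and 256 channels
for ch in [64, 128, 256]:
    for i in range(3):
        x = layers.Conv2D(ch, 3, padding="same")(x)  # 3x3 kernel, keeps height/width
        x = layers.BatchNormalization()(x)
        x = layers.ReLU()(x)
    # Halve height and width after the first two blocks only
    if ch != 256:
        x = layers.AveragePooling2D()(x)

# Average each channel over all spatial positions: (8, 8, 256) -> (256,)
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dense(10, activation="softmax")(x)
model = tf.keras.models.Model(inputs, x)
model.summary()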

conv2d_12 (Conv2D) (None, 8, 8, 256) 590080
batch_normalization_12 (Batc (None, 8, 8, 256) 1024
re_lu_12 (ReLU) (None, 8, 8, 256) 0
global_average_pooling2d_1 ( (None, 256) 0
dense_1 (Dense) (None, 10) 2570

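Only the last part of the summary output is shown above. The output shape changes as follows: (None, 32, 32, 3) → (None, 32, 32, 64) → (None, 16, 16, 128) → (None, 8, 8, 256) → (None, 256) → (None, 10). Each pooling layer halves the height and width, while the number of channels is set by the convolutional layers. Global average pooling then averages each of the 256 channels over the 8 × 8 positions, leaving a 256-dimensional vector for the Dense layer.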
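The parameter counts in the summary can be checked by hand. A Conv2D layer with a k × k kernel has (k·k·C_in + 1)·C_out parameters (weights plus one bias per filter), BatchNormalization has 4 per channel (gamma and beta are trainable, the moving mean and variance are not), and Dense has (inputs + 1)·outputs. The short sketch below, with hypothetical helper names, reproduces the numbers shown above and also tracks the spatial size through the two pooling layers; the final total is what the summary's Total params line should report under these formulas.

```python
def conv2d_params(kernel, in_ch, out_ch):
    # kernel x kernel x in_ch weights per filter, plus one bias per filter
    return (kernel * kernel * in_ch + 1) * out_ch

def batchnorm_params(ch):
    # gamma and beta (trainable) + moving mean and variance (non-trainable)
    return 4 * ch

def dense_params(in_features, out_features):
    # One weight per input for every output, plus one bias per output
    return (in_features + 1) * out_features

print(conv2d_params(3, 256, 256))   # 590080 (conv2d_12)
print(batchnorm_params(256))        # 1024   (batch_normalization_12)
print(dense_params(256, 10))        # 2570   (dense_1)

# Walk through the whole model, mirroring the Keras code
h, w, in_ch = 32, 32, 3
total = 0
for ch in [64, 128, 256]:
    for _ in range(3):
        total += conv2d_params(3, in_ch, ch) + batchnorm_params(ch)
        in_ch = ch
    if ch != 256:
        h, w = h // 2, w // 2       # AveragePooling2D(): 2x2 window, stride 2
total += dense_params(in_ch, 10)    # GlobalAveragePooling2D has no parameters
print((h, w, in_ch), total)         # (8, 8, 256) 1927946
```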

Data set preparation

`cifar10.ipynb`



X_train = X_train.astype(np.float32) / 255.0
X_test = X_test.astype(np.float32) / 255.0
y_train = y_train.astype(np.float32)
y_test = y_test.astype(np.float32)

Since the original image data is of type uint8 with values in [0, 255], it is converted to float32 and scaled to [0, 1]. The labels are converted to float32 as well.
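One practical reason to cast before dividing is memory: dividing the uint8 array directly by 255.0 produces float64, which is twice the size of float32. The sketch below checks the dtypes on a tiny array and estimates the memory needed for the 50,000 training images in each type (about 154 MB as uint8, 614 MB as float32 and 1.2 GB as float64).

```python
import numpy as np

raw = np.array([0, 51, 255], dtype=np.uint8)

scaled = raw.astype(np.float32) / 255.0   # cast first: the result stays float32
implicit = raw / 255.0                    # dividing uint8 directly gives float64

print(scaled, scaled.dtype)   # values in [0, 1], float32
print(implicit.dtype)         # float64

# Memory for the whole CIFAR-10 training set (50000 x 32 x 32 x 3 values)
n_values = 50000 * 32 * 32 * 3
for dtype in (np.uint8, np.float32, np.float64):
    size_mb = n_values * np.dtype(dtype).itemsize / 1e6
    print(np.dtype(dtype).name, size_mb, "MB")
```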

Model learning

`cifar10.ipynb`



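# compile() must be called before fit(). The optimizer and metric are assumptions;
# sparse categorical cross-entropy suits integer class labels with a softmax output.
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=10)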

Training can take a long time depending on your PC's specs (on my PC, each epoch took about 10 minutes), so I recommend using Google Colab. The image below shows the predicted labels together with the correct answers; labels written in red are incorrect. Because I trained for only 10 epochs, the error rate was about 38%.
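A figure like the one described above (predictions with wrong ones in red) and the error rate can be derived from the output of model.predict. The sketch below is one way to do it; the helper summarize_predictions and the "predicted (correct)" title format are my own assumptions, not necessarily how the original figure was made. It uses a toy probability array so it runs without a trained model.

```python
import numpy as np

cifar_classes = ["airplane", "automobile", "bird", "cat", "deer",
                 "dog", "frog", "horse", "ship", "truck"]

def summarize_predictions(probs, y_true, classes):
    """Return (error_rate, titles) for a batch of softmax outputs.

    probs:  (N, 10) class probabilities, e.g. model.predict(X_test[:100])
    y_true: labels shaped (N, 1) or (N,); float labels like y_test also work
    titles: list of ("predicted (correct)", color), red when the prediction is wrong
    """
    y_pred = np.argmax(probs, axis=1)                      # most probable class
    y_true = np.asarray(y_true).reshape(-1).astype(int)    # (N, 1) -> (N,)
    wrong = y_pred != y_true
    titles = [(f"{classes[p]} ({classes[t]})", "red" if w else "black")
              for p, t, w in zip(y_pred, y_true, wrong)]
    return float(wrong.mean()), titles

# Toy stand-in for model.predict output: one-hot rows for classes 3, 5 and 8
probs = np.eye(10)[[3, 5, 8]]
labels = np.array([[3], [3], [8]], dtype=np.float32)

err, titles = summarize_predictions(probs, labels, cifar_classes)
print(err)      # one wrong prediction out of three
print(titles)

# In the notebook, each title could be drawn on its subplot with
# ax.set_title(text, color=color), reusing the grid plot from earlier.
```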

The full program is available here. https://github.com/Fumio-eisan/cifar10_20200308

[PYTHON] I made an image discrimination (cifar10) model using a convolutional neural network.
