[PYTHON] "Deep Learning from scratch" Self-study memo (No. 16) I tried to build SimpleConvNet with Keras

While reading "Deep Learning from scratch" (written by Yasuki Saito, published by O'Reilly Japan), I will make a note of the sites I referred to. Part 15 ← → Part 17

Since Google Colab can be used normally, I will try the contents of this book with TensorFlow.

I think the beginner tutorial "First Neural Network" on the TensorFlow site https://www.tensorflow.org/?hl=ja covered roughly the same ground as the book up to Chapter 5. So next I will try to build the equivalent of Chapter 7's SimpleConvNet with Keras.

Conv1D? Conv2D? Conv3D?

I can guess that some Conv layer handles the convolution, but there are 1D, 2D, and 3D variants. The "D" presumably stands for Dimension, so for image processing the 2D version should be the one to use. If that's all there is to it, fine.

But then, what counts as one-dimensional? And what is 3D? That was bothering me.

1D is for time-series data and the like.

From the Keras documentation: when using this layer as the first layer of a model, specify input_shape (a tuple of integers, or None). For example, (10, 128) for 10 vectors of 128 dimensions, or (None, 128) for a variable number of 128-dimensional vectors.

There were examples like these: "Solving time-series prediction with one-dimensional convolution" and "Visualizing the output of a one-dimensional convolutional layer for time-series data".
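As a minimal sketch of what that looks like (the filter count and kernel size here are placeholder values I picked, not taken from those examples), a Conv1D layer over sequences of 128-dimensional vectors can be declared like this:

import tensorflow as tf
from tensorflow.keras import layers

# 1D convolution over a sequence: 10 time steps, 128 features per step
model_1d = tf.keras.Sequential([
    layers.Conv1D(32, kernel_size=3, activation='relu', input_shape=(10, 128)),
    layers.GlobalMaxPooling1D(),
    layers.Dense(1)
])
model_1d.summary()  # conv1d output shape: (None, 8, 32)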

2D is for images and the like.

When using this layer as the first layer of the model, specify the keyword argument input_shape (integer tuple, sample axis not included). For example, when data_format = "channels_last", input_shape = (128, 128, 3) for a 128x128 RGB image.

3D is for spatial data that includes height.

When using this layer as the first layer of the model, specify the keyword argument input_shape (integer tuple, sample axis not included). For example, when data_format = "channels_last", a single-channel 128x128x128 volume is input_shape = (128, 128, 128, 1).

Since movement through space is also three-dimensional data, there was an example like this: "Activity classification with an accelerometer".
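Likewise, a minimal Conv3D sketch for a single-channel 128x128x128 volume (again, the filter count and kernel size are placeholder values of mine):

import tensorflow as tf
from tensorflow.keras import layers

# 3D convolution over a single-channel 128x128x128 volume (channels_last)
model_3d = tf.keras.Sequential([
    layers.Conv3D(8, kernel_size=3, activation='relu', input_shape=(128, 128, 128, 1)),
    layers.GlobalMaxPooling3D(),
    layers.Dense(1)
])
model_3d.summary()  # conv3d output shape: (None, 126, 126, 126, 8)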

Conv2D (Convolution2D) parameters:

- filters: number of output filters in the convolution
- kernel_size: width and height of the convolution filter; a single integer gives a square kernel
- strides=(1, 1): vertical and horizontal strides of the convolution; a single integer applies the same stride to both
- padding='valid': either "valid" or "same"
- data_format=None
- dilation_rate=(1, 1)
- groups=1
- activation=None
- use_bias=True
- kernel_initializer='glorot_uniform'
- bias_initializer='zeros'
- kernel_regularizer=None
- bias_regularizer=None
- activity_regularizer=None
- kernel_constraint=None
- bias_constraint=None
- **kwargs

data_format Specify either "channels_last" (default) or "channels_first". This is the order of dimensions in the input. In the case of "channels_last", the input shape will be "(batch, height, width, channels)", and in the case of "channels_first", it will be "(batch, channels, height, width)".

Put another way, Keras's default input shape (channels_last) is (batch, height, width, channels).

So it should be noted that the MNIST data handled by "Deep Learning from scratch" is (batch, channels, height, width), i.e. channels_first. However, even when I specified "channels_first" for this parameter I got an error, so in the end I decided to convert the data to channels_last before processing it.

For padding, refer to here → Tensorflow --padding = Difference between VALID / SAME
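As a quick check of the difference (a minimal sketch of my own, not from that article): with a 28x28 input and a 5x5 kernel, 'valid' shrinks the output to 24x24, while 'same' pads the input so the output stays 28x28.

import tensorflow as tf
from tensorflow.keras import layers

x = tf.zeros((1, 28, 28, 1))  # dummy (batch, height, width, channels) input

valid_out = layers.Conv2D(30, 5, padding='valid')(x)
same_out = layers.Conv2D(30, 5, padding='same')(x)
print(valid_out.shape)  # (1, 24, 24, 30)
print(same_out.shape)   # (1, 28, 28, 30)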

SimpleConvNet

Let's build SimpleConvNet with Keras as described on page 229 of the book.

Since we will be using MNIST data stored in Google Drive, we will define the drive mount and the path to the folder on the drive.

from google.colab import drive
drive.mount('/content/drive')

import sys, os
sys.path.append('/content/drive/My Drive/Colab Notebooks/deep_learning/common')
sys.path.append('/content/drive/My Drive/Colab Notebooks/deep_learning/dataset')

# TensorFlow and tf.keras
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.layers import Dense, Activation, Flatten, Conv2D, MaxPooling2D

# Helper libraries
import numpy as np
import matplotlib.pyplot as plt

Read the MNIST data saved on the drive.

from mnist import load_mnist
#Data reading
(x_train, t_train), (x_test, t_test) = load_mnist(flatten=False)
x_train.shape

(60000, 1, 28, 28)

This is (batch, channels, height, width), i.e. channels_first format. Convert it to channels_last, (batch, height, width, channels).

X_train = x_train.transpose(0,2,3,1)
X_test = x_test.transpose(0,2,3,1)

X_train.shape

(60000, 28, 28, 1)

Now the data is channels_last. Also, since the labels in t_train are plain integers (not one-hot vectors), we use sparse_categorical_crossentropy as the loss function.
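A minimal sketch of that distinction (the to_categorical call is just for illustration and is not used in the rest of this memo):

# t_train holds integer class labels 0-9, so sparse_categorical_crossentropy can be used directly
print(t_train[:5])

# If the labels were one-hot encoded like this, categorical_crossentropy would be the matching loss
t_train_onehot = keras.utils.to_categorical(t_train, num_classes=10)
print(t_train_onehot.shape)  # (60000, 10)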

"Deep Learning from scratch" P230 As shown in Figure 7-23, the network configuration is "Convolution --ReLU --Pooling --Affine --ReLU --Affine --Softmax". g7-23.jpg

I built this with Keras. Since relu is used as the activation function, he_normal is used as the weight initializer.

input_shape=(28,28,1)
filter_num = 30
filter_size = 5
filter_stride = 1
pool_size_h=2
pool_size_w=2
pool_stride=2
hidden_size=100
output_size=10

model = keras.Sequential(name="SimpleConvNet")
model.add(Conv2D(filter_num, filter_size, activation="relu", strides=filter_stride, kernel_initializer='he_normal', input_shape=input_shape))
model.add(MaxPooling2D(pool_size=(pool_size_h, pool_size_w),strides=pool_stride))
model.add(Flatten())
model.add(Dense(hidden_size, activation="relu", kernel_initializer='he_normal')) 
model.add(keras.layers.Dense(output_size, activation="softmax"))

#Compiling the model
model.compile(loss="sparse_categorical_crossentropy", 
              optimizer="adam", 
              metrics=["accuracy"])

model.summary()

Model: "SimpleConvNet"
Layer (type)                 Output Shape              Param #
=================================================================
conv2d (Conv2D)              (None, 24, 24, 30)        780
max_pooling2d (MaxPooling2D) (None, 12, 12, 30)        0
flatten (Flatten)            (None, 4320)              0
dense (Dense)                (None, 100)               432100
dense_1 (Dense)              (None, 10)                1010
=================================================================
Total params: 433,890
Trainable params: 433,890
Non-trainable params: 0
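As a sanity check (my own arithmetic, not from the book), the parameter counts in this summary can be reproduced by hand:

# conv2d: 30 filters of size 5x5 over 1 input channel, plus 30 biases
print(30 * (5 * 5 * 1) + 30)   # 780
# dense: the flattened 12*12*30 = 4320 inputs to 100 units, plus 100 biases
print(4320 * 100 + 100)        # 432100
# dense_1: 100 inputs to 10 units, plus 10 biases
print(100 * 10 + 10)           # 1010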

Train the model.

model.fit(X_train, t_train,  epochs=5, batch_size=128)

Epoch 1/5
469/469 [==============================] - 27s 58ms/step - loss: 0.2050 - accuracy: 0.9404
Epoch 2/5
469/469 [==============================] - 27s 57ms/step - loss: 0.0614 - accuracy: 0.9819
Epoch 3/5
469/469 [==============================] - 26s 56ms/step - loss: 0.0411 - accuracy: 0.9875
Epoch 4/5
469/469 [==============================] - 27s 58ms/step - loss: 0.0315 - accuracy: 0.9903
Epoch 5/5
469/469 [==============================] - 27s 57ms/step - loss: 0.0251 - accuracy: 0.9927
<tensorflow.python.keras.callbacks.History at 0x7f5167581748>

The training accuracy ended up quite high.
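The figures above are training accuracy; to check how well the model generalizes, the test set can be evaluated with model.evaluate (a minimal sketch; the numbers will of course vary from run to run):

# Evaluate on the channels_last test images and the integer test labels
test_loss, test_acc = model.evaluate(X_test, t_test, verbose=2)
print('Test accuracy:', test_acc)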

#Predict
predictions = model.predict(X_test)

class_names = ['0', '1', '2', '3', '4', 
               '5', '6', '7', '8', '9']

def plot_image(i, predictions_array, t_label, img):
    predictions_array = predictions_array[i]
    img = img[i].reshape((28, 28))
    true_label = t_label[i]
    plt.grid(False)
    plt.xticks([])
    plt.yticks([])

    plt.imshow(img, cmap=plt.cm.binary)

    predicted_label = np.argmax(predictions_array)
    if predicted_label == true_label:
        color = 'blue'
    else:
        color = 'red'

    plt.xlabel("{} {:2.0f}% ({})".format(class_names[predicted_label],
                                    100*np.max(predictions_array),
                                    class_names[true_label]),
                                    color=color)

def plot_value_array(i, predictions_array, t_label):
    predictions_array = predictions_array[i]
    true_label = t_label[i]
    plt.grid(False)
    plt.xticks([])
    plt.yticks([])
    thisplot = plt.bar(range(10), predictions_array, color="#777777")
    plt.ylim([0, 1]) 
    predicted_label = np.argmax(predictions_array)

    thisplot[predicted_label].set_color('red')
    thisplot[true_label].set_color('blue')

#Shows X test images, predicted labels, and correct labels.
#Correct predictions are shown in blue and wrong predictions are shown in red.
num_rows = 5
num_cols = 3
num_images = num_rows*num_cols
plt.figure(figsize=(2*2*num_cols, 2*num_rows))
for i in range(num_images):
    plt.subplot(num_rows, 2*num_cols, 2*i+1)
    plot_image(i, predictions, t_test, X_test)
    plt.subplot(num_rows, 2*num_cols, 2*i+2)
    plot_value_array(i, predictions, t_test)
plt.show()

(Figure: prediction results, g7-24.jpg)

It correctly identified the 9th image, a 5 that is quite hard to read.

There is also a way to build the model by stacking each layer individually:

model = keras.Sequential(name="SimpleConvNet")
model.add(keras.Input(shape=input_shape))
model.add(keras.layers.Convolution2D(filter_num, filter_size, strides=filter_stride, kernel_initializer='he_normal'))
model.add(keras.layers.Activation(tf.nn.relu)) 
model.add(keras.layers.MaxPooling2D(pool_size=(pool_size_h, pool_size_w),strides=pool_stride))
model.add(keras.layers.Flatten())
model.add(keras.layers.Dense(hidden_size))
model.add(keras.layers.Activation(tf.nn.relu)) 
model.add(keras.layers.Dense(output_size))
model.add(keras.layers.Activation(tf.nn.softmax)) 

Part 15 ← → Part 17 / Table of contents of this memo / Unreadable Glossary
