I'm fumio, a machine learning beginner. I enjoy machine learning programming and study it every day.
I am studying "Learn from mosaic removal: cutting-edge deep learning" by koshian2. To deepen my understanding of what I learned, I would like to summarize an example of applying a convolutional neural network (CNN) to image classification. https://qiita.com/koshian2/items/aefbe4b26a7a235b5a5e
The main points are as follows.
A convolutional neural network (CNN) is a feedforward network that contains two types of layers, convolutional layers and pooling layers, and is widely used for image recognition.
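To make the convolutional layer a little more concrete, here is a minimal NumPy sketch of what a single convolution filter computes: it slides a small kernel over the image and takes a weighted sum at each position. This is a simplified illustration, not how Keras implements it. Real `Conv2D` layers also handle multiple input channels, many filters, padding, and a learned bias, and their kernels are learned rather than hand-picked. The function name `conv2d_single` and the edge-detecting kernel are hypothetical choices for this example.

```python
import numpy as np

def conv2d_single(image, kernel):
    """Hypothetical helper: 'valid' 2D convolution (cross-correlation, as in
    deep learning libraries) of one single-channel image with one kernel,
    no padding, stride 1."""
    h, w = image.shape
    kh, kw = kernel.shape
    out = np.zeros((h - kh + 1, w - kw + 1), dtype=float)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Weighted sum of the kh x kw patch currently under the kernel
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A vertical edge: left half dark (0), right half bright (1)
image = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1]], dtype=float)

# Hand-picked (hypothetical) kernel that responds where brightness increases left to right
kernel = np.array([[-1, 1],
                   [-1, 1]], dtype=float)

print(conv2d_single(image, kernel))
```

The 4 x 4 input produces a 3 x 3 output. Each row is 0, 2, 0, so the filter responds only at the column where the dark-to-bright edge is located. A convolutional layer learns many such filters from data.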
cifar10.ipynb
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras import layers

cifar_classes = ["airplane", "automobile", "bird", "cat", "deer", "dog", "frog", "horse", "ship", "truck"]
(X_train, y_train), (X_test, y_test) = tf.keras.datasets.cifar10.load_data()
print(X_train.shape, y_train.shape)
print(X_test.shape, y_test.shape)
Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
170500096/170498071 [==============================] - 13s 0us/step
(50000, 32, 32, 3) (50000, 1)
(10000, 32, 32, 3) (10000, 1)
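The data is loaded directly from the Keras datasets module. The dimensions show that the training set contains 50,000 images, each 32 x 32 x 3. Each image is three-dimensional because it is a color image with three (RGB) channels. The test set contains 10,000 images, and the labels are integer class indices with shape (N, 1).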
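The label array has shape (N, 1) rather than (N,), so each label must be indexed as `y_train[i,0]` to get a plain integer that can look up a class name. The sketch below shows this with small synthetic arrays that have the same layout as CIFAR-10. The four label values are made up for illustration and are not the actual first CIFAR-10 labels.

```python
import numpy as np

cifar_classes = ["airplane", "automobile", "bird", "cat", "deer",
                 "dog", "frog", "horse", "ship", "truck"]

# Synthetic stand-ins with the same layout as CIFAR-10 (only 4 samples here):
# images are (N, height, width, channels) uint8, labels are (N, 1) class indices.
X = np.zeros((4, 32, 32, 3), dtype=np.uint8)
y = np.array([[6], [9], [9], [4]], dtype=np.uint8)   # hypothetical labels

n, height, width, channels = X.shape
print(n, height, width, channels)   # one RGB image = 32 x 32 x 3 values

# y[i] is a length-1 array, so y[i, 0] extracts the integer class index
names = [cifar_classes[y[i, 0]] for i in range(n)]
print(names)
```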
cifar10.ipynb
fig = plt.figure(figsize=(14,14))
for i in range(100):
    ax = plt.subplot(10,10,i+1)
    ax.imshow(X_train[i])
    ax.axis('off')
    ax.set_title(cifar_classes[y_train[i,0]])
The images look like this. They are low resolution (32 x 32) to begin with, so they look blurry, but you can still roughly tell what each photo shows from its label. However, some classes are difficult to tell apart (deer and horse, automobile and truck, etc.).
In a CNN, pooling compresses (downsamples) the feature maps. It is usually applied as a pooling layer after the convolutional layers. Its main effects include the following.
1. Robustness to position changes: the output of the pooling layer stays the same, or nearly the same, even if the position of a feature shifts slightly. Taking handwritten digits as an example, a digit that is slightly misaligned can still be recognized as the same digit.
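Here is a minimal NumPy sketch of 2 x 2 average pooling with stride 2. This matches the default window of the Keras `AveragePooling2D` layer used in the model, where `strides` defaults to `pool_size`. The sketch works on a single channel with no batch dimension, and the helper name `average_pool_2x2` is hypothetical. It shows two effects: the height and width are halved, and a feature that moves by one pixel within the same 2 x 2 window produces exactly the same output.

```python
import numpy as np

def average_pool_2x2(x):
    """Hypothetical helper: 2x2 average pooling with stride 2 on a
    (height, width) array. Height and width are assumed to be even."""
    h, w = x.shape
    # Group pixels into non-overlapping 2x2 blocks, then average each block
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

# A single "feature" pixel in a 4x4 feature map ...
a = np.zeros((4, 4))
a[0, 0] = 1.0
# ... and the same feature shifted by one pixel, still inside the same 2x2 block
b = np.zeros((4, 4))
b[1, 1] = 1.0

pa, pb = average_pool_2x2(a), average_pool_2x2(b)
print(pa.shape)                 # (2, 2): height and width are halved
print(np.array_equal(pa, pb))   # True: the small shift does not change the output
```

The invariance is only approximate in general. If a shift moves a feature across a window boundary, the pooled output does change, which is why the text says the output stays "nearly" the same.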
To classify CIFAR-10 images, we build a 10-layer model: 9 convolutional layers plus 1 fully connected layer.
The layers are stacked in the order shown in the code below. Conv2D → BatchNormalization → ReLU is repeated three times per block, with average pooling between blocks. ReLU is used as the activation function.
cifar10.ipynb
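inputs = layers.Input((32,32,3))
x = inputs
# Three blocks of 3 x (Conv2D -> BatchNormalization -> ReLU) with 64, 128 and 256 filters
for ch in [64, 128, 256]:
    for i in range(3):
        x = layers.Conv2D(ch, 3, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.ReLU()(x)
    # Halve height and width after every block except the last
    if ch != 256:
        x = layers.AveragePooling2D()(x)
# Average each feature map down to one value, then classify into 10 classes
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dense(10, activation="softmax")(x)
model = tf.keras.models.Model(inputs, x)
model.summary()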
Layer (type) Output Shape Param #
conv2d_12 (Conv2D) (None, 8, 8, 256) 590080
batch_normalization_12 (Batc (None, 8, 8, 256) 1024
re_lu_12 (ReLU) (None, 8, 8, 256) 0
average_pooling2d_3 (Average (None, 8, 8, 256) 0
global_average_pooling2d_1 ( (None, 256) 0
dense_1 (Dense) (None, 10) 2570
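Only the last part of the output is shown above. Through the whole model, the tensor shape changes as follows (None is the batch size): (None, 32, 32, 3) → (None, 32, 32, 64) → (None, 16, 16, 128) → (None, 8, 8, 256) → (None, 256) → (None, 10). Each pooling layer halves the height and width (32 → 16 → 8), and the convolutional layers increase the number of channels (64 → 128 → 256). GlobalAveragePooling2D then averages each 8 x 8 feature map down to a single value, which leaves a 256-dimensional vector for the final Dense layer.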
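The shapes and the "Param #" column can be checked by hand. The plain-Python sketch below traces the model's height, width, and channel count and adds up the parameters using the standard formulas:

- Conv2D: kernel x kernel x input channels x output channels weights, plus one bias per output channel.
- BatchNormalization: 4 values per channel (gamma and beta are trainable; the moving mean and moving variance are non-trainable).
- Dense: inputs x outputs weights, plus biases.

The helper names are hypothetical. The per-layer results reproduce the 590080, 1024 and 2570 entries in the summary above. The computed total should correspond to the "Total params" line of `model.summary()`, but that line is not shown in the excerpt.

```python
def conv_params(kernel, c_in, c_out):
    # kernel x kernel weights per (input, output) channel pair, plus one bias per output channel
    return kernel * kernel * c_in * c_out + c_out

def batchnorm_params(c):
    # gamma, beta (trainable) + moving mean, moving variance (non-trainable), per channel
    return 4 * c

def dense_params(n_in, n_out):
    # weight matrix plus one bias per output unit
    return n_in * n_out + n_out

# Trace (height, width, channels) through the model: 3 convs per block,
# pooling after every block except the last
h, w, c = 32, 32, 3
total = 0
for ch in [64, 128, 256]:
    for _ in range(3):
        total += conv_params(3, c, ch) + batchnorm_params(ch)
        c = ch                      # padding="same" keeps h and w; only channels change
    if ch != 256:
        h, w = h // 2, w // 2       # AveragePooling2D halves height and width
    print((None, h, w, c))

# GlobalAveragePooling2D gives (None, c) and has no parameters
total += dense_params(c, 10)
print(total)
```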
cifar10.ipynb
X_train = X_train.astype(np.float32) / 255.0
X_test = X_test.astype(np.float32) / 255.0
y_train = y_train.astype(np.float32)
y_test = y_test.astype(np.float32)
Since the original data is of type uint8 with values in [0, 255], the code above converts it to float32 and rescales it to [0, 1].
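The small NumPy sketch below uses a synthetic three-pixel array in place of real CIFAR-10 data. It shows what the conversion does. It also shows one reason to leave uint8 before doing arithmetic: uint8 values silently wrap around when they overflow.

```python
import numpy as np

# Synthetic uint8 "pixels" covering the full [0, 255] range
x = np.array([[0, 128, 255]], dtype=np.uint8)

# Same conversion as in the notebook: cast to float32, then scale to [0, 1]
x_scaled = x.astype(np.float32) / 255.0
print(x_scaled.dtype, x_scaled.min(), x_scaled.max())

# Pitfall of staying in uint8: 255 + 1 wraps around to 0 instead of becoming 256
print(x + np.uint8(1))
```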
cifar10.ipynb
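# Compile step (not shown in the original excerpt; the optimizer choice here is an assumption).
# Sparse categorical cross-entropy matches integer labels of shape (N, 1) and the softmax output.
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=10)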
Training can take a long time depending on your PC (on mine, each epoch took about 10 minutes), so I recommend using Google Colab. The image below shows the predicted labels together with the correct ones; incorrect predictions are written in red. Because I trained for only 10 epochs, the error rate was about 38%.
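With a trained `model`, the error rate can be computed from the softmax outputs of `model.predict(X_test)`: take the most probable class for each image and compare it with `y_test`. The NumPy sketch below shows that computation on made-up probabilities for 4 images and 3 classes, not on real CIFAR-10 outputs. The helper name `error_rate` is hypothetical. The comparison also works when the labels were cast to float32, as in the notebook.

```python
import numpy as np

def error_rate(probs, labels):
    """Hypothetical helper.
    probs:  (N, num_classes) softmax outputs, e.g. from model.predict(X_test)
    labels: (N, 1) class indices, as in y_test"""
    pred = probs.argmax(axis=1)        # most probable class per image
    wrong = pred != labels[:, 0]       # True where the prediction is incorrect
    return wrong.mean(), wrong

# Synthetic example: 4 images, 3 classes
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.3, 0.4],
                  [0.6, 0.3, 0.1]])
labels = np.array([[0], [1], [1], [0]], dtype=np.float32)

rate, wrong = error_rate(probs, labels)
print(rate)   # 0.25: one of the four predictions is wrong

# The mask can choose the title color when plotting the results
colors = ["red" if w else "black" for w in wrong]
```

When plotting, each entry of `colors` can be passed as `ax.set_title(..., color=colors[i])`, so that incorrect predictions are shown in red.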
The full program is available here. https://github.com/Fumio-eisan/cifar10_20200308