Continued from the previous article.
In this article I would like to classify images using a convolutional neural network (CNN). In Part 1 we classified handwritten digit images with a plain neural network; a CNN allows for more accurate classification.
The CNN is one of the most commonly used deep learning models for working with images. It is called a "convolutional neural network" because it adds a process called "convolution" on top of an ordinary neural network.
In recent years smartphone cameras have become high quality, and a single photo can be several megabytes. If you feed such images into training as they are, it takes a long time because there is simply too much data. To make training efficient, you need to reduce the size of the images.
However, simply shrinking them is not enough. If you make an image smaller and its features disappear, you can no longer tell what the image shows, so the reduction is meaningless.
__Convolution compresses the data while retaining the characteristics of the original image.__
Specifically, the flow is: extract features with convolution and pooling, then convert the result to a one-dimensional array and learn with an ordinary neural network.
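As an illustrative sketch (not part of the article's code), the core idea of convolution looks like this in plain NumPy: a 3x3 kernel slides over a small dummy image and produces a smaller feature map, so the data shrinks while local patterns are kept.
# Illustrative sketch only: 2D convolution of a 4x4 image with a 3x3 kernel (no padding)
import numpy as np
image = np.arange(16).reshape(4, 4)          # dummy 4x4 "image"
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]])              # simple vertical-edge kernel
out = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        out[i, j] = np.sum(image[i:i+3, j:j+3] * kernel)
print(out)  # 2x2 feature map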
In this article, we will run it in a Google Colaboratory environment.
Also, the version of tensorflow is 1.13.1.
If you want to downgrade, you can use the following command.
!pip install tensorflow==1.13.1
I will use tensorflow.keras this time as well.
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.layers import Activation, Dense, Dropout, Conv2D, Flatten, MaxPool2D
from tensorflow.keras.models import Sequential, load_model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.utils import to_categorical
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
The image data is downloaded with the cifar10 module.
(train_images, train_labels) is the training images and their correct labels, and (test_images, test_labels) is the images and correct labels for verification.
(train_images, train_labels), (test_images, test_labels) = cifar10.load_data()
Check the shape of the dataset. You can see that there are 50,000 32 x 32 pixel RGB images (32 x 32 x 3) for training and 10,000 for verification.
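For example, a quick check of the shapes (a small addition for illustration) could look like this:
# Check the shape of the training and verification data
print(train_images.shape)  # (50000, 32, 32, 3)
print(train_labels.shape)  # (50000, 1)
print(test_images.shape)   # (10000, 32, 32, 3)
print(test_labels.shape)   # (10000, 1)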
Let's also check the contents of an image and its correct label.
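A minimal check might look like this (the displayed image and its label depend on the dataset order):
# Show the first training image and print its correct label
plt.imshow(train_images[0])
plt.show()
print(train_labels[0])  # the label index of this image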
The meaning of each number is as follows.
Label "0": airplane Label "1": automobile Label "2": bird Label "3": cat Label "4": deer Label "5": dog Label "6": frog Label "7": horse Label "8": ship Label "9": truck
The contents of train_images are numbers from 0 to 255 (RGB values). To normalize them, divide everything by 255.
In the normal neural network the training data had to be flattened to one dimension, but the convolution layers take three-dimensional data as input, so here only the normalization step is needed.
train_images = train_images.astype('float32')/255.0
test_images = test_images.astype('float32')/255.0
Also, convert the correct labels to One-Hot representation with to_categorical.
train_labels = to_categorical(train_labels, 10)
test_labels = to_categorical(test_labels, 10)
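As a quick illustrative check (not in the original code), one converted label is now a 10-element vector containing a single 1:
# One-Hot representation: a 10-element vector with a 1 at the label's index
print(train_labels[0])
# e.g. [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.] if the original label was 6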
The model is built with the following code.
model = Sequential()
#1st convolution process (Conv → Conv → Pool → Dropout)
model.add(Conv2D(32, (3, 3), activation='relu', padding='same', input_shape=(32, 32, 3)))
model.add(Conv2D(32, (3, 3), activation='relu', padding='same'))
model.add(MaxPool2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
#Second convolution process (Conv → Conv → Pool → Dropout)
model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
model.add(MaxPool2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
#Classification by neural network (Flatten → Dense → Dropout → Dense)
model.add(Flatten())
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))
I will explain each part.
First, create a sequential model with model = Sequential().
Next is the convolution process. This time, after running the convolution block twice, I would like to classify with a neural network.
I will explain the first convolution block.
First, create a convolution layer with model.add(Conv2D(32, (3, 3), activation='relu', padding='same', input_shape=(32, 32, 3))).
We pass the number of kernels, the kernel size, the activation function, the padding, and the input shape to Conv2D.
The number of kernels is 32, the kernel size is 3x3, and the activation function is relu. padding='same' pads the input with zeros so that the output feature map keeps the same size as the input.
The feature map created by the above process is then passed through model.add(Conv2D(32, (3, 3), activation='relu', padding='same')), which extracts features again in a second convolution layer.
Next is the pooling layer.
Compress the feature map with model.add(MaxPool2D(pool_size=(2, 2))).
MaxPool2D performs MAX pooling: pool_size=(2, 2) is the size of the pooling window, so the feature map is halved in each dimension.
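As a rough sketch of what MAX pooling does (illustrative only, not part of the model code): each 2x2 block of the feature map is replaced by its maximum value, so a 4x4 map becomes 2x2.
# Illustrative sketch: 2x2 MAX pooling halves each dimension
import numpy as np
fmap = np.array([[1, 3, 2, 4],
                 [5, 6, 1, 2],
                 [7, 2, 9, 1],
                 [3, 4, 5, 6]])
pooled = fmap.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)  # [[6 4]
               #  [7 9]]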
Finally, model.add(Dropout(0.25)) randomly disables 25% of the units with dropout, and the first convolution block is complete.
Do the same process again, then convert the result to one dimension with model.add(Flatten()) and perform classification with an ordinary neural network (the Dense layers).
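If you want to see how the shape changes through the layers of the model defined above, you can print a summary:
# Check how the output shape changes layer by layer
model.summary()
# With 'same' padding the Conv2D layers keep 32x32, the first MaxPool2D halves it to 16x16,
# the second block halves it again to 8x8 with 64 channels,
# and Flatten turns 8x8x64 into a 4096-element vector.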
Before compiling the model, convert it to a TPU model.
You could compile and train it as it is, but convolutional neural networks require a huge amount of computation, so training takes a long time if it is not processed on a TPU.
Follow the steps below to convert.
#Conversion to TPU model
import tensorflow as tf
import os
tpu_model = tf.contrib.tpu.keras_to_tpu_model(
    model,
    strategy=tf.contrib.tpu.TPUDistributionStrategy(
        tf.contrib.cluster_resolver.TPUClusterResolver(tpu='grpc://' + os.environ['COLAB_TPU_ADDR'])
    )
)
The loss function is categorical_crossentropy, which is suitable for classification; the optimizer is Adam (learning rate 0.001); and the evaluation metric is acc (accuracy).
tpu_model.compile(loss='categorical_crossentropy', optimizer=Adam(lr=0.001), metrics=['acc'])
Now train the created model. When training with the TPU model, the first run takes a while, but the second and subsequent runs are fast. Training with a normal model instead of the TPU takes more than twice as long.
history = tpu_model.fit(train_images, train_labels, batch_size=128,
epochs=20, validation_split=0.1)
The training accuracy exceeds 90%, which is quite high.
plt.plot(history.history['acc'], label='acc')
plt.plot(history.history['val_acc'], label='val_acc')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(loc='best')
plt.show()
When evaluated on the verification data, the accuracy drops to 71.2%. The accuracy on new images is not as high, so there seems to be room for improvement.
test_loss, test_acc = tpu_model.evaluate(test_images, test_labels)
print('loss: {:.3f}\nacc: {:.3f}'.format(test_loss, test_acc))
Finally, inference. Pass some images and check what kind of predictions are made. Google Colab's TPU consists of 8 cores, so the number of samples you pass must be divisible by 8. Therefore, I will use 16 images.
#Display of inferred image
for i in range(16):
plt.subplot(2, 8, i+1)
plt.imshow(test_images[i])
plt.show()
#Display the inferred label
test_predictions = tpu_model.predict(test_images[0:16])
test_predictions = np.argmax(test_predictions, axis=1)[0:16]
labels = ['airplane', 'automobile', 'bird', 'cat', 'deer',
'dog', 'frog', 'horse', 'ship', 'truck']
print([labels[n] for n in test_predictions])
The images are small and hard to make out, but the model seems to predict them well. Next time, I would like to classify the same image data with a CNN called ResNet.