[PYTHON] I implemented the VGG16 model in Keras and tried to identify CIFAR10

Overview

I have been studying deep learning, and I wrote this article to consolidate what I learned by producing some output. The full code is available on GitHub. In this article, I implement VGG16, a well-known CNN model, using Keras, a deep learning framework that makes it easy to build models, and use it to classify the images of CIFAR10.

Implementation environment

Execution environment

Google Colaboratory

version

Library import

import


import numpy as np
import sys
%matplotlib inline
import matplotlib.pyplot as plt
import keras
from keras.datasets import cifar10
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten, BatchNormalization
from keras.layers import Conv2D, MaxPooling2D
from keras import backend as K

First, import the required libraries. Keras (official documentation) is a high-level deep learning framework that runs on back ends such as TensorFlow and makes it easy to design and extend complex models.

CIFAR10 is a color image dataset provided by the University of Toronto. It contains 32x32-pixel images of 10 classes: airplanes, cars, birds, cats, deer, dogs, frogs, horses, boats, and trucks. Like MNIST, the handwritten digit dataset, CIFAR10 is available by default in the keras.datasets package.
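To make the mapping between integer labels and classes concrete, here is a minimal sketch. It assumes the commonly documented CIFAR10 label order (0 = airplane through 9 = truck). The names CIFAR10_CLASSES and label_to_name are hypothetical helpers introduced for illustration and are not part of Keras.

```python
# Class names in the commonly documented CIFAR10 label order (index = integer label).
# CIFAR10_CLASSES and label_to_name are illustrative names, not a Keras API.
CIFAR10_CLASSES = [
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
]

def label_to_name(label):
    """Return the class name for an integer label in the range 0..9."""
    return CIFAR10_CLASSES[int(label)]

print(label_to_name(6))  # frog
```

With the arrays returned by load_data, a label would be looked up as label_to_name(y_train[i][0]) before the one-hot conversion.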

Data set preparation

datasets


'''Data set loading'''
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
'''Setting batch size, number of classes, number of epochs'''
batch_size=64
num_classes=10
epochs=20
'''one-hot vectorization'''
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
'''shape display'''
print("x_train : ", x_train.shape)
print("y_train : ", y_train.shape)
print("x_test : ", x_test.shape)
print("y_test : ", y_test.shape)

Next, load the training data and test data with load_data(). The batch size, the number of classes, and the number of epochs are also defined here. The label data is one-hot vectorized (converted to a vector in which exactly one component is 1 and all others are 0) so that it can be compared with the softmax output. The resulting shapes are as follows:

Output result


x_train :  (50000, 32, 32, 3)
y_train :  (50000, 10)
x_test :  (10000, 32, 32, 3)
y_test :  (10000, 10)

There are 50,000 training samples and 10,000 test samples.
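The one-hot vectorization used above can be sketched in plain NumPy as follows. This is only an illustration of the idea, not the implementation inside keras.utils.to_categorical; the function name to_one_hot is an assumption made for this example.

```python
import numpy as np

def to_one_hot(labels, num_classes):
    """Convert integer labels of shape (N,) or (N, 1) to one-hot rows of shape (N, num_classes)."""
    labels = np.asarray(labels, dtype=int).reshape(-1)
    # Row k of the identity matrix is the one-hot vector for class k.
    return np.eye(num_classes, dtype="float32")[labels]

sample = to_one_hot([[3], [0]], num_classes=10)
print(sample.shape)  # (2, 10)
print(sample[0])     # 1.0 at index 3, 0.0 elsewhere
```

This is why y_train changes from one label per image to the shape (50000, 10).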

Implementation of VGG16 model

Now we finally build the VGG16 model. The VGG series is explained in detail in this article. Roughly speaking, VGG16 is a CNN model created by the VGG team that ranked high in ILSVRC (ImageNet Large Scale Visual Recognition Challenge), a competition for object detection and image classification. Because of its relatively simple design and high performance, it is often mentioned in introductions to deep learning. The 16 in the name comes from the fact that it consists of 16 layers in total. The structure of VGG16 is shown in the figure below (quoted from the original paper; VGG16 is model D).

[Figure: VGG network configurations, quoted from the original paper; VGG16 is model D]

There are 13 convolutional layers with a filter size of 3x3 and 3 fully connected layers. I implemented VGG16 with reference to the above figure.
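The layer counts can be checked with a compact description of the architecture. The list below is a hypothetical shorthand I introduce here (integers are Conv2D filter counts and "M" marks a max-pooling layer); it is not a format used by Keras.

```python
# Hypothetical shorthand for VGG16 (model D):
# integers are Conv2D filter counts, "M" marks a 2x2 max-pooling layer.
VGG16_CONFIG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
                512, 512, 512, "M", 512, 512, 512, "M"]
NUM_DENSE_LAYERS = 3  # fc1, fc2, predictions

num_conv = sum(1 for item in VGG16_CONFIG if item != "M")
num_pool = VGG16_CONFIG.count("M")

# Convolutional layers, pooling layers, and layers with weights (conv + dense)
print(num_conv, num_pool, num_conv + NUM_DENSE_LAYERS)  # 13 5 16
```

Only the layers with weights (13 convolutional and 3 fully connected) are counted in the 16; the pooling layers are not.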

VGG16


'''VGG16'''
input_shape=x_train.shape[1:]
model = Sequential()
model.add(Conv2D(filters=64, kernel_size=(3,3), strides=(1,1), padding='same', input_shape=input_shape, name='block1_conv1'))
model.add(BatchNormalization(name='bn1'))
model.add(Activation('relu'))
model.add(Conv2D(filters=64, kernel_size=(3,3), strides=(1,1), padding='same', name='block1_conv2'))
model.add(BatchNormalization(name='bn2'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='same', name='block1_pool'))
model.add(Conv2D(filters=128, kernel_size=(3,3), strides=(1,1), padding='same', name='block2_conv1'))
model.add(BatchNormalization(name='bn3'))
model.add(Activation('relu'))
model.add(Conv2D(filters=128, kernel_size=(3,3), strides=(1,1), padding='same', name='block2_conv2'))
model.add(BatchNormalization(name='bn4'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='same', name='block2_pool'))
model.add(Conv2D(filters=256, kernel_size=(3,3), strides=(1,1), padding='same', name='block3_conv1'))
model.add(BatchNormalization(name='bn5'))
model.add(Activation('relu'))
model.add(Conv2D(filters=256, kernel_size=(3,3), strides=(1,1), padding='same', name='block3_conv2'))
model.add(BatchNormalization(name='bn6'))
model.add(Activation('relu'))
model.add(Conv2D(filters=256, kernel_size=(3,3), strides=(1,1), padding='same', name='block3_conv3'))
model.add(BatchNormalization(name='bn7'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='same', name='block3_pool'))
model.add(Conv2D(filters=512, kernel_size=(3,3), strides=(1,1), padding='same', name='block4_conv1'))
model.add(BatchNormalization(name='bn8'))
model.add(Activation('relu'))
model.add(Conv2D(filters=512, kernel_size=(3,3), strides=(1,1), padding='same', name='block4_conv2'))
model.add(BatchNormalization(name='bn9'))
model.add(Activation('relu'))
model.add(Conv2D(filters=512, kernel_size=(3,3), strides=(1,1), padding='same', name='block4_conv3'))
model.add(BatchNormalization(name='bn10'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='same', name='block4_pool'))
model.add(Conv2D(filters=512, kernel_size=(3,3), strides=(1,1), padding='same', name='block5_conv1'))
model.add(BatchNormalization(name='bn11'))
model.add(Activation('relu'))
model.add(Conv2D(filters=512, kernel_size=(3,3), strides=(1,1), padding='same', name='block5_conv2'))
model.add(BatchNormalization(name='bn12'))
model.add(Activation('relu'))
model.add(Conv2D(filters=512, kernel_size=(3,3), strides=(1,1), padding='same', name='block5_conv3'))
model.add(BatchNormalization(name='bn13'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='same', name='block5_pool'))
model.add(Flatten(name='flatten'))
model.add(Dense(units=4096, activation='relu', name='fc1'))
model.add(Dense(units=4096, activation='relu', name='fc2'))
model.add(Dense(units=num_classes, activation='softmax', name='predictions'))
model.summary()

Keras offers two ways to build a model, the Sequential model and the Functional API, and this time I used the simpler Sequential model. Layers are stacked in series by calling model.add as shown above. Note that VGG was originally designed for ILSVRC, so its input and output sizes do not match this dataset. The I/O sizes are therefore changed as follows. CIFAR10 is a much simpler dataset, so such a complex model may not actually be necessary.

              Before change   After change
Input size    224×224         32×32
Output size   1000            10
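To see what the smaller input means for the feature maps, here is a rough sketch that follows the spatial size through the five pooling layers. It assumes 2x2 pooling with stride 2 and padding='same', as in the model of this article; the function name size_after_pools is made up for this example.

```python
import math

def size_after_pools(size, num_pools, pool=2):
    """Spatial size after repeated stride-2 pooling with padding='same' (ceiling division)."""
    for _ in range(num_pools):
        size = math.ceil(size / pool)
    return size

for side in (224, 32):
    out = size_after_pools(side, num_pools=5)
    print(side, "->", out, "x", out, "x 512 =", out * out * 512, "features")
# 224 -> 7 x 7 x 512 = 25088 features
# 32 -> 1 x 1 x 512 = 512 features
```

With a 32×32 input, only a 1×1×512 feature map reaches the fully connected layers, compared with 7×7×512 for the original 224×224 input.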

In addition, Batch Normalization is now commonly used to stabilize training and is said to help suppress overfitting to the training data, but the original VGG does not use it because the method had not yet been established when VGG was announced. This time, I adopted it as well. The model summary is as follows.

Output result


Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
block1_conv1 (Conv2D)        (None, 32, 32, 64)        1792      
_________________________________________________________________
bn1 (BatchNormalization)     (None, 32, 32, 64)        256       
_________________________________________________________________
activation_1 (Activation)    (None, 32, 32, 64)        0         
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 32, 32, 64)        36928     
_________________________________________________________________
bn2 (BatchNormalization)     (None, 32, 32, 64)        256       
_________________________________________________________________
activation_2 (Activation)    (None, 32, 32, 64)        0         
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 16, 16, 64)        0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 16, 16, 128)       73856     
_________________________________________________________________
bn3 (BatchNormalization)     (None, 16, 16, 128)       512       
_________________________________________________________________
activation_3 (Activation)    (None, 16, 16, 128)       0         
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 16, 16, 128)       147584    
_________________________________________________________________
bn4 (BatchNormalization)     (None, 16, 16, 128)       512       
_________________________________________________________________
activation_4 (Activation)    (None, 16, 16, 128)       0         
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 8, 8, 128)         0         
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 8, 8, 256)         295168    
_________________________________________________________________
bn5 (BatchNormalization)     (None, 8, 8, 256)         1024      
_________________________________________________________________
activation_5 (Activation)    (None, 8, 8, 256)         0         
_________________________________________________________________
block3_conv2 (Conv2D)        (None, 8, 8, 256)         590080    
_________________________________________________________________
bn6 (BatchNormalization)     (None, 8, 8, 256)         1024      
_________________________________________________________________
activation_6 (Activation)    (None, 8, 8, 256)         0         
_________________________________________________________________
block3_conv3 (Conv2D)        (None, 8, 8, 256)         590080    
_________________________________________________________________
bn7 (BatchNormalization)     (None, 8, 8, 256)         1024      
_________________________________________________________________
activation_7 (Activation)    (None, 8, 8, 256)         0         
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, 4, 4, 256)         0         
_________________________________________________________________
block4_conv1 (Conv2D)        (None, 4, 4, 512)         1180160   
_________________________________________________________________
bn8 (BatchNormalization)     (None, 4, 4, 512)         2048      
_________________________________________________________________
activation_8 (Activation)    (None, 4, 4, 512)         0         
_________________________________________________________________
block4_conv2 (Conv2D)        (None, 4, 4, 512)         2359808   
_________________________________________________________________
bn9 (BatchNormalization)     (None, 4, 4, 512)         2048      
_________________________________________________________________
activation_9 (Activation)    (None, 4, 4, 512)         0         
_________________________________________________________________
block4_conv3 (Conv2D)        (None, 4, 4, 512)         2359808   
_________________________________________________________________
bn10 (BatchNormalization)    (None, 4, 4, 512)         2048      
_________________________________________________________________
activation_10 (Activation)   (None, 4, 4, 512)         0         
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, 2, 2, 512)         0         
_________________________________________________________________
block5_conv1 (Conv2D)        (None, 2, 2, 512)         2359808   
_________________________________________________________________
bn11 (BatchNormalization)    (None, 2, 2, 512)         2048      
_________________________________________________________________
activation_11 (Activation)   (None, 2, 2, 512)         0         
_________________________________________________________________
block5_conv2 (Conv2D)        (None, 2, 2, 512)         2359808   
_________________________________________________________________
bn12 (BatchNormalization)    (None, 2, 2, 512)         2048      
_________________________________________________________________
activation_12 (Activation)   (None, 2, 2, 512)         0         
_________________________________________________________________
block5_conv3 (Conv2D)        (None, 2, 2, 512)         2359808   
_________________________________________________________________
bn13 (BatchNormalization)    (None, 2, 2, 512)         2048      
_________________________________________________________________
activation_13 (Activation)   (None, 2, 2, 512)         0         
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, 1, 1, 512)         0         
_________________________________________________________________
flatten (Flatten)            (None, 512)               0         
_________________________________________________________________
fc1 (Dense)                  (None, 4096)              2101248   
_________________________________________________________________
fc2 (Dense)                  (None, 4096)              16781312  
_________________________________________________________________
predictions (Dense)          (None, 10)                40970     
=================================================================
Total params: 33,655,114
Trainable params: 33,646,666
Non-trainable params: 8,448
_________________________________________________________________
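The parameter counts in the summary can be reproduced by hand. The sketch below assumes 3x3 kernels with a bias for every convolution, four parameters per channel for each Batch Normalization layer (gamma and beta are trainable, the moving mean and moving variance are not), and 512 flattened features. The names CONV_FILTERS, DENSE_UNITS, and count_params are hypothetical and only for this check.

```python
# Filter counts of the 13 convolutional layers and units of the 3 dense layers.
CONV_FILTERS = [64, 64, 128, 128, 256, 256, 256, 512, 512, 512, 512, 512, 512]
DENSE_UNITS = [4096, 4096, 10]

def count_params(in_channels=3, flat_features=512, kernel=3):
    """Return (trainable, non_trainable) parameter counts for the model in this article."""
    conv = bn_trainable = bn_frozen = 0
    for out_channels in CONV_FILTERS:
        # Convolution: kernel weights plus one bias per filter
        conv += kernel * kernel * in_channels * out_channels + out_channels
        bn_trainable += 2 * out_channels  # gamma, beta
        bn_frozen += 2 * out_channels     # moving mean, moving variance
        in_channels = out_channels
    dense, in_features = 0, flat_features
    for units in DENSE_UNITS:
        # Dense: weight matrix plus one bias per unit
        dense += in_features * units + units
        in_features = units
    return conv + bn_trainable + dense, bn_frozen

trainable, non_trainable = count_params()
print(trainable + non_trainable, trainable, non_trainable)
# 33655114 33646666 8448
```

More than half of the parameters (about 18.9 million) sit in the three fully connected layers.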

Model learning

Now we train the model we built.

Learning


'''optimizer definition'''
optimizer=keras.optimizers.Adam()
model.compile(loss='categorical_crossentropy', optimizer=optimizer, metrics=['accuracy'])
'''Data normalization'''
x_train=x_train.astype('float32')
x_train/=255
x_test=x_test.astype('float32')
x_test/=255
'''fit'''
history=model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, validation_data=(x_test, y_test))

The optimizer is Adam, which is commonly used. Since we do not tune the hyperparameters this time, the parameters are left at their default values. The loss function is the categorical cross entropy used for multiclass classification problems, given by Eq. (1).

\begin{equation} L = -\sum_{i=1}^{N} y_i \log{\hat{y}_i} \qquad (N: \text{number of classes}, \quad y_i: \text{correct label}, \quad \hat{y}_i: \text{predicted label}) \tag{1} \end{equation}
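As a quick numerical check of Eq. (1), here is a NumPy sketch for a single sample. The clipping constant eps is my own assumption to avoid log(0), and the function is an illustration rather than the Keras implementation.

```python
import numpy as np

def categorical_cross_entropy(y_true, y_pred, eps=1e-7):
    """Eq. (1): L = -sum_i y_i * log(y_hat_i), computed over the class axis."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    return -np.sum(y_true * np.log(y_pred), axis=-1)

y_true = np.array([0.0, 1.0, 0.0])  # one-hot correct label (class 1)
y_pred = np.array([0.1, 0.8, 0.1])  # softmax output
loss = categorical_cross_entropy(y_true, y_pred)
print(round(float(loss), 4))  # 0.2231
```

Because the label is one-hot, only the term for the correct class survives, so the loss here is simply -log(0.8).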

The metric to be monitored is accuracy (specified by metrics). Note that the quantity actually minimized during training is the loss, not the metric. These are set with model.compile. Finally, normalize the image data to the range 0 to 1 and train the model with model.fit. The training log is stored in history. With the above settings, the training results are as follows.

Execution result


Train on 50000 samples, validate on 10000 samples
Epoch 1/20
50000/50000 [==============================] - 38s 755us/step - loss: 2.0505 - acc: 0.1912 - val_loss: 2.1730 - val_acc: 0.2345
Epoch 2/20
50000/50000 [==============================] - 33s 667us/step - loss: 1.5810 - acc: 0.3763 - val_loss: 1.8167 - val_acc: 0.3522
Epoch 3/20
50000/50000 [==============================] - 33s 663us/step - loss: 1.2352 - acc: 0.5354 - val_loss: 1.4491 - val_acc: 0.5108
Epoch 4/20
50000/50000 [==============================] - 34s 674us/step - loss: 0.9415 - acc: 0.6714 - val_loss: 1.1408 - val_acc: 0.6202
Epoch 5/20
50000/50000 [==============================] - 34s 670us/step - loss: 0.7780 - acc: 0.7347 - val_loss: 0.8930 - val_acc: 0.6974
Epoch 6/20
50000/50000 [==============================] - 34s 675us/step - loss: 0.6525 - acc: 0.7803 - val_loss: 0.9603 - val_acc: 0.6942
Epoch 7/20
50000/50000 [==============================] - 34s 673us/step - loss: 0.5637 - acc: 0.8129 - val_loss: 0.9188 - val_acc: 0.7184
Epoch 8/20
50000/50000 [==============================] - 34s 679us/step - loss: 0.4869 - acc: 0.8405 - val_loss: 1.0963 - val_acc: 0.7069
Epoch 9/20
50000/50000 [==============================] - 34s 677us/step - loss: 0.4268 - acc: 0.8594 - val_loss: 0.6283 - val_acc: 0.8064
Epoch 10/20
50000/50000 [==============================] - 33s 668us/step - loss: 0.3710 - acc: 0.8785 - val_loss: 0.6944 - val_acc: 0.7826
Epoch 11/20
50000/50000 [==============================] - 34s 670us/step - loss: 0.3498 - acc: 0.8871 - val_loss: 0.6534 - val_acc: 0.8024
Epoch 12/20
50000/50000 [==============================] - 33s 663us/step - loss: 0.2751 - acc: 0.9113 - val_loss: 0.6253 - val_acc: 0.8163
Epoch 13/20
50000/50000 [==============================] - 34s 670us/step - loss: 0.2388 - acc: 0.9225 - val_loss: 1.1404 - val_acc: 0.7384
Epoch 14/20
50000/50000 [==============================] - 33s 667us/step - loss: 0.2127 - acc: 0.9323 - val_loss: 0.9577 - val_acc: 0.7503
Epoch 15/20
50000/50000 [==============================] - 33s 667us/step - loss: 0.1790 - acc: 0.9421 - val_loss: 0.7820 - val_acc: 0.7915
Epoch 16/20
50000/50000 [==============================] - 33s 666us/step - loss: 0.1559 - acc: 0.9509 - val_loss: 0.7138 - val_acc: 0.8223
Epoch 17/20
50000/50000 [==============================] - 34s 671us/step - loss: 0.1361 - acc: 0.9570 - val_loss: 0.8909 - val_acc: 0.7814
Epoch 18/20
50000/50000 [==============================] - 33s 669us/step - loss: 0.1272 - acc: 0.9606 - val_loss: 0.7006 - val_acc: 0.8246
Epoch 19/20
50000/50000 [==============================] - 33s 666us/step - loss: 0.1130 - acc: 0.9647 - val_loss: 0.7523 - val_acc: 0.8177
Epoch 20/20
50000/50000 [==============================] - 34s 671us/step - loss: 0.0986 - acc: 0.9689 - val_loss: 0.7233 - val_acc: 0.8350
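The acc and val_acc values in the log are accuracies, that is, the fraction of samples whose predicted class matches the correct class. A minimal NumPy sketch of that calculation, with made-up toy data and a hypothetical function name, looks like this:

```python
import numpy as np

def accuracy(y_true_one_hot, y_pred_prob):
    """Fraction of samples whose argmax prediction matches the one-hot label."""
    true_labels = np.argmax(y_true_one_hot, axis=1)
    pred_labels = np.argmax(y_pred_prob, axis=1)
    return float(np.mean(true_labels == pred_labels))

# Toy data: 4 samples, 3 classes (values invented for illustration)
y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]])
y_prob = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.3, 0.4, 0.3],
                   [0.2, 0.5, 0.3]])
print(accuracy(y_true, y_prob))  # 0.75
```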

After 20 epochs, the accuracy was about 97% on the training data and about 84% on the test data. Let's plot the loss and accuracy for each epoch.

Graph plot


'''Visualization of results'''
# Older Keras versions log 'acc'/'val_acc'; newer ones log 'accuracy'/'val_accuracy'
acc_key = 'acc' if 'acc' in history.history else 'accuracy'
# Accuracy per epoch
plt.figure(figsize=(10,7))
plt.plot(history.history[acc_key], color='b', linewidth=3)
plt.plot(history.history['val_' + acc_key], color='r', linewidth=3)
plt.tick_params(labelsize=18)
plt.ylabel('accuracy', fontsize=20)
plt.xlabel('epoch', fontsize=20)
plt.legend(['training', 'test'], loc='best', fontsize=20)
# Loss per epoch
plt.figure(figsize=(10,7))
plt.plot(history.history['loss'], color='b', linewidth=3)
plt.plot(history.history['val_loss'], color='r', linewidth=3)
plt.tick_params(labelsize=18)
plt.ylabel('loss', fontsize=20)
plt.xlabel('epoch', fontsize=20)
plt.legend(['training', 'test'], loc='best', fontsize=20)
plt.show()

The accuracy over the epochs is shown in the figure below.

accuracy_VGG16.png

The loss over the epochs is shown in the figure below.

loss_VGG16.png

Hmm... The loss on the test data becomes unstable from around the 4th epoch. I used Batch Normalization, but this looks like overfitting.
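One simple way to look at this is to find the epoch where the test loss was lowest. The sketch below uses the val_loss values copied from the training log above; best_epoch is a hypothetical helper written for this check.

```python
# val_loss per epoch, copied from the training log in this article.
val_loss = [2.1730, 1.8167, 1.4491, 1.1408, 0.8930, 0.9603, 0.9188, 1.0963,
            0.6283, 0.6944, 0.6534, 0.6253, 1.1404, 0.9577, 0.7820, 0.7138,
            0.8909, 0.7006, 0.7523, 0.7233]

def best_epoch(losses):
    """Return the 1-based epoch with the lowest loss, together with that loss."""
    index = min(range(len(losses)), key=losses.__getitem__)
    return index + 1, losses[index]

epoch, loss = best_epoch(val_loss)
print(epoch, loss)  # 12 0.6253
```

The test loss is lowest at epoch 12 and does not improve afterwards, while the training loss keeps falling. Stopping around that point, for example with an early-stopping callback, would be one option to consider.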

Data storage

This training run did not take much time, but a model that took a long time to train can be saved and reused. Save the model and weights as shown below.

Save model


'''Data storage'''
model.save('cifar10-CNN.h5')
model.save_weights('cifar10-CNN-weights.h5')

Summary

Since this is a tutorial, I used Keras to classify the images of CIFAR10 with the well-known VGG16 model. VGG16 was originally used for 1000-class classification, so I changed the input and output sizes and added Batch Normalization, but the model still overfit. The I/O size may be too small for this model. Possible improvements include adding Dropout and L2 regularization and tuning the optimization method.
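For reference, the two regularization ideas mentioned above can be sketched in NumPy as follows. This is a rough illustration under my own assumptions (inverted dropout during training, and an L2 term of the form lam * sum(w^2)); the function names are hypothetical and this is not how the Keras layers are implemented internally.

```python
import numpy as np

def dropout_train(x, rate, rng):
    """Inverted dropout: zero a fraction `rate` of units and rescale the rest by 1 / (1 - rate)."""
    keep_prob = 1.0 - rate
    mask = rng.random(x.shape) < keep_prob
    return x * mask / keep_prob

def l2_penalty(weights, lam):
    """L2 regularization term lam * sum(w^2), which is added to the loss."""
    return lam * float(np.sum(np.square(weights)))

rng = np.random.default_rng(0)
activations = np.ones((2, 8))
dropped = dropout_train(activations, rate=0.5, rng=rng)
print(np.unique(dropped))                       # values are 0.0 (dropped) or 2.0 (kept and rescaled)
print(l2_penalty(np.array([1.0, -2.0]), 0.01))  # 0.01 * (1 + 4)
```

In Keras, the corresponding building blocks would be the Dropout layer (already imported at the top of this article) and a kernel regularizer on the Conv2D and Dense layers.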

References

For the implementation of this code, I referred to the following books.

- "Deep Learning Practical Techniques & Tuning Techniques by Keras", Masaki Aono, Morikita Publishing Co., Ltd., 2019
