[PYTHON] Creating a classification model for the CIFAR-10 image dataset in Keras: ResNet is not accurate when implemented literally

Introduction

Using deep learning (CNNs), I created several models to classify the CIFAR-10 dataset. **In this article, we do not perform Data Augmentation, because I want to evaluate performance purely based on differences in model architecture.** I referred to the book Advanced Deep Learning with Keras. First I try a plain convolutional neural network (CNN) to see how accurate it is, and then an architecture such as ResNet. I built ResNet following the book, but that code assumes Data Augmentation, and at first it was not accurate at all, so I applied a little ingenuity.

About CIFAR-10

CIFAR-10 is a dataset of 60,000 images; each image is 32 x 32 pixels with 3 RGB channels. It consists of 50,000 training images and 10,000 test images. There are 10 classes: airplanes, automobiles, birds, cats, deer, dogs, frogs, horses, ships, and trucks.

[Figure: 001_CIFAR10.png — sample images from CIFAR-10]

When using Keras, you can load the CIFAR-10 dataset with the following code. As usual in deep learning, divide the pixel values by 255 to normalize them to the range 0 to 1.

from keras.datasets import cifar10

(x_train, y_train), (x_test, y_test) = cifar10.load_data()

x_train = x_train.astype('float32') / 255
x_test  = x_test.astype('float32') / 255
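
The models below are compiled with categorical_crossentropy, so the labels also have to be converted to one-hot vectors and num_classes has to be defined. The original does not show this step; a minimal sketch of what is presumably done:

from keras.utils import to_categorical

num_classes = 10
y_train = to_categorical(y_train, num_classes)
y_test  = to_categorical(y_test, num_classes)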

A simple CNN model to serve as a reference

I tried to see how accurate it would be with a simple model consisting of two convolution layers, MaxPooling, and Dropout.

import keras
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

input_shape = (32, 32, 3)

# batch_size and epochs are not given in the original article; these are assumed values
batch_size = 128
epochs = 50

model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=input_shape))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))

# Compile the model with a loss function, an optimizer, and metrics
model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adam(),
              metrics=['accuracy'])

# Train the model
history = model.fit(x_train, y_train,
                    batch_size=batch_size,
                    epochs=epochs,
                    verbose=1,
                    validation_data=(x_test, y_test))

This is the result of training. The accuracy on the test data was **71.6%**.

[Figure: 011_simpleCNN.png — learning curves for the simple CNN]
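
The learning-curve figures in this article are drawn from the history object returned by model.fit. The plotting code is not shown in the original, but a minimal sketch with matplotlib could look like the following (the history keys are 'accuracy'/'val_accuracy' in recent Keras versions, 'acc'/'val_acc' in older ones):

import matplotlib.pyplot as plt

# Plot training and validation accuracy per epoch from the History object
plt.plot(history.history['accuracy'], label='train')
plt.plot(history.history['val_accuracy'], label='test')
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.legend()
plt.show()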

Complex CNN model

Increase the number of convolution layers, increase the number of filters, and include Batch Normalization.

from keras.layers import BatchNormalization, Activation
from keras.callbacks import LearningRateScheduler

input_shape = (32, 32, 3)

model = Sequential()
model.add(Conv2D(64, kernel_size=(3, 3),
                 input_shape=input_shape, padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))

model.add(Conv2D(128, kernel_size=(3, 3),  padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))

model.add(Conv2D(128, kernel_size=(3, 3),  padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))

model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Conv2D(256, (2, 2), padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))

model.add(Conv2D(256, (2, 2), padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))

model.add(Conv2D(256, (2, 2), padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))

model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Conv2D(512, (2, 2), padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))

model.add(Conv2D(512, (2, 2), padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))

model.add(Conv2D(512, (2, 2), padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))

model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Flatten())
model.add(Dense(1024))
model.add(BatchNormalization())
model.add(Activation('relu'))

model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))

# Compile the model with a loss function, an optimizer, and metrics
model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adam(),
              metrics=['accuracy'])

In addition, the following function is defined to decay the learning rate as training progresses. Note that the divisions are cumulative: from epoch 20 the rate is halved, from epoch 30 it is additionally divided by 4, and from epoch 50 by a further 5.

def step_decay(epoch):
    lr = 0.001
    if(epoch >= 20):
        lr/=2
    if(epoch>=30):
        lr/=4
    if(epoch>=50):
        lr/=5
    return lr
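
Because the divisions in step_decay are cumulative, the resulting learning rates are 0.001 up to epoch 19, 0.0005 from epoch 20, 0.000125 from epoch 30, and 0.000025 from epoch 50. A quick check by calling the function directly:

for epoch in [0, 19, 20, 29, 30, 49, 50]:
    print(epoch, step_decay(epoch))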

# Learning rate decay callback
lr_decay = LearningRateScheduler(step_decay)

# Train the model
history = model.fit(x_train, y_train,
                    batch_size=batch_size,
                    epochs=epochs,
                    verbose=1,
                    validation_data=(x_test, y_test),
                    callbacks=[lr_decay])

This is the result of training. The accuracy on the training data is 98.9%, so the features are being learned well, and the accuracy on the test data is now **90.5%**.

[Figure: 012_complicated_CNN.png — learning curves for the more complex CNN]

ResNet

About ResNet

ResNet is a convolutional neural network that uses a technique called Deep Residual Learning to mitigate the vanishing gradient problem. It was proposed in the paper "Deep Residual Learning for Image Recognition". In addition to stacking convolution layers, the input is carried around them by a shortcut connection and added, unchanged, to the output of the convolutions, as shown by the line on the right side of the figure below. The stacked layers therefore only have to learn the residual, and gradients can flow to deeper layers through the shortcut.

[Figure: 101_ResNet.png — a residual block with a shortcut connection]

For more information on ResNet and why it works, see, for example, this article (https://qiita.com/koharite/items/3be5bea73925b609f6b0).
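
In formula form, a residual block computes y = ReLU(F(x) + x), where F(x) is the output of a small stack of convolutions. As a conceptual sketch only (the actual implementation used in this article follows in the next section), with the Keras functional API this looks roughly like:

from keras.layers import Conv2D, Activation, add

def residual_block_sketch(x, filters):
    # F(x): two 3x3 convolutions (BatchNormalization omitted for brevity);
    # assumes x already has `filters` channels so the add() below works
    f = Conv2D(filters, 3, padding='same', activation='relu')(x)
    f = Conv2D(filters, 3, padding='same')(f)
    # shortcut connection: add the input to the convolution output, then apply ReLU
    return Activation('relu')(add([x, f]))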

Implement ResNet in Keras

I implement it as in the book referenced at the beginning; the only thing I leave out is Data Augmentation.

First, define the smallest building block, which is used throughout: "2D convolution" -> "Batch Normalization" -> "Activation (ReLU)". However, as shown in the Deep Residual Learning figure above, the ReLU activation is sometimes applied only after the convolution output and the shortcut input have been added (the ⊕ in the figure), so the function takes arguments that control whether Batch Normalization and the activation are applied, and in which order.

from keras.regularizers import l2

def resnet_layer(inputs, num_filters=32, kernel_size=3,
                 strides=1, activation='relu', batch_normalization=True, conv_first=True):
    # Building block: Conv2D, optionally followed (or preceded) by BatchNormalization and activation
    conv = Conv2D(num_filters,
                  kernel_size=kernel_size,
                  strides=strides,
                  padding='same',
                  kernel_initializer='he_normal',
                  kernel_regularizer=l2(1e-4))

    x = inputs
    if conv_first:
        x = conv(x)
        if batch_normalization:
            x = BatchNormalization()(x)
        if activation is not None:
            x = Activation(activation)(x)
    else:
        if batch_normalization:
            x = BatchNormalization()(x)
        if activation is not None:
            x = Activation(activation)(x)
        x = conv(x)
    return x

Repeating this building block forms the entire ResNet architecture. The parameter n specifies how many residual blocks are stacked in each stage. The number of filters starts at 16 and doubles over 3 stages (16, 32, 64); this corresponds to the stack variable in the for loop. Within each stage, the residual block is repeated n times: two of the layers defined above are stacked, the output of the convolutions is added to the original input, and the result is passed through a ReLU activation.

In Keras, the sum of the convolution output and the original input is simply written as add([x, y]), which is really convenient. However, in the first block of each new stage the number of filters doubles and the feature map is downsampled, so the dimensions of the convolution output and the shortcut input no longer match; as in the code below, a 1x1 convolution (kernel_size=1 with the same strides) is applied to the shortcut to match the dimensions.

from keras.layers import Input, add, AveragePooling2D

n = 3

num_filters = 16
num_res_blocks = n

inputs = Input(shape=input_shape)
x = resnet_layer(inputs=inputs)
# instantiate the stack of residual units
for stack in range(3):
    for res_block in range(num_res_blocks):
        strides = 1
        # first block of a stage, but not the first stage
        if stack > 0 and res_block == 0:
            strides = 2  # downsample
        y = resnet_layer(inputs=x,
                         num_filters=num_filters,
                         strides=strides)
        y = resnet_layer(inputs=y,
                         num_filters=num_filters,
                         activation=None)
        if stack > 0 and res_block == 0:
            # match the dimensions of the shortcut to the convolution output
            x = resnet_layer(inputs=x,
                             num_filters=num_filters,
                             kernel_size=1,
                             strides=strides,
                             activation=None,
                             batch_normalization=False)
        x = add([x, y])
        x = Activation('relu')(x)
    num_filters *= 2

# add classifier on top
x = AveragePooling2D(pool_size=8)(x)
y = Flatten()(x)
outputs = Dense(num_classes,
                activation='softmax',
                kernel_initializer='he_normal')(y)

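The code above only builds the graph up to outputs; to actually train it, the tensors still need to be wrapped in a Model and compiled. The original article does not show this part, so the following is a sketch that assumes the same compile settings and learning-rate callback as the CNN above:

from keras.models import Model

model = Model(inputs=inputs, outputs=outputs)
model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adam(),
              metrics=['accuracy'])

history = model.fit(x_train, y_train,
                    batch_size=batch_size,
                    epochs=epochs,
                    verbose=1,
                    validation_data=(x_test, y_test),
                    callbacks=[lr_decay])
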
When implemented literally as in the book

First, I implemented it just as in the book and tried it. This is the result for ResNet20 (n = 3). The learning curves are shown in the figure below.

[Figure: 102_ResNet_DropOutなしn3.png — ResNet20 (n = 3) without Dropout]

The accuracy is 96.2% on the training data but only **66.2%** on the test data, which is even worse than the simple CNN baseline. The model is overfitting the training data: without padding the dataset out via Data Augmentation, the model is too complex for the amount of data, it overfits, and accuracy on the test data suffers.

I tried adding Dropout and the accuracy improved

I tried Dropout, a standard remedy for overfitting. For where to insert the Dropout layers, I referred to the article Understanding Residual Network (ResNet) and Best Practice for Tuning, and inserted Dropout between the two convolution layers of each residual block, as shown below.

        y = resnet_layer(inputs=x,
                         num_filters=num_filters,
                         strides=strides)
        # add Dropout between the two convolution layers
        y = Dropout(0.3)(y)
        y = resnet_layer(inputs=y,
                         num_filters=num_filters,
                         activation=None)

In addition, Dropout was added to the final fully connected classification part, after Flatten.


# Dropout is also added before the final fully connected classification layer
# add classifier on top
# v1 does not use BN after the last shortcut connection-ReLU
x = AveragePooling2D(pool_size=8)(x)
y = Flatten()(x)
y = Dropout(0.4)(y)  # add Dropout
outputs = Dense(num_classes,
                activation='softmax',
                kernel_initializer='he_normal')(y)

This is the result for ResNet20 (n = 3), the same depth as before.

[Figure: 111_ResNet_Dropoutありn3.png — ResNet20 (n = 3) with Dropout]

The accuracy was 95.0% on the training data and **86.1%** on the test data, a large improvement over before.

The result for ResNet50 (n = 8) is as follows.

[Figure: 121_ResNet_Dropoutありn8.png — ResNet50 (n = 8) with Dropout]

This improved further, to 99.2% on the training data and **87.4%** on the test data.

However, this is still inferior to the more complex CNN above. Given more time to tune the hyperparameters, the accuracy would probably improve further, but in my limited experiments I could not get better performance than the conventional CNN.

Summary

For CIFAR-10, I tried a simple CNN, a more complex CNN, ResNet without Dropout as in the original paper, and ResNet with Dropout to reduce overfitting. To the extent I tried this time, I could not demonstrate the superiority of ResNet. For a relatively simple dataset like CIFAR-10, an architecture like ResNet may not be necessary; if you are simply aiming for accuracy, doing Data Augmentation well seems more effective. Perhaps I just could not show it in the short time available because of my limited hyperparameter-tuning skills. If there is a way to show the superiority of ResNet on CIFAR-10 without using Data Augmentation, please let me know.
