[PYTHON] [For AI beginners] I will explain mnist_cnn.py line by line (learn MNIST with Keras)

Introduction

This article is the second in a planned three-part series. It is simply a line-by-line explanation of mnist_cnn.py. There is some overlap with the previous article, because some content is intentionally duplicated so that each article can be read on its own.

It is intended for people who are interested in AI but have not tried it yet. If you read this, you should be able to understand the basic training flow of deep learning. (It was originally created for in-house training.)

  1. [For AI beginners] Explain mnist_mlp.py line by line (learn MNIST with Keras)
  2. [For AI beginners] Explain mnist_cnn.py line by line (learn MNIST with Keras)
  3. [For AI beginners] Explain mnist_transfer_cnn.py line by line (learn MNIST with Keras)

How to run the code

Since MNIST is image data, it is better to have a GPU to run this code (it is a bit painful on a CPU). The recommended method is to use Google Colaboratory. There are only two things to do:

· Open a new Python 3 notebook
· Enable the GPU from the runtime settings

You can now use the GPU. Just paste the code into a cell and run it (shortcut: CTRL + ENTER) and it will work.
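If you want to confirm that the GPU is actually enabled, a quick check like this works (TensorFlow is available in Colab by default; an empty string means no GPU was found):

import tensorflow as tf

# Prints something like '/device:GPU:0' when a GPU is available, '' otherwise
print(tf.test.gpu_device_name())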

About MNIST

A dataset of handwritten digit images, often used in machine learning tutorials.

· Content: handwritten digits from 0 to 9
· Image size: 28px * 28px
· Color: black and white
· Data size: 70,000 images (60,000 training, 10,000 test; both images and labels are included)

What is a CNN

A Convolutional Neural Network. This article focuses on the code, so I won't go into detail about the convolution operation. To put it very simply, convolution lets you **extract the features of an image**.
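As a rough illustration of what "extracting features" means, here is a minimal NumPy sketch of a 2D convolution (my own illustration, not part of mnist_cnn.py; the 3x3 kernel is a hypothetical vertical-edge detector):

import numpy as np

image = np.random.rand(28, 28)  # a dummy 28x28 grayscale image
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]], dtype=float)  # responds strongly to vertical edges

h, w = image.shape
kh, kw = kernel.shape
feature_map = np.zeros((h - kh + 1, w - kw + 1))
for i in range(feature_map.shape[0]):
    for j in range(feature_map.shape[1]):
        # Multiply each 3x3 patch by the kernel and sum: one output pixel
        feature_map[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)

print(feature_map.shape)  # (26, 26): the same shrinkage a 3x3 Conv2D without padding produces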

About mnist_cnn.py

This is code that uses Keras and TensorFlow to build a model for recognizing MNIST handwritten digits. The model takes images of the 10 handwritten digits 0 to 9 as input and classifies them into the 10 classes 0 to 9.

The task is the same as in the first article's mnist_mlp.py, but the processing inside the model is different.

Also, in the MLP, the 28 * 28 two-dimensional data (like [[0, 0, ..., 0, 0], ..., [0, 0, ..., 0, 0]]) was converted into 784-element one-dimensional data (like [0, 0, ..., 0, 0]) before being used as the input. In this CNN, the 28 * 28 two-dimensional data is used as the input as it is.

Code description

Preparation

'''Trains a simple convnet on the MNIST dataset.
Gets to 99.25% test accuracy after 12 epochs
(there is still a lot of margin for parameter tuning).
16 seconds per epoch on a GRID K520 GPU.
'''

# Not needed in Python 3, but required if the code is also to run in Python 2
from __future__ import print_function

# Import the required libraries
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras import backend as K

# Constants
batch_size = 128  # Batch size: the amount of data trained on in one step
num_classes = 10  # Number of labels to classify into. Here we classify the handwritten images into the 10 digits 0 to 9
epochs = 12       # Number of epochs: how many times to train over all the data (in the previous MLP article this was 20)
img_rows, img_cols = 28, 28  # Dimensions of the input image

· Dense: fully connected layer
· Dropout: a process that disables outputs (sets them to 0) at a given rate (probability)
· Flatten: converts the data to one dimension
· Conv2D: convolution layer
· MaxPooling2D: pooling layer

Of the constants defined at the beginning, batch_size and epochs are **hyperparameters** that a human has to tune. Changing them changes the performance of the model.

To explain briefly: the larger the batch_size, the more stable the training, but the more memory it requires. epochs is the number of training passes. It may seem that more training always produces a smarter model, but if it is too large the model falls into **overfitting** and its generalization performance drops (= it can classify the data used for training, but can no longer handle unknown data). Conversely, if it is too small, the model remains undertrained.
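To get a concrete feel for these numbers: with 60,000 training samples, a batch_size of 128, and 12 epochs, the number of weight updates works out as follows (a small back-of-the-envelope illustration, not part of mnist_cnn.py):

import math

train_size = 60000
steps_per_epoch = math.ceil(train_size / 128)  # 469 weight updates per epoch
total_updates = steps_per_epoch * 12           # 5628 updates over the 12 epochs
print(steps_per_epoch, total_updates)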

Data preprocessing

# Load the MNIST data and split it into training data (60,000 samples) and test data (10,000 samples)
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Convert the shape of the data
'''
The image data format differs depending on whether the Keras backend is
Theano (channels_first) or TensorFlow (channels_last).
K.image_data_format() returns either "channels_last" or "channels_first",
so branch on that value.
For black-and-white images the number of channels is 1; for an RGB color image it would be 3.
Here x_train goes from (60000, 28, 28) to (60000, 28, 28, 1).
'''
if K.image_data_format() == 'channels_first':
    x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
    x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
    x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)

# Image pixels take values from 0 to 255, so normalize the data by dividing by 255
# Convert the data type with .astype('float32') first (otherwise the in-place division below raises an error)
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255

# Print the shape and the number of samples to check
print('x_train shape:', x_train.shape)    # x_train shape: (60000, 28, 28, 1)
print(x_train.shape[0], 'train samples')  # 60000 train samples
print(x_test.shape[0], 'test samples')    # 10000 test samples

# Convert the label data to one-hot vectors
'''A one-hot vector looks like this:
label  0 1 2 3 4 5 6 7 8 9
0:    [1,0,0,0,0,0,0,0,0,0]
8:    [0,0,0,0,0,0,0,0,1,0]'''
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

About normalization: each pixel in the image takes a value from 0 to 255, and the idea is to map this to the range 0 to 1. When doing machine learning on images, the values are commonly normalized by dividing by 255.

About one-hot vectors: this time there are 10 classes of labels, 0 to 9, each represented by a number from 0 to 9. However, the numeric value of the label itself is meaningless, since we only want to classify into 10 categories. So each label is converted with one-hot encoding into a vector of 0s and 1s that indicates which class it belongs to.
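You can check this conversion directly with the same to_categorical function used above (a quick illustration):

print(keras.utils.to_categorical([3], 10))
# [[0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]]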

Model definition

# Instantiate the Sequential class
model = Sequential()

# Middle layers
# Add a convolution layer (filters: 32, filter size: (3, 3), activation function: ReLU, input shape: (28, 28, 1))
model.add(Conv2D(32, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=input_shape))
# Add a convolution layer (filters: 64, filter size: (3, 3), activation function: ReLU, input shape inferred automatically)
model.add(Conv2D(64, (3, 3), activation='relu'))
# Pooling layer
model.add(MaxPooling2D(pool_size=(2, 2)))
# Dropout with rate 0.25
model.add(Dropout(0.25))
# Flatten the data to one dimension
model.add(Flatten())
# Add a fully connected layer (128 units, activation function: ReLU, input size inferred automatically)
model.add(Dense(128, activation='relu'))
# Dropout with rate 0.5
model.add(Dropout(0.5))

# Output layer
# Add a fully connected layer (10 units, activation function: softmax, input size inferred automatically)
model.add(Dense(num_classes, activation='softmax'))

The Sequential model is built by stacking DNN layers. You only need to specify input_shape for the very first layer. Since this is a multi-class classification model, softmax is used as the activation function of the output layer.

A Dense (fully connected) layer can only receive one-dimensional data. Therefore, the data must be converted to one dimension with **Flatten()** before the Dense layer.
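You can check how each layer transforms the data by printing a summary of the model. The comments below give the output shape of each layer for this architecture (leading batch dimension omitted):

model.summary()
# Conv2D(32, (3, 3))   -> (26, 26, 32)   since 28 - 3 + 1 = 26
# Conv2D(64, (3, 3))   -> (24, 24, 64)
# MaxPooling2D((2, 2)) -> (12, 12, 64)
# Flatten()            -> (9216,)        since 12 * 12 * 64 = 9216
# Dense(128)           -> (128,)
# Dense(10)            -> (10,)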

Training

# Set up the training process
model.compile(
              # Set the loss function. Since this is classification, use categorical_crossentropy
              loss=keras.losses.categorical_crossentropy,
              # The optimization algorithm is Adadelta (a difference from the MLP article)
              optimizer=keras.optimizers.Adadelta(),
              # Specify the evaluation metric
              metrics=['accuracy'])

# Train the model
model.fit(x_train, y_train,      # Training data and labels
          batch_size=batch_size, # Batch size (128)
          epochs=epochs,         # Number of epochs (12)
          verbose=1,             # Display training progress as a bar in real time (0 hides it)
          validation_data=(x_test, y_test))  # Test data (evaluated at the end of each epoch to compute the error)

After defining the model, specify the loss function and the optimization algorithm and compile it. Then pass the data to the model to train it.

The Adadelta optimization algorithm specified here is also a kind of hyperparameter. I think Adam is the most commonly used algorithm, but no optimizer is universally best, so trial and error is required.
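For example, to try Adam instead, only the optimizer argument changes (a sketch; everything else stays the same):

model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adam(),
              metrics=['accuracy'])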

Evaluation

# Pass the test data (verbose=0 suppresses the progress messages)
score = model.evaluate(x_test, y_test, verbose=0)
# Print the generalization error (test loss)
print('Test loss:', score[0])
# Print the generalization performance (test accuracy)
print('Test accuracy:', score[1])

After training, use the test data to evaluate how well the model performs. The lower the loss and the higher the accuracy, the better the model.
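To see the model's output concretely, you can run a single test image through it (a small illustration; predict returns the softmax probabilities for each class, and argmax picks the most likely one):

import numpy as np

probs = model.predict(x_test[:1])          # 10 class probabilities from the softmax output
print('predicted:', np.argmax(probs))      # the class with the largest probability
print('actual:   ', np.argmax(y_test[0]))  # y_test is one-hot, so argmax recovers the digit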
