[PYTHON] Image classification with TensorFlow2 + Keras, CNN part 1 ~ Just get it running for now ~

Introduction

This is a study memo on image classification with TensorFlow2 + Keras (part 1 of the **CNN** edition). The MLP edition (multilayer perceptron model edition) is covered in an earlier memo.

The subject is the classification of the standard **handwritten digit images (MNIST)**.

This time, let's simply train a CNN model and use it for prediction (classification), treating it as a black box for now.

MLP version program

With TensorFlow2 + Keras, handwritten digit image (MNIST) classification with a **multilayer perceptron model** could be written as follows (for details, see items/7d3c7bd3327ff049243a).

Switch to TensorFlow2 (Google Colab environment only)


%tensorflow_version 2.x

Image classification by MLP


import tensorflow as tf

# (1)Download and normalize handwritten digit image dataset
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# (2)Build MLP model
model = tf.keras.models.Sequential()
model.add( tf.keras.layers.Flatten(input_shape=(28, 28)) )
model.add( tf.keras.layers.Dense(128, activation='relu') )
model.add( tf.keras.layers.Dropout(0.2) )
model.add( tf.keras.layers.Dense(10, activation='softmax') )

# (3)Compile and train the model
model.compile(optimizer='Adam',loss='sparse_categorical_crossentropy',metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5)

# (4)Model evaluation
model.evaluate(x_test,  y_test, verbose=2)

With this, I was able to create a classifier with an accuracy of around $97.7\%$.
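As a rough illustration of what `loss='sparse_categorical_crossentropy'` in step (3) measures, the sketch below computes the same kind of quantity by hand for a single sample in plain Python. The helper names `softmax` and `sparse_categorical_crossentropy` and the example scores are made up for illustration. Keras's own implementation is vectorized and handles numerical edge cases, so treat this only as a sketch of the idea.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_categorical_crossentropy(label, probs):
    """Loss for one sample: negative log of the probability given to the true class.

    'Sparse' means the label is an integer class index (e.g. 7), not a one-hot vector.
    """
    return -math.log(probs[label])

# Hypothetical raw scores for the 10 digit classes (not taken from a real model)
scores = [0.1, 0.2, 0.1, 0.3, 0.1, 0.2, 0.1, 4.0, 0.2, 0.3]
probs = softmax(scores)

loss_if_7 = sparse_categorical_crossentropy(7, probs)  # true label is the most probable class
loss_if_3 = sparse_categorical_crossentropy(3, probs)  # true label is a low-probability class
print(loss_if_7, loss_if_3)
```

The loss is small when the model puts high probability on the correct digit and large when it does not, which is what training tries to minimize.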

CNN version of the program

Handwritten digit image (MNIST) classification with a **convolutional neural network model (CNN)** can be written as follows. Starting from the multilayer perceptron model, you get a convolutional neural network by adding just three lines and modifying one.

Image classification by CNN


import tensorflow as tf  # not needed again if the MLP program above was already run

# (1)Download and normalize handwritten digit image dataset
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# (2)Build a CNN model
model = tf.keras.models.Sequential()
model.add( tf.keras.layers.Reshape((28, 28, 1), input_shape=(28, 28)) ) # added: give each image a channel axis
model.add( tf.keras.layers.Conv2D(32, (5, 5), activation='relu') )      # added: 32 filters of size 5x5
model.add( tf.keras.layers.MaxPooling2D(pool_size=(2,2)) )              # added: 2x2 max pooling
model.add( tf.keras.layers.Flatten() )                                  # modified: input_shape moved to Reshape
model.add( tf.keras.layers.Dense(128, activation='relu') )
model.add( tf.keras.layers.Dropout(0.2) )
model.add( tf.keras.layers.Dense(10, activation='softmax') )

# (3)Compile and train the model
model.compile(optimizer='Adam',loss='sparse_categorical_crossentropy',metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5)

# (4)Model evaluation
model.evaluate(x_test,  y_test, verbose=2)

With this, you can create a classifier with an accuracy of around $98.7\%$ (about 1 percentage point higher than the MLP model above). However, training takes longer.
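Part of the reason for the longer training time is likely the larger model. As a minimal sketch, the plain-Python arithmetic below traces the tensor shapes through the CNN and compares trainable parameter counts with the MLP. It assumes the Keras defaults used in the code above (stride 1 and no padding for `Conv2D`, stride equal to the pool size for `MaxPooling2D`). The helper names `conv2d_valid`, `max_pool`, and `dense` are made up for illustration.

```python
def conv2d_valid(h, w, c_in, filters, k):
    """Output shape and parameter count of a k x k convolution (stride 1, no padding)."""
    out_shape = (h - k + 1, w - k + 1, filters)
    params = k * k * c_in * filters + filters  # weights + one bias per filter
    return out_shape, params

def max_pool(h, w, c, p):
    """Output shape of p x p max pooling with stride p; pooling has no trainable parameters."""
    return (h // p, w // p, c)

def dense(n_in, n_out):
    """Parameter count of a fully connected layer: weights + biases."""
    return n_in * n_out + n_out

# MLP: Flatten(28*28) -> Dense(128) -> Dense(10)
mlp_params = dense(28 * 28, 128) + dense(128, 10)

# CNN: Reshape(28,28,1) -> Conv2D(32, 5x5) -> MaxPooling2D(2x2) -> Flatten -> Dense(128) -> Dense(10)
conv_shape, conv_params = conv2d_valid(28, 28, 1, 32, 5)  # (24, 24, 32)
pool_shape = max_pool(*conv_shape, 2)                     # (12, 12, 32)
flat = pool_shape[0] * pool_shape[1] * pool_shape[2]      # 4608 values fed to Dense(128)
cnn_params = conv_params + dense(flat, 128) + dense(128, 10)

print("MLP parameters:", mlp_params)  # 101770
print("CNN parameters:", cnn_params)  # 592074
```

Most of the CNN's parameters sit in the first `Dense` layer, because the flattened feature map (4608 values) is much larger than the 784 pixels the MLP feeds in. The convolution itself adds only 832 parameters, but it is applied at every position of the image, which also adds computation.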

Cases that could not be predicted correctly

Let's look at specific cases where classification (prediction) fails. (The program that outputs these is in "[~ Observe the image that fails to classify ~](https://qiita.com/code0327/items/5dfc1b2ed143c1f9bd2b)".)

The red text displayed in the upper left of each figure shows **which digit was wrongly predicted** (the number in parentheses is the softmax output for the wrong prediction). For example, 5 (0.9) means "predicted $5$ with about $90\%$ confidence". The blue number is the index of the sample in the test data (x_test in the code above).
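The idea behind these labels can be sketched in plain Python as follows. This is not the linked program, only a minimal illustration: the helper `find_misclassified` and the toy data are hypothetical, and with the real model the probabilities would come from something like `model.predict(x_test)`.

```python
def find_misclassified(probs, labels):
    """Return (index, predicted_digit, confidence) for every wrong prediction.

    probs  : list of per-sample softmax outputs (10 probabilities each)
    labels : list of correct digits
    """
    failures = []
    for i, (p, y) in enumerate(zip(probs, labels)):
        pred = max(range(len(p)), key=lambda k: p[k])  # class with the highest probability
        if pred != y:
            failures.append((i, pred, p[pred]))
    return failures

# Toy data standing in for the model output and y_test (not real model output)
toy_probs = [
    [0.9, 0.0, 0.0, 0.0, 0.0, 0.1, 0.0, 0.0, 0.0, 0.0],  # predicts 0
    [0.0, 0.0, 0.0, 0.1, 0.0, 0.9, 0.0, 0.0, 0.0, 0.0],  # predicts 5
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.2, 0.0, 0.8],  # predicts 9
]
toy_labels = [0, 3, 9]

for index, pred, conf in find_misclassified(toy_probs, toy_labels):
    print(f"index {index}: {pred} ({conf:.1f})")  # same "5 (0.9)" style as the figures
```

Here only the second sample is wrong (a "3" predicted as "5"), so the loop prints its index together with the label in the same style as the red text in the figures.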

Cases where the correct value "0" was not predicted (classified) correctly: 4/980

0.png

Cases where the correct value "1" was not predicted (classified) correctly: 4/1135

1.png

Cases where the correct value "2" was not predicted (classified) correctly: 8/1032

2.png

Cases where the correct value "3" was not predicted (classified) correctly: 12/1010

3.png

Cases where the correct value "4" was not predicted (classified) correctly: 15/982

4.png

Cases where the correct value "5" was not predicted (classified) correctly: 6/892

5.png

Cases where the correct value "6" was not predicted (classified) correctly: 13/958

6.png

Cases where the correct value "7" was not predicted (classified) correctly: 15/1028

7.png

Cases where the correct value "8" was not predicted (classified) correctly: 27/974

8.png

Cases where the correct value "9" was not predicted (classified) correctly: 26/1009

9.png
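The per-digit counts above can be checked against the overall accuracy with a little arithmetic. The sketch below only re-uses the numbers listed in this section; the dictionary name `errors` is made up for illustration.

```python
# (misclassified, total) per correct digit, taken from the per-digit counts above
errors = {
    0: (4, 980), 1: (4, 1135), 2: (8, 1032), 3: (12, 1010), 4: (15, 982),
    5: (6, 892), 6: (13, 958), 7: (15, 1028), 8: (27, 974), 9: (26, 1009),
}

# Miss rate for each digit
for digit, (wrong, total) in errors.items():
    print(f"{digit}: {wrong}/{total} = {100 * wrong / total:.2f}% misclassified")

# Overall accuracy across the whole test set
total_wrong = sum(w for w, _ in errors.values())
total_cases = sum(t for _, t in errors.values())
accuracy = 1 - total_wrong / total_cases
print(f"overall: {total_wrong}/{total_cases} wrong, accuracy {100 * accuracy:.1f}%")
```

The totals come to 130 misclassified out of 10000 test images, which matches the accuracy of about $98.7\%$ reported above. In this run, "8" and "9" have the highest miss rates.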

Next time

Why is the **convolutional neural network model (CNN)** well suited to image classification and image recognition? What is a convolution (filter) in the first place? I would like to cover topics like these.
