[PYTHON] Add convolution to MNIST

Previously, I read through the MNIST tutorial code to understand how TensorFlow processes things, but since the data consists of images, I wanted to try a CNN. So I added a convolution step. The main reference is, of course, "Deep MNIST for Experts", and another blog post was also helpful.

Convolution process

Convolution applies an n × n filter to the image. Anyone who has done image processing will be familiar with it, since the same operation is used for edge extraction and blurring.
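As a quick illustration (a minimal standalone sketch, not part of the MNIST network; the Laplacian-style kernel values are my own example), here is a hypothetical 3x3 edge-detection filter applied with TensorFlow's conv2d:

import numpy as np
import tensorflow as tf

# A hypothetical 3x3 Laplacian-style edge filter (example values)
edge = np.array([[ 0, -1,  0],
                 [-1,  4, -1],
                 [ 0, -1,  0]], dtype=np.float32)

img = tf.placeholder(tf.float32, [1, 28, 28, 1])   # one grayscale image
kernel = tf.constant(edge.reshape(3, 3, 1, 1))     # (height, width, in_ch, out_ch)
edges = tf.nn.conv2d(img, kernel, strides=[1, 1, 1, 1], padding='SAME')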

The function to use is "tf.nn.conv2d()", and there are four main arguments to set. The first is the input data. The second is the filter (height, width, number of input channels, number of output channels). The third is the stride of the sliding window. The fourth is the padding setting.

Normally, you set it up like this.

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 784])
x_image = tf.reshape(x, [-1, 28, 28, 1])
initial = tf.truncated_normal([5, 5, 1, 32], stddev=0.1)
W_conv1 = tf.Variable(initial)
h_conv1 = tf.nn.conv2d(x_image,
                       W_conv1,
                       strides=[1, 1, 1, 1],
                       padding='SAME')

Since the MNIST data is a 28x28 image flattened to one dimension, it is first reshaped back to 28x28. Next, I set up a 5x5 filter. Since the input is a grayscale image, it has 1 input channel, and the output has 32 channels.

In addition, a bias term is added and ReLU is used as the activation function during the convolution step, so the last line actually becomes:

initial = tf.constant(0.1, shape=[32])
b_conv1 = tf.Variable(initial)
h_conv1 = tf.nn.relu(tf.nn.conv2d(x_image,
                                  W_conv1,
                                  strides=[1, 1, 1, 1],
                                  padding='SAME')
                     + b_conv1)

That is the whole first convolution layer. (ReLU is simply the function max(0, x); "W_conv1" and "b_conv1" are the weight parameters adjusted by error backpropagation.)

Pooling process

The pooling process is usually paired with the convolution process. It reduces the image size by determining a representative value for each n × n block. There are various choices for the representative value, such as the average or the median, but in deep learning the maximum value, called "Max Pooling", is most often used.
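For example (a toy sketch with made-up numbers, using plain NumPy rather than TensorFlow), 2x2 max pooling with stride 2 keeps the maximum of each 2x2 block:

import numpy as np

# A toy 4x4 "image" with made-up values
a = np.array([[1, 3, 2, 0],
              [4, 2, 1, 5],
              [0, 6, 7, 2],
              [3, 1, 4, 8]])

# 2x2 max pooling with stride 2: take the max of each 2x2 block
pooled = a.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)
# [[4 5]
#  [6 8]]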

In TensorFlow this is "tf.nn.max_pool()". There are four main arguments to set: the input data, the window size (ksize), the stride, and the padding setting. Quite similar to the convolution call, isn't it?

The actual code looks like this.

h_pool1 = tf.nn.max_pool(h_conv1,
                         ksize=[1, 2, 2, 1],
                         strides=[1, 2, 2, 1],
                         padding='SAME')

The input is the result of the convolution step. The window size is 2x2, and the stride also moves 2 pixels both vertically and horizontally, so the output image is half the original size in each dimension. (The number of channels does not change.)
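If you want to confirm this, printing the tensor shapes should show it (a sketch; the shapes assume the 28x28 MNIST setup above, and the leading ? is the unknown batch size):

print(h_conv1.shape)  # (?, 28, 28, 32) - 'SAME' padding keeps 28x28
print(h_pool1.shape)  # (?, 14, 14, 32) - 2x2 / stride-2 pooling halves it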

Fully connected processing

Convolution and pooling operate on the data as a two-dimensional image, so it has to be flattened back to one dimension before the fully connected layer. Note that you cannot set the size of this array correctly unless you keep track of the current image size and number of channels. The function is again "tf.reshape()", just as when going from 1D to 2D. The size bookkeeping is shown below.
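For reference, here is that bookkeeping for the network in the full code below: two rounds of 2x2 pooling shrink 28x28 to 14x14 and then 7x7, and the second convolution outputs 64 channels, so each image flattens to 7 * 7 * 64 = 3136 values:

# 28x28 -> pool1 -> 14x14 -> pool2 -> 7x7, with 64 channels after conv2,
# so each image becomes a vector of 7 * 7 * 64 = 3136 values
h_pool2_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64])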

Bonus

There are some parts I haven't explained yet, but here is the full source code, which works (it should, anyway).

from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf

# Helper functions
def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                          strides=[1, 2, 2, 1], padding='SAME')

# Main function
def main():
    # Get the dataset
    # (here, specify the folder containing the pre-downloaded archives)
    mnist = input_data.read_data_sets("MNIST_data", one_hot=True)

    # Prepare the input and output data
    x = tf.placeholder(tf.float32, [None, 784])
    y_ = tf.placeholder(tf.float32, [None, 10])
    x_image = tf.reshape(x, [-1, 28, 28, 1])

    # Convolution process (1)
    W_conv1 = weight_variable([5, 5, 1, 32])
    b_conv1 = bias_variable([32])
    h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)

    # Pooling process (1)
    h_pool1 = max_pool_2x2(h_conv1)

    # Convolution process (2)
    W_conv2 = weight_variable([5, 5, 32, 64])
    b_conv2 = bias_variable([64])
    h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)

    # Pooling process (2)
    h_pool2 = max_pool_2x2(h_conv2)

    # Fully connected layer
    W_fc1 = weight_variable([7 * 7 * 64, 1024])
    b_fc1 = bias_variable([1024])
    h_pool2_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64])
    h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

    # Dropout
    keep_prob = tf.placeholder(tf.float32)
    h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

    # Classification output (why there is no softmax here is a mystery)
    W_fc2 = weight_variable([1024, 10])
    b_fc2 = bias_variable([10])
    y_conv = tf.matmul(h_fc1_drop, W_fc2) + b_fc2

    # Evaluation processing (for some reason the softmax appears here instead)
    cross_entropy = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv))
    train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
    correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

    # Create the session
    sess = tf.InteractiveSession()
    tf.global_variables_initializer().run()

    # Training
    for i in range(20000):
        # Batch size is 50
        batch = mnist.train.next_batch(50)
        # Display progress every 100 steps
        if i % 100 == 0:
            train_accuracy = accuracy.eval(feed_dict={
                x:batch[0], y_: batch[1], keep_prob: 1.0})
            print("step %d, training accuracy %g"%(i, train_accuracy))
        train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

    # Display the result
    print("test accuracy %g" % accuracy.eval(feed_dict={
        x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))

# Entry point
if __name__ == "__main__":
    main()

I still have to write up the dropout and evaluation steps... (I'm not too sure about the evaluation step myself ^^;)
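Incidentally, as far as I understand it, the softmax "mystery" above has a simple answer: tf.nn.softmax_cross_entropy_with_logits applies the softmax internally (it is numerically more stable that way), so the network itself only outputs raw logits. If you want actual class probabilities at prediction time, you apply the softmax yourself, for example:

# Turn the raw logits into class probabilities at prediction time
y_prob = tf.nn.softmax(y_conv)
# For accuracy, argmax on the logits is already enough, since the softmax
# does not change which class has the largest value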
