[PYTHON] I tried the TensorFlow tutorial: CNN (4th)

I am working through the TensorFlow official tutorial. The accuracy of the plain neural network from the previous post still left something to be desired; using a CNN should improve it further.

CNN

A CNN is often used in image recognition and speech recognition. It can be built by combining a "convolution layer" and a "pooling layer".

Convolution process

The convolution process has two steps:

① Apply a small square convolution filter (3x3, 5x5, etc.) to the input data. The stride determines how far the filter moves at each step; with a stride of 1, it shifts one pixel at a time. Note that applying a 5x5 convolution filter to 28x28 data shrinks the output to 24x24. ← Zero padding deals with this. Zero padding is the process of surrounding the input data with zeros so the output keeps the original size.
② At each filter position, calculate the sum of the products between the filter weights and the input values it covers.
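As a quick check of this size arithmetic, here is a minimal sketch (the random input and filter values are just for illustration; padding='VALID' means no zero padding):

import tensorflow as tf

# Dummy 28x28 single-channel image (batch of 1) and one 5x5 filter.
image = tf.random_normal([1, 28, 28, 1])
filt = tf.random_normal([5, 5, 1, 1])

no_pad = tf.nn.conv2d(image, filt, strides=[1,1,1,1], padding='VALID')
zero_pad = tf.nn.conv2d(image, filt, strides=[1,1,1,1], padding='SAME')

print(no_pad.shape)    # (1, 24, 24, 1) -- shrinks without zero padding
print(zero_pad.shape)  # (1, 28, 28, 1) -- zero padding keeps the size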

Pooling process

A process that reduces the dimensions of the convolution output. Example: with a 2x2 pooling filter, max pooling slides the 2x2 window over the convolution result and takes the maximum value within each 2x2 block.
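For example, max pooling a 4x4 input with a 2x2 filter and a stride of 2 keeps one maximum per block (a minimal sketch; the input values are arbitrary):

import tensorflow as tf

# 4x4 single-channel input with known values (batch of 1).
x = tf.reshape(tf.constant([[ 1.,  2.,  5.,  6.],
                            [ 3.,  4.,  7.,  8.],
                            [ 9., 10., 13., 14.],
                            [11., 12., 15., 16.]]), [1, 4, 4, 1])

pooled = tf.nn.max_pool(x, ksize=[1,2,2,1], strides=[1,2,2,1], padding='SAME')

with tf.Session() as sess:
    print(sess.run(tf.reshape(pooled, [2, 2])))
    # [[ 4.  8.]
    #  [12. 16.]]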

Activation function

This time we will use the ReLU function. The ReLU function is y = max(x, 0): it returns 0 when x is 0 or less, and returns x itself when x is greater than 0.
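A minimal sketch of ReLU in TensorFlow:

import tensorflow as tf

x = tf.constant([-2.0, -0.5, 0.0, 0.5, 2.0])
y = tf.nn.relu(x)  # y = max(x, 0)

with tf.Session() as sess:
    print(sess.run(y))  # [0.  0.  0.  0.5 2. ]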

Program flow

Data reading

from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data', one_hot = True)

Import tensorflow

import tensorflow as tf

Set up a Session

This time we use `InteractiveSession()`

sess = tf.InteractiveSession()

Create containers for the input x and the correct labels y_

They are created with placeholder. By the way, I also created weights and biases here (note that these W and b are left over from the previous post's model and are not actually used in the CNN below).

x = tf.placeholder(tf.float32, shape=[None, 784])
y_ = tf.placeholder(tf.float32, shape=[None,10])

W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))

Create functions to generate the weights and biases

For the weights, `initial = tf.truncated_normal(shape, stddev=0.1)` gives the initial values. A truncated normal distribution has the left and right tails of the normal distribution cut off, and stddev specifies the spread of the data as the standard deviation.

For the bias, `initial = tf.constant(0.1, shape=shape)` gives 0.1 as the initial value, because training does not proceed well when the value is 0 (a slightly positive bias helps avoid dead ReLU neurons).

def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)
    
def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

Convolution processing / convolution layer

Takes the input data x and the weights W as arguments. strides=[1,1,1,1] means the filter is applied while shifting one pixel at a time. padding='SAME' zero-pads the input so that the output is the same size as the input.

def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1,1,1,1], padding='SAME')

Pooling layer

A layer that extracts features while reducing the size. ksize=[1,2,2,1] applies the pooling over 2x2 blocks. strides=[1,2,2,1] means the window shifts two pixels at a time. padding='SAME' zero-pads the edges where needed; with a stride of 2, the output is half the width and height of the input.

def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1,2,2,1], strides=[1,2,2,1], padding='SAME')

Setting up the first convolution layer

Prepare 32 weight patches of size 5x5. The shape is [patch height, patch width, number of input channels, number of output channels]. One bias is prepared for each output channel.

W_conv1 = weight_variable([5,5,1,32])
b_conv1 = bias_variable([32])

Reshape the value of x

To apply the layer, x is first reshaped into a 4d tensor: the second and third dimensions correspond to the image width and height, and the final dimension to the number of color channels. tf.reshape(x, [-1, 28, 28, 1]) changes the shape of the matrix; the final 1 indicates a grayscale image.

x_image = tf.reshape(x, [-1, 28, 28, 1])

Output of the first convolution layer

We convolve x_image with the weight tensor, add the bias, apply the ReLU function, and finally apply max pooling. The max_pool_2x2 step reduces the image size to 14x14.

h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)

Setting up the second convolution layer and its output

W_conv2 = weight_variable([5,5,32,64]) prepares 64 patches of size 5x5 over the 32 input channels.

Since this is the second layer, h_pool1 is convolved with W_conv2.

W_conv2 = weight_variable([5,5,32,64])
b_conv2 = bias_variable([64])

h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)

Fully connected layer

After two rounds of 2x2 max pooling, the 28x28 image has been reduced to 7x7 with 64 channels, so the flattened input has 7*7*64 values. We connect it to a fully connected layer with 1024 neurons.

W_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])

h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

Dropout

To reduce overfitting, we apply dropout before the readout layer. We create a placeholder for the probability that a neuron's output is kept during dropout, which allows us to turn dropout on during training and off during testing.

keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
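As a standalone sketch of how keep_prob behaves (with keep_prob=1.0 the input passes through unchanged, while 0.5 drops roughly half the activations and scales the survivors by 1/keep_prob):

import tensorflow as tf

v = tf.ones([1, 10])
keep_prob = tf.placeholder(tf.float32)
dropped = tf.nn.dropout(v, keep_prob)

with tf.Session() as sess:
    print(sess.run(dropped, feed_dict={keep_prob: 1.0}))  # all ones (testing)
    print(sess.run(dropped, feed_dict={keep_prob: 0.5}))  # about half zeros, survivors scaled to 2.0 (training)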

Readout layer

W_fc2 = weight_variable([1024,10]) is 1024 rows x 10 columns (one output per digit 0-9).

W_fc2 = weight_variable([1024,10])
b_fc2 = bias_variable([10])

y_conv = tf.matmul(h_fc1_drop, W_fc2) + b_fc2

Model training and evaluation

cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv)): tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv) compares the correct labels (y_) with the predicted values (y_conv), and reduce_mean() takes the mean over the batch.

Set the learning method with tf.train.AdamOptimizer(1e-4).minimize(cross_entropy); this time we use AdamOptimizer.
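To see what softmax_cross_entropy_with_logits computes, here is a tiny standalone sketch (the logits and label are made up):

import tensorflow as tf

logits = tf.constant([[2.0, 1.0, 0.1]])  # unnormalized scores for 3 classes
labels = tf.constant([[1.0, 0.0, 0.0]])  # one-hot correct label

# Per-example cross entropy: -log(softmax(logits)[correct class])
loss = tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits)

with tf.Session() as sess:
    print(sess.run(loss))  # ~[0.417]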

cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

with tf.Session() as sess:
  sess.run(tf.global_variables_initializer())
  for i in range(20000):
    batch = mnist.train.next_batch(50)
    if i % 100 == 0:
      train_accuracy = accuracy.eval(feed_dict={
          x: batch[0], y_: batch[1], keep_prob: 1.0})
      print('step %d, training accuracy %g' % (i, train_accuracy))
    train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

  print('test accuracy %g' % accuracy.eval(feed_dict={
      x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))

Complete program

Here is a summary of the above flow

mnist_cnn.py


from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data', one_hot = True)

import tensorflow as tf
sess = tf.InteractiveSession()

x = tf.placeholder(tf.float32, shape=[None, 784])
y_ = tf.placeholder(tf.float32, shape=[None,10])

W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))

def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)
    
def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)
    
def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1,1,1,1], padding='SAME')
    
def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1,2,2,1], strides=[1,2,2,1], padding='SAME')
    
W_conv1 = weight_variable([5,5,1,32])
b_conv1 = bias_variable([32])

x_image = tf.reshape(x, [-1, 28, 28, 1])

h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)

W_conv2 = weight_variable([5,5,32,64])
b_conv2 = bias_variable([64])

h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)

W_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])

h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

W_fc2 = weight_variable([1024,10])
b_fc2 = bias_variable([10])

y_conv = tf.matmul(h_fc1_drop, W_fc2) + b_fc2

cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

with tf.Session() as sess:
  sess.run(tf.global_variables_initializer())
  for i in range(20000):
    batch = mnist.train.next_batch(50)
    if i % 100 == 0:
      train_accuracy = accuracy.eval(feed_dict={
          x: batch[0], y_: batch[1], keep_prob: 1.0})
      print('step %d, training accuracy %g' % (i, train_accuracy))
    train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

  print('test accuracy %g' % accuracy.eval(feed_dict={
      x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))

Output


step 0, training accuracy 0
step 100, training accuracy 0.9
step 200, training accuracy 0.9
~~~~~~~~~~~~~~~~ (omitted) ~~~~~~~~~~~~~~~~~
step 19900, training accuracy 1
test accuracy 0.9916

99%!! The accuracy improved from the 92% we got last time.
