In the previous article, "I tried the TensorFlow tutorial (MNIST for beginners) in Cloud9 ~ Classification of handwritten images ~", I implemented a simple machine learning model. This time I tried TensorFlow's "Deep MNIST for Experts" tutorial. Because it is written for experts it uses a variety of techniques, but it becomes easier to understand as you actually write the code. Here we implement a convolutional neural network (CNN), an approach with a proven track record in image classification.
The environment is the same as last time.
Cloud9, Python 2.7.6
Sample code: GitHub
For setting up the environment, see "Use TensorFlow in cloud integrated development environment Cloud9 ~ GetStarted ~"; for the basic usage of TensorFlow, see "Use TensorFlow in cloud integrated development environment Cloud9 - Basics of usage -".
As before, the code is divided into two parts: the training code and the code that predicts my handwritten data. First, let's look at the training code.
mnist_neural_train.py
from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf
# Define method
def weight_variable(shape, name):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial, name=name)

def bias_variable(shape, name):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial, name=name)

def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
# Download gz files to MNIST_data directory
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
# Initializing
sess = tf.InteractiveSession()
x = tf.placeholder(tf.float32, shape=[None, 28*28])
y_ = tf.placeholder(tf.float32, shape=[None, 10])
x_image = tf.reshape(x, [-1, 28, 28, 1])
W_conv1 = weight_variable([5, 5, 1, 32], name="W_conv1")
b_conv1 = bias_variable([32], name="b_conv1")
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)
W_conv2 = weight_variable([5, 5, 32, 64], name="W_conv2")
b_conv2 = bias_variable([64], name="b_conv2")
h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)
W_fc1 = weight_variable([7 * 7 * 64, 1024], name="W_fc1")
b_fc1 = bias_variable([1024], name="b_fc1")
h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)
keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
W_fc2 = weight_variable([1024, 10], name="W_fc2")
b_fc2 = bias_variable([10], name="b_fc2")
y_conv = tf.matmul(h_fc1_drop, W_fc2) + b_fc2
# Making model
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(y_conv, y_))
# Training
train_step = tf.train.GradientDescentOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
sess.run(tf.initialize_all_variables())
for i in range(20000):
    batch = mnist.train.next_batch(50)
    if i % 100 == 0:
        train_accuracy = accuracy.eval(feed_dict={x: batch[0], y_: batch[1], keep_prob: 1.0})
        print("step %d, training accuracy %g" % (i, train_accuracy))
    train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})
# Evaluating
#print("test accuracy %g" %accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))
# Save train data
saver = tf.train.Saver()
saver.save(sess, 'param/neural.param')
The basic flow is the same as last time, so I will explain only the differences.
def weight_variable(shape, name):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial, name=name)

def bias_variable(shape, name):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial, name=name)
This defines how the parameters are initialized. Last time they were simply initialized to 0, but here it seems better not to use 0. The network uses the ReLU function, which has a slope of 1 for positive values and outputs 0 for negative values, so it is apparently better to start the weights with small random values and the biases with a small positive value (0.1) so that the units do not start out inactive.
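As a quick check (a minimal sketch of my own, not part of the original scripts, assuming weight_variable and bias_variable above are already defined), you can evaluate these helpers in a session to see that the weights start as small random values around 0 and the biases start at 0.1:

import tensorflow as tf

w = weight_variable([2, 3], name="w_check")   # truncated normal, stddev 0.1
b = bias_variable([3], name="b_check")        # constant 0.1
with tf.Session() as s:
    s.run(tf.initialize_all_variables())
    print(s.run(w))   # small values roughly in the range -0.2 .. 0.2
    print(s.run(b))   # [0.1, 0.1, 0.1]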
def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
Convolution (conv2d) and pooling (max_pool_2x2) are the operations that give a convolutional neural network its name. Their detailed behavior is explained below.
W_conv1 = weight_variable([5, 5, 1, 32], name="W_conv1")
b_conv1 = bias_variable([32], name="b_conv1")
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)
This is the convolution layer. In the shape [5, 5, 1, 32] of W_conv1, 5 * 5 is the patch size. The layer captures features of the image by taking a small 5 * 5 patch and computing how well each region of the image matches it. The 1 is the number of inputs, i.e. one channel of image data, and 32 is the number of outputs, meaning 32 different patches (filters) each capture a different feature of the image.
The relu in h_conv1 is the ReLU function: it outputs 0 for negative values and passes positive values through unchanged (slope 1). Because the slope is 1, it avoids the situation where the gradient becomes too small or vanishes and learning stops.
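For example (a small standalone sketch of my own, not part of the tutorial code), applying tf.nn.relu to a few values shows that negative inputs become 0 and positive inputs pass through unchanged:

import tensorflow as tf

with tf.Session() as s:
    print(s.run(tf.nn.relu([-2.0, -0.5, 0.0, 1.0, 3.0])))
    # -> [ 0.  0.  0.  1.  3.]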
h_pool1 is the pooling layer, which aggregates each 2 * 2 block of 4 cells into 1 cell holding the maximum value. The image was originally 28 * 28, so it becomes 14 * 14, a quarter of the size. Even if handwritten data is shifted slightly it would otherwise look like completely different data, and I think pooling absorbs that kind of shift by aggregating neighboring values. (I apologize if I'm wrong.)
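As a quick check (my own addition, assuming the tensors from the training script above are defined), printing the static shapes shows the effect of the 'SAME' padding and the 2 * 2 pooling:

print(x_image.get_shape())   # (?, 28, 28, 1)
print(h_conv1.get_shape())   # (?, 28, 28, 32)  'SAME' padding keeps the 28x28 size
print(h_pool1.get_shape())   # (?, 14, 14, 32)  2x2 pooling halves height and width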
W_conv2 = weight_variable([5, 5, 32, 64], name="W_conv2")
b_conv2 = bias_variable([64], name="b_conv2")
h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)
W_fc1 = weight_variable([7 * 7 * 64, 1024], name="W_fc1")
b_fc1 = bias_variable([1024], name="b_fc1")
h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)
conv2 simply applies the convolution and pooling layers described above a second time. fc1 is the same fully connected calculation as last time, except that h_pool2_flat first flattens the multidimensional array into one dimension so that fc1 can be computed as a matrix multiplication.
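Again as a small check of my own (assuming the tensors above are defined), the second convolution/pooling stage ends at 7 * 7 with 64 channels, which is why the flattened vector has 7 * 7 * 64 = 3136 elements:

print(h_pool2.get_shape())       # (?, 7, 7, 64)
print(h_pool2_flat.get_shape())  # (?, 3136)  = 7*7*64
print(h_fc1.get_shape())         # (?, 1024)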
keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
Dropout randomly disables units: each value is kept with the probability given by keep_prob and dropped otherwise. This helps prevent overfitting (a situation where accuracy is high on the training data but not on the test data). The value of keep_prob is supplied at run time: 0.5 during training and 1.0 (i.e. no dropout) when verifying with test data.
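A tiny standalone sketch (my own, not from the tutorial) shows the effect of keep_prob: with 0.5 roughly half the values are zeroed and the surviving values are scaled by 1/keep_prob, while 1.0 leaves the input unchanged:

import tensorflow as tf

kp = tf.placeholder(tf.float32)
v = tf.constant([1.0, 2.0, 3.0, 4.0])
drop = tf.nn.dropout(v, kp)
with tf.Session() as s:
    print(s.run(drop, feed_dict={kp: 0.5}))  # e.g. [2. 0. 6. 0.] (random which values survive)
    print(s.run(drop, feed_dict={kp: 1.0}))  # [1. 2. 3. 4.]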
#print("test accuracy %g" %accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))
This is the verification against the test data, but it is commented out because Cloud9 does not seem to have enough memory and an error occurs here. Note that keep_prob for the dropout described above is set to 1.0 here.
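If memory is tight, one workaround (a sketch of my own, not in the original article) is to evaluate the test accuracy in small batches instead of feeding all 10,000 test images at once:

# Evaluate test accuracy in batches of 1000 to keep memory usage low.
total = 0.0
n_batches = 10
for i in range(n_batches):
    batch = mnist.test.next_batch(1000)
    total += accuracy.eval(feed_dict={x: batch[0], y_: batch[1], keep_prob: 1.0})
print("test accuracy %g" % (total / n_batches))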
You can understand the rest of the processing from what was explained last time. Training on Cloud9 took a very long time because the free tier has a weak CPU and little memory: a few hours to half a day. Since the parameters are saved, predictions can be made immediately afterwards, but it was painful whenever the training had to be redone.
When I predicted my handwritten data as before, the accuracy was 50%. The accuracy should have increased, but instead it went down... I don't know whether it is because the handwritten data consists only of 1s and 0s, or because there are only 10 handwritten samples and it just happened to miss, but it does not feel good enough. ⇒ It turned out that preprocessing was required to predict my handwritten data. For details, see the article "Predicting your handwritten data with TensorFlow".
mnist_neural.py
from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf
import sys
import numpy as np
import parsebmp as pb
# Define method
def weight_variable(shape, name):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial, name=name)

def bias_variable(shape, name):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial, name=name)

def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

def whatisit(file, sess):
    print("File name is %s" % file)
    data = pb.parse_bmp(file)

    # Show bmp data
    for i in range(len(data)):
        sys.stdout.write(str(int(data[i])))
        if (i+1) % 28 == 0:
            print("")

    # Predicting
    d = np.array([data])
    x_image = tf.reshape(d, [-1, 28, 28, 1])
    h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
    h_pool1 = max_pool_2x2(h_conv1)
    h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
    h_pool2 = max_pool_2x2(h_conv2)
    h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
    h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)
    h_fc1_drop = tf.nn.dropout(h_fc1, 1.0)
    y_conv = tf.matmul(h_fc1_drop, W_fc2) + b_fc2
    result = sess.run(y_conv)

    # Show result
    print(result)
    print(np.argmax(result, 1))
if __name__ == "__main__":
    # Restore parameters (define the same variables as in the training script)
    W_conv1 = weight_variable([5, 5, 1, 32], name="W_conv1")
    b_conv1 = bias_variable([32], name="b_conv1")
    W_conv2 = weight_variable([5, 5, 32, 64], name="W_conv2")
    b_conv2 = bias_variable([64], name="b_conv2")
    W_fc1 = weight_variable([7 * 7 * 64, 1024], name="W_fc1")
    b_fc1 = bias_variable([1024], name="b_fc1")
    W_fc2 = weight_variable([1024, 10], name="W_fc2")
    b_fc2 = bias_variable([10], name="b_fc2")

    sess = tf.InteractiveSession()
    saver = tf.train.Saver()
    saver.restore(sess, 'param/neural.param')

    # My data
    whatisit("My_data/0.bmp", sess)
    whatisit("My_data/1.bmp", sess)
    whatisit("My_data/2.bmp", sess)
    whatisit("My_data/3.bmp", sess)
    whatisit("My_data/4.bmp", sess)
    whatisit("My_data/5.bmp", sess)
    whatisit("My_data/6.bmp", sess)
    whatisit("My_data/7.bmp", sess)
    whatisit("My_data/8.bmp", sess)
    whatisit("My_data/9.bmp", sess)
At first I couldn't understand it at all, but I deepened my understanding by studying various books and web articles and by implementing the code. Next, I would like to design a network myself and implement it.
- 2018/06/12: Added a note about predicting handwritten data
- 2017/03/28: New post