[PYTHON] TensorFlow Deep MNIST for Experts Translation

Introduction

In previous posts, I translated the tutorial for Beginners and then actually used TensorFlow to do machine learning. This time I have translated the tutorial for Experts.

Deep MNIST for Experts

TensorFlow is a powerful library for doing large-scale numerical computation. One of the tasks at which it excels is training and running deep neural networks. In this tutorial we will learn the basic building blocks of a TensorFlow model while constructing a deep convolutional MNIST classifier.

This introduction assumes familiarity with neural networks and the MNIST dataset. If you don't have that background, check out the Introduction for Beginners (https://www.tensorflow.org/versions/master/tutorials/mnist/beginners/index.html). Be sure to install TensorFlow before starting.

Setup

Before building our model, we first load the MNIST dataset and start a TensorFlow session.

Load MNIST Data

For your convenience, a [script](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/examples/tutorials/mnist/input_data.py) that automatically downloads and imports the MNIST dataset is included. It creates a 'MNIST_data' directory in which to store the data files.

import input_data
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

This mnist object is a lightweight class that stores the training, validation, and testing sets as NumPy arrays. It also provides a function for iterating through mini-batches of data, which we will use below.
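
As a quick aside (a minimal sketch of my own, assuming the mnist object loaded above), you can inspect the arrays and draw a mini-batch like this:

# Inspect the dataset wrapper (assumes `mnist` was loaded as above).
print(mnist.train.images.shape)    # e.g. (55000, 784): flattened 28x28 images
print(mnist.train.labels.shape)    # e.g. (55000, 10): one-hot labels

# Draw one mini-batch of 50 examples; each call advances through the data.
images, labels = mnist.train.next_batch(50)
print(images.shape, labels.shape)  # (50, 784) (50, 10)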

Start TensorFlow InteractiveSession

TensorFlow relies on a highly efficient C++ backend to do its computation. The connection to this backend is called a session. The common usage for a TensorFlow program is to first create a graph and then launch it in a session.

Here we instead use the convenient InteractiveSession class, which makes TensorFlow more flexible about how you structure your code. It allows you to interleave operations that build the computation graph (https://www.tensorflow.org/versions/master/get_started/basic_usage.html#the-computation-graph) with ones that run the graph. This is particularly convenient when working in an interactive context like IPython. If you are not using an InteractiveSession, you should build the entire computation graph before starting a session and launching the graph (https://www.tensorflow.org/versions/master/get_started/basic_usage.html#launching-the-graph-in-a-session).

import tensorflow as tf
sess = tf.InteractiveSession()

Computation Graph

To do efficient numerical computation in Python, we typically use libraries like NumPy that do expensive operations such as matrix multiplication outside Python, using highly efficient code implemented in another language. Unfortunately, there can still be a lot of overhead from switching back to Python for every operation. This overhead is especially bad if you want to run computations on GPUs or in a distributed manner, where there is a high cost to transferring data.

TensorFlow also does its heavy lifting outside Python, but it takes things a step further to avoid this overhead. Instead of running a single expensive operation independently from Python, TensorFlow lets us describe a graph of interacting operations that run entirely outside Python. This approach is similar to that used in Theano or Torch.

The role of the Python code is therefore to build this external computation graph and to dictate which parts of the computation graph should be run. For more details, see the Computation Graph (https://www.tensorflow.org/versions/master/get_started/basic_usage.html#the-computation-graph) section of Basic Usage.
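
As a small sketch of this idea (my own illustration, not from the original tutorial), the following builds a tiny graph and only executes it when asked; the intermediate values never exist as ordinary Python numbers:

a = tf.constant(2.0)
b = tf.constant(3.0)
c = a * b           # adds a multiplication node to the graph; nothing runs yet
print(sess.run(c))  # 6.0: the graph needed to produce c runs in the C++ backend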

Build a Softmax Regression Model

In this section we will build a softmax regression model with a single linear layer. In the next section, we will extend this to the case of softmax regression with a multilayer convolutional network.

Placeholders

We start building the computation graph by creating nodes for the input images and target output classes.

x = tf.placeholder("float", shape=[None, 784])
y_ = tf.placeholder("float", shape=[None, 10])

These x and y_ are not specific values. Rather, they are each a placeholder, a value that we will input when we ask TensorFlow to run a computation.

The input images x will consist of a 2D tensor of floating point numbers. Here we assign it a shape of [None, 784], where 784 is the dimensionality of a single flattened MNIST image, and None indicates that the first dimension, corresponding to the batch size, can be of any size. The target output classes y_ will also consist of a 2D tensor, where each row is a one-hot 10-dimensional vector indicating which digit class the corresponding MNIST image belongs to.

The shape argument to placeholder is optional, but it allows TensorFlow to automatically catch bugs stemming from inconsistent tensor shapes.
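
For example (a hypothetical check of my own, not part of the tutorial), feeding a value whose shape is incompatible with the declared placeholder shape should fail immediately instead of producing silently wrong results:

import numpy as np

good = np.zeros((5, 784), dtype=np.float32)   # matches the declared [None, 784]
bad = np.zeros((5, 100), dtype=np.float32)    # wrong second dimension

total = tf.reduce_sum(x)
print(sess.run(total, feed_dict={x: good}))   # 0.0: the shapes are compatible
# sess.run(total, feed_dict={x: bad})         # should raise ValueError about the incompatible shape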

Variables

We now define the weights W and biases b for our model. We could imagine treating these like additional inputs, but TensorFlow has an even better way to handle them: Variable. A Variable is a value that lives in TensorFlow's computation graph. In machine learning applications, the parameters of the model are usually Variables.

W = tf.Variable(tf.zeros([784,10]))
b = tf.Variable(tf.zeros([10]))

We pass the initial value for each parameter in the call to tf.Variable. In this case, we initialize both W and b as tensors full of zeros. W is a 784x10 matrix (because we have 784 input features and 10 outputs) and b is a 10-dimensional vector (because we have 10 classes).

Before Variables can be used within a session, they must be initialized using that session. This step takes the initial values that have already been specified (in this case tensors full of zeros) and assigns them to their respective Variables. This can be done for all Variables at once.

sess.run(tf.initialize_all_variables())

Predicted Class and Cost Function

We can now implement our regression model. It's just one line! We multiply the vectorized input image x with the weight matrix W, add the bias b, and calculate the softmax probabilities assigned to each class.

y = tf.nn.softmax(tf.matmul(x,W) + b)

We can specify a cost function just as easily. The cost function to be minimized during training will be the cross-entropy between the target and the model's prediction.

cross_entropy = -tf.reduce_sum(y_*tf.log(y))

Note that tf.reduce_sum sums across all images in the mini-batch, as well as over all classes; we are computing the cross-entropy for the entire mini-batch.
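
As a small illustrative sketch (my own, not from the tutorial), tf.reduce_sum with no axis argument collapses every dimension, so a batch of per-class terms reduces to a single scalar:

batch_terms = tf.constant([[0.1, 0.2], [0.3, 0.4]])  # pretend shape: [batch, classes]
print(sess.run(tf.reduce_sum(batch_terms)))          # ~1.0: summed over both axes
print(sess.run(tf.reduce_sum(batch_terms, 1)))       # ~[0.3, 0.7]: per-example sums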

Train the Model

Now that we have defined our model and training cost function, training with TensorFlow is straightforward. Because TensorFlow knows the entire computation graph, it can use automatic differentiation to find the gradients of the cost with respect to each of the variables. TensorFlow has a variety of built-in optimization algorithms (https://www.tensorflow.org/versions/master/api_docs/python/train.html#optimizers). For this example, we will use steepest gradient descent, with a step size of 0.01, to descend the cross-entropy.

train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)

What TensorFlow actually did in that single line was to add new operations to the computation graph. These operations include ones to compute the gradients, compute the parameter update steps, and apply the update steps to the parameters.

The returned operation train_step, when run, applies the gradient descent updates to the parameters. Training the model therefore amounts to repeatedly running train_step.

for i in range(1000):
  batch = mnist.train.next_batch(50)
  train_step.run(feed_dict={x: batch[0], y_: batch[1]})

On each training iteration we load 50 training examples. We then run the train_step operation, using feed_dict to replace the placeholder tensors x and y_ with the training examples. Note that you can use feed_dict to replace any tensor in your computation graph; it is not restricted to placeholders.
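
As a side note (an illustrative sketch of my own, not part of the tutorial code), feed_dict can override a computed tensor such as y directly, bypassing the nodes that normally produce it:

import numpy as np

fake_y = np.full((1, 10), 0.1, dtype=np.float32)  # pretend the model output uniform probabilities
fake_labels = np.zeros((1, 10), dtype=np.float32)
fake_labels[0, 3] = 1.0                            # one-hot label for class 3

# Feeding y skips the matmul/softmax that would normally compute it.
print(sess.run(cross_entropy, feed_dict={y: fake_y, y_: fake_labels}))  # -log(0.1), about 2.30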

Evaluate the Model

How well does our model do?

First we will figure out where we predicted the correct label. tf.argmax is an extremely useful function which gives the index of the highest entry in a tensor along some axis. For example, tf.argmax(y, 1) is the label our model thinks is most likely for each input, while tf.argmax(y_, 1) is the true label. We can use tf.equal to check whether our prediction matches the truth.

correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))

That gives us a list of booleans. To determine what fraction are correct, we cast them to floating point numbers and then take the mean. For example, [True, False, True, True] becomes [1, 0, 1, 1], which averages to 0.75.

accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))

Finally, we evaluate our accuracy with test data. This should be about 91% correct.

print(accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels}))

Build a Multilayer Convolutional Network

Getting about 91% accuracy on MNIST is not good. It is almost embarrassingly bad. In this section, we will fix that, jumping from a very simple model to something moderately sophisticated: a small convolutional neural network. This will take us to about 99.2% accuracy. Not state of the art, but respectable.

Weight initialization

To create this model, we need to create a lot of weights and biases. One should generally initialize weights with a small amount of noise for symmetry breaking and to prevent zero gradients. Since we are using rectified linear (ReLU) neurons, it is also good practice to initialize them with a slightly positive initial bias to avoid dead neurons. Instead of repeating this while we build the model, let's create two handy functions to do it for us.

def weight_variable(shape):
  initial = tf.truncated_normal(shape, stddev=0.1)
  return tf.Variable(initial)

def bias_variable(shape):
  initial = tf.constant(0.1, shape=shape)
  return tf.Variable(initial)

Convolution and Pooling

TensorFlow also gives us a lot of flexibility in convolution and pooling operations. How do we handle the boundaries? What is our stride size? In this example, we will always choose the vanilla version. Our convolutions use a stride of one and are zero padded so that the output is the same size as the input. Our pooling is plain old max pooling over 2x2 blocks. To keep our code cleaner, let's also abstract those operations into functions.

def conv2d(x, W):
  return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
  return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                        strides=[1, 2, 2, 1], padding='SAME')

First Convolutional Layer

We can now implement our first layer. It will consist of convolution, followed by max pooling. The convolution will compute 32 features for each 5x5 patch. Its weight tensor will have a shape of [5, 5, 1, 32]. The first two dimensions are the patch size, the next is the number of input channels, and the last is the number of output channels. We will also have a bias vector with a component for each output channel.

W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])

To apply the layer, we first reshape x into a 4D tensor, with the second and third dimensions corresponding to the image width and height, and the final dimension corresponding to the number of color channels.

x_image = tf.reshape(x, [-1,28,28,1])

We then convolve x_image with the weight tensor, add the bias, apply the ReLU function, and finally max pool.

h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)

Second Convolutional Layer

In order to build a deep network, we stack several layers of this type. The second layer will have 64 features for each 5x5 patch.

W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])

h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)

Densely Connected Layer

Now that the image size has been reduced to 7x7, we add a fully connected layer with 1024 neurons to allow processing on the entire image. We reshape the tensor from the pooling layer into a batch of vectors, multiply by a weight matrix, add a bias, and apply a ReLU.

W_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])

h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

Dropout

To reduce overfitting, we apply dropout before the readout layer. We create a placeholder for the probability that a neuron's output is kept during dropout. This allows us to turn dropout on during training and turn it off during testing. TensorFlow's tf.nn.dropout op automatically handles scaling neuron outputs in addition to masking them, so dropout just works without any additional scaling.

keep_prob = tf.placeholder("float")
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
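
To see that scaling behavior concretely, here is a small sketch (my own illustration, not part of the tutorial) showing that kept activations are multiplied by 1/keep_prob, so the expected value of the output matches the input:

ones = tf.constant([[1.0, 1.0, 1.0, 1.0]])
dropped = tf.nn.dropout(ones, 0.5)
# Roughly half of the entries are zeroed; the survivors are scaled to 1/0.5 = 2.0.
print(sess.run(dropped))  # e.g. [[2. 0. 2. 2.]] (which entries survive is random)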

Readout Layer

Finally, we add a softmax layer, just like for the single-layer softmax regression above.

W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])

y_conv = tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)

Train and Evaluate the Model

How well does this model do? To train and evaluate it, we will use code that is almost identical to that for the simple one-layer softmax network above. The differences are as follows: we replace the steepest gradient descent optimizer with the more sophisticated ADAM optimizer; we include the additional parameter keep_prob in feed_dict to control the dropout rate; and we add logging to every 100th iteration in the training process.

cross_entropy = -tf.reduce_sum(y_*tf.log(y_conv))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
sess.run(tf.initialize_all_variables())
for i in range(20000):
  batch = mnist.train.next_batch(50)
  if i%100 == 0:
    train_accuracy = accuracy.eval(feed_dict={
        x:batch[0], y_: batch[1], keep_prob: 1.0})
    print("step %d, training accuracy %g"%(i, train_accuracy))
  train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

print("test accuracy %g"%accuracy.eval(feed_dict={
    x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))

The accuracy of the final test set after running this code should be about 99.2%.

We have now learned how to quickly and easily build, train, and evaluate a fairly sophisticated deep learning model using TensorFlow.

Conclusion

That's all for the translation. It is more complicated than the tutorial for beginners, but it is similar in that each layer applies weights, a bias, and a function. Next, I would like to actually run the code while deepening my understanding of this content.
