In a previous article, I mentioned that the documentation for the deep learning framework "TensorFlow" is hard to follow. In this article, I try to solve MNIST (the handwritten digit classification problem) using a two-layer TensorFlow network.
Looking through the TensorFlow tutorials, there seems to be a fairly large jump in difficulty: "MNIST for Beginners" comes first, followed immediately by "Deep MNIST for Experts".
- For Beginners ... Softmax Regression (multi-class logistic regression with the softmax function)
- (basics of network models, MLP (Multi-Layer Perceptron)?)
- For Experts ... Convolutional Neural Network
Rather than jumping straight to a CNN (convolutional neural network), wouldn't it be easier to understand things step by step by inserting a multi-layer network (MLP) model in between? In this article, I wrote tutorial code to fill that gap. Its features are as follows.

- A two-layer network (MLP) model: (input +) hidden layer + output layer.
- The techniques used stay within the scope of the online course "Coursera Machine Learning (Stanford)".
- The "Dropout" technique for randomly dropping units is not used. (The "Don't drop out" in this article's title refers to not using "Dropout".)

(What it does may be close to the fully connected model sample code "mnist.py" in the TensorFlow GitHub repository, but I aimed for shorter code that is easier to understand.)
Below, I would like to look at the code for each part.
import tensorflow as tf
# Import data
import input_data
mnist = input_data.read_data_sets("../MNIST_data/", one_hot=True)
First, we import the required modules. It might have been better to read through "input_data.py" as well, but since it contains a variety of helper functions, I decided not to analyze its contents this time and to treat it as a black box.
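Even treated as a black box, it helps to know the shape of what read_data_sets() returns. A quick check like the following (a sketch; the split sizes shown are the usual ones this script produces as NumPy arrays) makes the placeholder shapes below easier to follow:

# Inspect the dataset object returned by read_data_sets().
print(mnist.train.images.shape)   # (55000, 784) - 28x28 pixels, flattened
print(mnist.train.labels.shape)   # (55000, 10)  - one-hot labels
print(mnist.test.images.shape)    # (10000, 784)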
Next is the preparation of variables to be used.
# Variables
x = tf.placeholder("float", [None, 784])
y_ = tf.placeholder("float", [None, 10])
w_h = tf.Variable(tf.random_normal([784, 625], mean=0.0, stddev=0.05))
w_o = tf.Variable(tf.random_normal([625, 10], mean=0.0, stddev=0.05))
b_h = tf.Variable(tf.zeros([625]))
b_o = tf.Variable(tf.zeros([10]))
x and y_ are placeholders for the training (and test) data, while w_h, w_o, b_h, b_o are the learnable parameters (the weights and biases of the hidden and output layers). The weights are randomly initialized with tf.random_normal(), which draws from a normal distribution; following the rule of thumb of using "small values", its parameters are set to mean=0.0 and stddev=0.05. (The biases are initialized to zero.)
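As a quick sanity check (not part of the posted code), one can draw a sample with the same tf.random_normal() parameters and confirm the empirical mean and standard deviation:

# Draw one sample of the initial weights and inspect its statistics.
sample = tf.random_normal([784, 625], mean=0.0, stddev=0.05)
with tf.Session() as sess:
    values = sess.run(sample)
print(values.mean(), values.std())   # roughly 0.0 and 0.05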
# Create the model
def model(X, w_h, b_h, w_o, b_o):
    h = tf.sigmoid(tf.matmul(X, w_h) + b_h)
    pyx = tf.nn.softmax(tf.matmul(h, w_o) + b_o)
    return pyx
y_hypo = model(x, w_h, b_h, w_o, b_o)
# Cost Function basic term
cross_entropy = -tf.reduce_sum(y_*tf.log(y_hypo))
This is the important part of this code, the part that describes the neural network model.
The hidden layer computes a linear predictor from the input-layer values and passes it through the sigmoid function.
\textbf{u} ^{(h)} = \textbf{w} ^{(h)} \textbf{z} ^{(i)} + \textbf{b}^{(h)}
\textbf{z} ^{(h)} = f^{(h)}(\textbf{u}^{(h)})
f^{(h)} \ : \ Sigmoid()\ ...\ \texttt{activation function}
The output layer computes a linear predictor from the hidden-layer values and passes it through the softmax function.
\textbf{u} ^{(o)} = \textbf{w} ^{(o)} \textbf{z} ^{(h)} + \textbf{b}^{(o)}
\textbf{z} ^{(o)} = f^{(o)} (\textbf{u} ^{(o)})
f^{(o)} \ :\ Softmax()\ ...\ \texttt{activation function}
(The above is what def model() computes.)
The model's output y_hypo is computed with this model, and together with the training-data labels y_ it is used to compute the cross-entropy value. (This is the main term of the cost function.)
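One caveat worth noting: the bare -sum(y * log(y_hypo)) form can produce NaN if the softmax output ever reaches exactly zero. A common guard (a sketch, not in the posted code) is to clip the output before taking the log:

# Clip the softmax output away from 0 (and 1) before the log to avoid NaN.
cross_entropy = -tf.reduce_sum(y_ * tf.log(tf.clip_by_value(y_hypo, 1e-10, 1.0)))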
Next, the regularization term is calculated.
# Regularization terms (weight decay)
L2_sqr = tf.nn.l2_loss(w_h) + tf.nn.l2_loss(w_o)
lambda_2 = 0.01
For the regularization term, we used the squared norm of the weights (L2_sqr), i.e. weight decay. TensorFlow provides "tf.nn.l2_loss()" to compute this.
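Note that tf.nn.l2_loss(t) computes sum(t ** 2) / 2, with a factor of 1/2 built in. A tiny check on a constant tensor:

# (1^2 + 2^2 + 3^2) / 2 = 7.0
t = tf.constant([1.0, 2.0, 3.0])
with tf.Session() as sess:
    print(sess.run(tf.nn.l2_loss(t)))   # 7.0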
# the loss and accuracy
loss = cross_entropy + lambda_2 * L2_sqr
train_step = tf.train.GradientDescentOptimizer(0.001).minimize(loss)
correct_prediction = tf.equal(tf.argmax(y_hypo,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
Here the optimizer is defined and the related quantities are set up. The optimizer minimizes the cost function with the regularization term added. Gradient descent (GradientDescentOptimizer) was chosen, with the learning rate set to 0.001. The formulas for determining the classification result and computing its accuracy are also defined here.
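To see how the accuracy formula behaves, here is a tiny hand-made example (a sketch with made-up values): argmax picks the class index per row, equal compares prediction against label, and averaging the 0/1 results gives the fraction correct.

# Two samples, two classes; both predictions match the labels, so accuracy is 1.0.
preds  = tf.constant([[0.1, 0.9], [0.8, 0.2]])   # predicted classes: 1, 0
labels = tf.constant([[0.0, 1.0], [1.0, 0.0]])   # true classes:      1, 0
match = tf.equal(tf.argmax(preds, 1), tf.argmax(labels, 1))
with tf.Session() as sess:
    print(sess.run(tf.reduce_mean(tf.cast(match, "float"))))   # 1.0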
# Train
init = tf.initialize_all_variables()
with tf.Session() as sess:
    sess.run(init)

    print('Training...')
    for i in range(20001):
        batch_xs, batch_ys = mnist.train.next_batch(100)
        train_step.run({x: batch_xs, y_: batch_ys})
        if i % 2000 == 0:
            train_accuracy = accuracy.eval({x: batch_xs, y_: batch_ys})
            print(' step, accuracy = %6d: %6.3f' % (i, train_accuracy))
After initializing the variables, a session is started and the parameters are learned from the training data. Here the computation simply runs for a predetermined number of iterations (20,000+), with no convergence test or early stopping.
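If one did want a simple stopping rule, a minimal sketch (not in the posted code) could monitor the mnist.validation split that input_data.read_data_sets() also returns, replacing the plain loop above:

# Stop when validation accuracy has not improved for a few checks.
# (This must run inside the "with tf.Session() as sess:" block.)
best_val, bad_checks, patience = 0.0, 0, 3
for i in range(20001):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    train_step.run({x: batch_xs, y_: batch_ys})
    if i % 2000 == 0:
        val_acc = accuracy.eval({x: mnist.validation.images,
                                 y_: mnist.validation.labels})
        if val_acc > best_val:
            best_val, bad_checks = val_acc, 0
        else:
            bad_checks += 1
            if bad_checks >= patience:
                break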
When the training is completed, the accuracy of the classifier is calculated using the test data.
    # Test trained model (this is still inside "with tf.Session() as sess:")
    print('accuracy = ', accuracy.eval({x: mnist.test.images, y_: mnist.test.labels}))
The code explained above is collected and posted again below. (About 60 lines of code.)
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import tensorflow as tf
# Import data
import input_data
mnist = input_data.read_data_sets("../MNIST_data/", one_hot=True)
# Variables
x = tf.placeholder("float", [None, 784])
y_ = tf.placeholder("float", [None, 10])
w_h = tf.Variable(tf.random_normal([784, 625], mean=0.0, stddev=0.05))
w_o = tf.Variable(tf.random_normal([625, 10], mean=0.0, stddev=0.05))
b_h = tf.Variable(tf.zeros([625]))
b_o = tf.Variable(tf.zeros([10]))
# Create the model
def model(X, w_h, b_h, w_o, b_o):
    h = tf.sigmoid(tf.matmul(X, w_h) + b_h)
    pyx = tf.nn.softmax(tf.matmul(h, w_o) + b_o)
    return pyx
y_hypo = model(x, w_h, b_h, w_o, b_o)
# Cost Function basic term
cross_entropy = -tf.reduce_sum(y_*tf.log(y_hypo))
# Regularization terms (weight decay)
L2_sqr = tf.nn.l2_loss(w_h) + tf.nn.l2_loss(w_o)
lambda_2 = 0.01
# the loss and accuracy
loss = cross_entropy + lambda_2 * L2_sqr
train_step = tf.train.GradientDescentOptimizer(0.001).minimize(loss)
correct_prediction = tf.equal(tf.argmax(y_hypo,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
# Train
init = tf.initialize_all_variables()
with tf.Session() as sess:
    sess.run(init)

    print('Training...')
    for i in range(20001):
        batch_xs, batch_ys = mnist.train.next_batch(100)
        train_step.run({x: batch_xs, y_: batch_ys})
        if i % 2000 == 0:
            train_accuracy = accuracy.eval({x: batch_xs, y_: batch_ys})
            print(' step, accuracy = %6d: %6.3f' % (i, train_accuracy))

    # Test trained model
    print('accuracy = ', accuracy.eval({x: mnist.test.images, y_: mnist.test.labels}))
(Note: the first three "from __future__ ..." lines are statements for Python 3 compatibility.)
Running this code produces the following output.
Training...
 step, accuracy =      0:  0.130
 step, accuracy =   2000:  0.900
 step, accuracy =   4000:  0.910
 step, accuracy =   6000:  0.930
 step, accuracy =   8000:  0.920
 step, accuracy =  10000:  0.960
 step, accuracy =  12000:  0.950
 step, accuracy =  14000:  0.950
 step, accuracy =  16000:  0.960
 step, accuracy =  18000:  0.960
 step, accuracy =  20000:  0.960
accuracy = 0.9546
The classification accuracy on the test data was 95.46%. As expected, this falls roughly midway between the accuracy of softmax regression (91%) and that of the convolutional neural network (99.2%). (It feels a little as if we aimed for exactly that.)
Things I would like to try going forward:

- Classification on other datasets (that is, data other than "MNIST").
- Graph visualization with "TensorBoard".
- Trying other optimizers (AdaGrad, Adam, etc.) and checking their performance (see the sketch below).
- Investigating the effect of the network configuration (number of layers x number of units).
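Regarding the optimizer item, swapping optimizers is a one-line change in this code. A minimal sketch, assuming tf.train.AdamOptimizer is available in the TensorFlow version used:

# Replace the GradientDescentOptimizer line with, for example, Adam.
# The 1e-4 learning rate here is an illustrative value, not a tuned one.
train_step = tf.train.AdamOptimizer(1e-4).minimize(loss)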
(I am still learning, but I hope to keep trying things out and sharing TensorFlow code little by little.)
References:

- TensorFlow documentation: http://www.tensorflow.org/
- Deep Learning (Kodansha Machine Learning Professional Series): http://www.kspub.co.jp/book/detail/1529021.html