In a previous article, I mentioned that the documentation for the deep learning framework "TensorFlow" is hard to follow. In this article, I try to solve MNIST (the handwritten digit classification problem) using a two-layer TensorFlow network.
Looking through the TensorFlow tutorials, there seems to be a fairly large jump in difficulty: "MNIST for Beginners" comes first, followed immediately by "Deep MNIST for Experts".
- For Beginners ... Softmax Regression (multi-class logistic regression with the softmax function)
- (basics of network models, MLP (Multi-Layer Perceptron)?)
- For Experts ... Convolutional Neural Network
Rather than jumping straight to a CNN (convolutional neural network), wouldn't it be easier to understand things step by step by inserting a multi-layer network (MLP) model in between? In this article, I wrote tutorial code to fill that gap. Its features are as follows.

- A two-layer network (MLP) model: (input +) hidden layer + output layer.
- The techniques used stay within the scope of the online course "Coursera Machine Learning (Stanford)".
- The "Dropout" technique for randomly dropping units is not used. (The "Don't drop out" in this article's title refers to not using "Dropout".)

(What it does may be close to the fully connected model sample code "mnist.py" in the TensorFlow GitHub repository, but I aimed for shorter code that is easier to understand.)
Below, I would like to look at the code for each part.
import tensorflow as tf
# Import data
import input_data
mnist = input_data.read_data_sets("../MNIST_data/", one_hot=True)
First, we import the required modules. It might have been better to read through "input_data.py" as well, but since it contains a variety of helper functions, I decided not to analyze its contents this time and to treat it as a black box.
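Even treated as a black box, it helps to know the shape of what read_data_sets() returns. A quick check like the following (a sketch; the split sizes shown are the usual ones this script produces as NumPy arrays) makes the placeholder shapes below easier to follow:

# Inspect the dataset object returned by read_data_sets().
print(mnist.train.images.shape)   # (55000, 784) - 28x28 pixels, flattened
print(mnist.train.labels.shape)   # (55000, 10)  - one-hot labels
print(mnist.test.images.shape)    # (10000, 784)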
Next is the preparation of variables to be used.
# Variables
x = tf.placeholder("float", [None, 784])
y_ = tf.placeholder("float", [None, 10])
w_h = tf.Variable(tf.random_normal([784, 625], mean=0.0, stddev=0.05))
w_o = tf.Variable(tf.random_normal([625, 10], mean=0.0, stddev=0.05))
b_h = tf.Variable(tf.zeros([625]))
b_o = tf.Variable(tf.zeros([10]))
x and y_ are placeholders for the training (and test) data, while w_h, w_o, b_h, b_o are the learnable parameters (the weights and biases of the hidden and output layers). The weights are randomly initialized with tf.random_normal(), which draws from a normal distribution; following the rule of thumb of using "small values", its parameters are set to mean=0.0 and stddev=0.05. (The biases are initialized to zero.)
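As a quick sanity check (not part of the posted code), one can draw a sample with the same tf.random_normal() parameters and confirm the empirical mean and standard deviation:

# Draw one sample of the initial weights and inspect its statistics.
sample = tf.random_normal([784, 625], mean=0.0, stddev=0.05)
with tf.Session() as sess:
    values = sess.run(sample)
print(values.mean(), values.std())   # roughly 0.0 and 0.05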
# Create the model
def model(X, w_h, b_h, w_o, b_o):
    h = tf.sigmoid(tf.matmul(X, w_h) + b_h)
    pyx = tf.nn.softmax(tf.matmul(h, w_o) + b_o)
    return pyx
y_hypo = model(x, w_h, b_h, w_o, b_o)
# Cost Function basic term
cross_entropy = -tf.reduce_sum(y_*tf.log(y_hypo))
This is the important part of this code, the part that describes the neural network model.
The hidden layer computes a linear predictor from the input-layer values and passes it through the sigmoid function.
\textbf{u} ^{(h)} = \textbf{w} ^{(h)} \textbf{z} ^{(i)} + \textbf{b}^{(h)}
\textbf{z} ^{(h)} = f^{(h)}(\textbf{u}^{(h)})
f^{(h)} \ : \ Sigmoid()\ ...\ \texttt{activation function}
The output layer computes a linear predictor from the hidden-layer values and passes it through the softmax function.
\textbf{u} ^{(o)} = \textbf{w} ^{(o)} \textbf{z} ^{(h)} + \textbf{b}^{(o)}
\textbf{z} ^{(o)} = f^{(o)} (\textbf{u} ^{(o)})
f^{(o)} \ :\ Softmax()\ ...\ \texttt{activation function}
(The above is what def model() computes.)
The model's output y_hypo is computed with this model, and together with the training-data labels y_ it is used to compute the cross-entropy value. (This is the main term of the cost function.)
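One caveat worth noting: the bare -sum(y * log(y_hypo)) form can produce NaN if the softmax output ever reaches exactly zero. A common guard (a sketch, not in the posted code) is to clip the output before taking the log:

# Clip the softmax output away from 0 (and 1) before the log to avoid NaN.
cross_entropy = -tf.reduce_sum(y_ * tf.log(tf.clip_by_value(y_hypo, 1e-10, 1.0)))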
Next, the regularization term is calculated.
# Regularization terms (weight decay)
L2_sqr = tf.nn.l2_loss(w_h) + tf.nn.l2_loss(w_o)
lambda_2 = 0.01
For the regularization term, we used the squared norm of the weights (L2_sqr), i.e. weight decay. TensorFlow provides "tf.nn.l2_loss()" to compute this.
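Note that tf.nn.l2_loss(t) computes sum(t ** 2) / 2, with a factor of 1/2 built in. A tiny check on a constant tensor:

# (1^2 + 2^2 + 3^2) / 2 = 7.0
t = tf.constant([1.0, 2.0, 3.0])
with tf.Session() as sess:
    print(sess.run(tf.nn.l2_loss(t)))   # 7.0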
# the loss and accuracy
loss = cross_entropy + lambda_2 * L2_sqr
train_step = tf.train.GradientDescentOptimizer(0.001).minimize(loss)
correct_prediction = tf.equal(tf.argmax(y_hypo,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
Here the optimizer is defined and the related quantities are set up. The optimizer minimizes the cost function with the regularization term added. Gradient descent (GradientDescentOptimizer) was chosen, with the learning rate set to 0.001. The formulas for determining the classification result and computing its accuracy are also defined here.
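To see how the accuracy formula behaves, here is a tiny hand-made example (a sketch with made-up values): argmax picks the class index per row, equal compares prediction against label, and averaging the 0/1 results gives the fraction correct.

# Two samples, two classes; both predictions match the labels, so accuracy is 1.0.
preds  = tf.constant([[0.1, 0.9], [0.8, 0.2]])   # predicted classes: 1, 0
labels = tf.constant([[0.0, 1.0], [1.0, 0.0]])   # true classes:      1, 0
match = tf.equal(tf.argmax(preds, 1), tf.argmax(labels, 1))
with tf.Session() as sess:
    print(sess.run(tf.reduce_mean(tf.cast(match, "float"))))   # 1.0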
# Train
init = tf.initialize_all_variables()
with tf.Session() as sess:
    sess.run(init)

    print('Training...')
    for i in range(20001):
        batch_xs, batch_ys = mnist.train.next_batch(100)
        train_step.run({x: batch_xs, y_: batch_ys})
        if i % 2000 == 0:
            train_accuracy = accuracy.eval({x: batch_xs, y_: batch_ys})
            print(' step, accuracy = %6d: %6.3f' % (i, train_accuracy))
After initializing the variables, a session is started and the parameters are learned from the training data. Here the computation simply runs for a predetermined number of iterations (20,000+), with no convergence test or early stopping.
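If one did want a simple stopping rule, a minimal sketch (not in the posted code) could monitor the mnist.validation split that input_data.read_data_sets() also returns, replacing the plain loop above:

# Stop when validation accuracy has not improved for a few checks.
# (This must run inside the "with tf.Session() as sess:" block.)
best_val, bad_checks, patience = 0.0, 0, 3
for i in range(20001):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    train_step.run({x: batch_xs, y_: batch_ys})
    if i % 2000 == 0:
        val_acc = accuracy.eval({x: mnist.validation.images,
                                 y_: mnist.validation.labels})
        if val_acc > best_val:
            best_val, bad_checks = val_acc, 0
        else:
            bad_checks += 1
            if bad_checks >= patience:
                break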
When the training is completed, the accuracy of the classifier is calculated using the test data.
    # Test trained model (this is still inside "with tf.Session() as sess:")
    print('accuracy = ', accuracy.eval({x: mnist.test.images, y_: mnist.test.labels}))
The code explained above is collected and posted again below. (About 60 lines of code.)
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import tensorflow as tf
# Import data
import input_data
mnist = input_data.read_data_sets("../MNIST_data/", one_hot=True)
# Variables
x = tf.placeholder("float", [None, 784])
y_ = tf.placeholder("float", [None, 10])
w_h = tf.Variable(tf.random_normal([784, 625], mean=0.0, stddev=0.05))
w_o = tf.Variable(tf.random_normal([625, 10], mean=0.0, stddev=0.05))
b_h = tf.Variable(tf.zeros([625]))
b_o = tf.Variable(tf.zeros([10]))
# Create the model
def model(X, w_h, b_h, w_o, b_o):
    h = tf.sigmoid(tf.matmul(X, w_h) + b_h)
    pyx = tf.nn.softmax(tf.matmul(h, w_o) + b_o)
    return pyx
y_hypo = model(x, w_h, b_h, w_o, b_o)
# Cost Function basic term
cross_entropy = -tf.reduce_sum(y_*tf.log(y_hypo))
# Regularization terms (weight decay)
L2_sqr = tf.nn.l2_loss(w_h) + tf.nn.l2_loss(w_o)
lambda_2 = 0.01
# the loss and accuracy
loss = cross_entropy + lambda_2 * L2_sqr
train_step = tf.train.GradientDescentOptimizer(0.001).minimize(loss)
correct_prediction = tf.equal(tf.argmax(y_hypo,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
# Train
init = tf.initialize_all_variables()
with tf.Session() as sess:
    sess.run(init)

    print('Training...')
    for i in range(20001):
        batch_xs, batch_ys = mnist.train.next_batch(100)
        train_step.run({x: batch_xs, y_: batch_ys})
        if i % 2000 == 0:
            train_accuracy = accuracy.eval({x: batch_xs, y_: batch_ys})
            print(' step, accuracy = %6d: %6.3f' % (i, train_accuracy))

    # Test trained model
    print('accuracy = ', accuracy.eval({x: mnist.test.images, y_: mnist.test.labels}))
(Note: the first three "from __future__ ..." lines are statements for Python 3 compatibility.)
Running this code produces the following output.
Training...
 step, accuracy =      0:  0.130
 step, accuracy =   2000:  0.900
 step, accuracy =   4000:  0.910
 step, accuracy =   6000:  0.930
 step, accuracy =   8000:  0.920
 step, accuracy =  10000:  0.960
 step, accuracy =  12000:  0.950
 step, accuracy =  14000:  0.950
 step, accuracy =  16000:  0.960
 step, accuracy =  18000:  0.960
 step, accuracy =  20000:  0.960
accuracy = 0.9546
The classification accuracy on the test data was 95.46%. As expected, this falls roughly midway between the accuracy of softmax regression (91%) and that of the convolutional neural network (99.2%). (It feels a little as if we aimed for exactly that.)
Things I would like to try going forward:

- Classification on other datasets (that is, data other than "MNIST").
- Graph visualization with "TensorBoard".
- Trying other optimizers (AdaGrad, Adam, etc.) and checking their performance (see the sketch below).
- Investigating the effect of the network configuration (number of layers x number of units).
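Regarding the optimizer item, swapping optimizers is a one-line change in this code. A minimal sketch, assuming tf.train.AdamOptimizer is available in the TensorFlow version used:

# Replace the GradientDescentOptimizer line with, for example, Adam.
# The 1e-4 learning rate here is an illustrative value, not a tuned one.
train_step = tf.train.AdamOptimizer(1e-4).minimize(loss)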
(I am still learning, but I hope to keep trying things out and sharing TensorFlow code little by little.)
References:

- TensorFlow documentation: http://www.tensorflow.org/
- Deep Learning (Kodansha Machine Learning Professional Series): http://www.kspub.co.jp/book/detail/1529021.html