After my talk at DevFest Tokyo 2016, I received requests to see the whole sample code, so I'm publishing it here.
I wanted something that showed, side by side, how the same data and the same method play out across raw TensorFlow, TensorFlow's high-level API tf.contrib.learn, and Keras, which runs on TensorFlow as a backend and lets you describe a network in a DSL-like way. Every tutorial I looked at did something subtly different, which only added to my confusion.

Since I couldn't find a comparison done under similar conditions, I decided to record one myself so that others would be less confused. For now, everything is based on the tf.contrib.learn tutorial.
- Use the iris dataset, the flower data that often appears as an example in statistics and machine learning
-- Obtained via `tf.contrib.learn.datasets.base.load_iris()`
- Three hidden layers, with 10, 20, and 10 units respectively
- ReLU as the activation function
- Softmax in the output layer
- Splitting into training and test data is done with sklearn's `cross_validation.train_test_split()`
- Not aligned between the versions: the optimizer and the initial values of the network (see the note after this list)

There may be other details, but I think the conditions are mostly matched. There also seemed to be demand for covering Slim, but I didn't get to it due to time constraints.
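One side note of my own, since the initial values are not aligned: accuracy will fluctuate from run to run. Fixing the random seeds (a general technique, not something the original setup does) at least makes a single version reproducible:

```python
import numpy as np
import tensorflow as tf

# Stabilizes reruns of one script; it does not make the three
# frameworks initialize identically
np.random.seed(0)
tf.set_random_seed(0)
```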
Now, let's get right to the comparison.
First, raw TensorFlow.

- Since you do every step of the processing yourself, it's good for understanding what is actually going on.
- The MNIST tutorial code, by contrast, leans on too many helper functions provided by TensorFlow itself; it's hard to tell what they are doing, and when you try to write your own code, many of them turn out to be tutorial-only, which is rather sad.
- So I wrote similar helper functions myself; they appear at the top of the code below.
```python
import tensorflow as tf
import numpy as np
from sklearn import cross_validation

# Helper to turn class labels (0, 1, 2) into one-hot vectors
def one_hot_labels(labels):
    return np.array([
        np.where(labels == 0, [1], [0]),
        np.where(labels == 1, [1], [0]),
        np.where(labels == 2, [1], [0])
    ]).T

# Fetch a random batch of the specified size
def next_batch(data, label, batch_size):
    perm = np.arange(data.shape[0])
    np.random.shuffle(perm)
    return data[perm][:batch_size], label[perm][:batch_size]

# Prepare the training data
iris = tf.contrib.learn.datasets.base.load_iris()
train_x, test_x, train_y, test_y = cross_validation.train_test_split(
    iris.data, iris.target, test_size=0.2
)

# Input layer
x = tf.placeholder(tf.float32, [None, 4], name='input')

# 1st layer
W1 = tf.Variable(tf.truncated_normal([4, 10], stddev=0.5, name='weight1'))
b1 = tf.Variable(tf.constant(0.0, shape=[10], name='bias1'))
h1 = tf.nn.relu(tf.matmul(x, W1) + b1)

# 2nd layer
W2 = tf.Variable(tf.truncated_normal([10, 20], stddev=0.5, name='weight2'))
b2 = tf.Variable(tf.constant(0.0, shape=[20], name='bias2'))
h2 = tf.nn.relu(tf.matmul(h1, W2) + b2)

# 3rd layer
W3 = tf.Variable(tf.truncated_normal([20, 10], stddev=0.5, name='weight3'))
b3 = tf.Variable(tf.constant(0.0, shape=[10], name='bias3'))
h3 = tf.nn.relu(tf.matmul(h2, W3) + b3)

# Output layer
W4 = tf.Variable(tf.truncated_normal([10, 3], stddev=0.5, name='weight4'))
b4 = tf.Variable(tf.constant(0.0, shape=[3], name='bias4'))
y = tf.nn.softmax(tf.matmul(h3, W4) + b4)

# Placeholder for the target (teacher) signal
y_ = tf.placeholder(tf.float32, [None, 3], name='teacher_signal')

# Cross-entropy loss against the target
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))

# The training step
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)

with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    for i in range(2000):
        # Training
        batch_size = 100
        batch_train_x, batch_train_y = next_batch(train_x, train_y, batch_size)
        sess.run(train_step, feed_dict={x: batch_train_x, y_: one_hot_labels(batch_train_y)})
    # Evaluate the trained model
    correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    print(sess.run(accuracy, feed_dict={x: test_x, y_: one_hot_labels(test_y)}))
```
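As a quick sanity check (my addition, not in the original post), here's what the `one_hot_labels` helper from the block above produces for a small label array:

```python
import numpy as np

# Assumes one_hot_labels() from the block above is in scope
labels = np.array([0, 1, 2, 1])
print(one_hot_labels(labels))
# [[1 0 0]
#  [0 1 0]
#  [0 0 1]
#  [0 1 0]]
```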
Next, tf.contrib.learn.

- Compared to raw TensorFlow, it's so easy it's almost anticlimactic.
- Reading the API docs, it seems you can also set the `optimizer`, `dropout`, and various other things, so I suspect it can do more than I expected (see the sketch after the code below).
```python
import tensorflow as tf
from sklearn import cross_validation

# Prepare the training data
iris = tf.contrib.learn.datasets.base.load_iris()
train_x, test_x, train_y, test_y = cross_validation.train_test_split(
    iris.data, iris.target, test_size=0.2
)

# Declare that all features are real-valued
feature_columns = [tf.contrib.layers.real_valued_column("", dimension=4)]

# 3-layer DNN
# If nothing is specified, ReLU seems to be chosen as the activation function
classifier = tf.contrib.learn.DNNClassifier(feature_columns=feature_columns,
                                            hidden_units=[10, 20, 10],
                                            n_classes=3,
                                            model_dir="./iris_model")

# Fit the model
classifier.fit(x=train_x,
               y=train_y,
               steps=2000,
               batch_size=50)

# Evaluate accuracy
print(classifier.evaluate(x=test_x, y=test_y)["accuracy"])
```
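To illustrate the configurability mentioned above: a sketch based on the `optimizer` and `dropout` arguments documented for `tf.contrib.learn.DNNClassifier` at the time, reusing the `feature_columns` defined above. This variant is my illustration, not part of the measured comparison:

```python
# Hypothetical, more heavily configured classifier: a custom optimizer
# and dropout, passed via DNNClassifier's documented arguments
classifier_configured = tf.contrib.learn.DNNClassifier(
    feature_columns=feature_columns,
    hidden_units=[10, 20, 10],
    n_classes=3,
    optimizer=tf.train.AdamOptimizer(learning_rate=0.01),
    dropout=0.2,
    model_dir="./iris_model_configured")
```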
Finally, Keras, with TensorFlow as the backend.

- The way you write it is completely different from the other two.
- But if you're building networks experimentally, it looks far more pleasant than raw TensorFlow.
```python
import tensorflow as tf
from keras.models import Sequential
from keras.layers.core import Dense, Activation
from sklearn import cross_validation

# Prepare the input data
iris = tf.contrib.learn.datasets.base.load_iris()
train_x, test_x, train_y, test_y = cross_validation.train_test_split(
    iris.data, iris.target, test_size=0.2
)

# Define the model
model = Sequential()

# Define the network
model.add(Dense(input_dim=4, output_dim=10))
model.add(Activation('relu'))
model.add(Dense(input_dim=10, output_dim=20))
model.add(Activation('relu'))
model.add(Dense(input_dim=20, output_dim=10))
model.add(Activation('relu'))
model.add(Dense(output_dim=3))
model.add(Activation('softmax'))

# Compile the network
model.compile(loss='sparse_categorical_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])

# Training
model.fit(train_x, train_y, nb_epoch=2000, batch_size=100)

# Evaluate the result
loss, metrics = model.evaluate(test_x, test_y)
```
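The block above computes but never prints the evaluation result. A small follow-up of my own, assuming the Keras 1.x Sequential API (where `predict_classes` is available):

```python
# Print the evaluation result and spot-check a few predictions
print('loss: {}, accuracy: {}'.format(loss, metrics))
print(model.predict_classes(test_x[:5]))  # predicted class ids
print(test_y[:5])                         # ground-truth class ids
```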
At a glance, my take is as follows.

- If you're experimenting, use Keras.
- If you've settled on what to do and the provided API can realize it, use tf.learn.
- If you can't reproduce with tf.learn what you built with Keras, or if you already have a clear picture of what you want to implement, use raw TensorFlow (though debugging may be hard).
Left for the future: digging deeper into each approach, and a comparison with Slim.