[PYTHON] I thought a little about TensorFlow's growing API

Intro

TensorFlow 1.0 was finally announced at the TensorFlow Dev Summit. The test release of the XLA graph compiler is getting attention, but personally I was more curious about how TensorFlow's API (Application Programming Interface) would be organized.

(Excerpt from Google Developers Blog "Announcing TensorFlow 1.0".)

Higher-level API modules tf.layers, tf.metrics, and tf.losses - brought over from tf.contrib.learn after incorporating skflow and TF Slim

The high-level APIs had become rather cluttered recently, and I was hoping version 1.0 would show how they were going to be organized, so I looked into the current state a little.

Early CNN model code

Previously, when there were few high-level APIs, I handled CNN (Convolutional Neural Network) models by writing my own classes, as follows.

#   my_lib_nn.py
# For example: a 2-D convolution layer
import tensorflow as tf

class Convolution2D(object):
    '''
      constructor's args:
          input     : input image (2D matrix)
          input_siz : input image size (rows, cols)
          in_ch     : number of incoming image channels
          out_ch    : number of outgoing image channels
          patch_siz : filter (patch) size
          activation: 'relu', 'sigmoid', or any other value for a linear output
    '''
    def __init__(self, input, input_siz, in_ch, out_ch, patch_siz, activation='relu'):
        self.input = input      
        self.rows = input_siz[0]
        self.cols = input_siz[1]
        self.in_ch = in_ch
        self.activation = activation
        
        wshape = [patch_siz[0], patch_siz[1], in_ch, out_ch]
        w_cv = tf.Variable(tf.truncated_normal(wshape, stddev=0.1),
                           trainable=True)
        b_cv = tf.Variable(tf.constant(0.1, shape=[out_ch]),
                           trainable=True)
        self.w = w_cv
        self.b = b_cv
        self.params = [self.w, self.b]
        
    def output(self):
        shape4D = [-1, self.rows, self.cols, self.in_ch]
        
        x_image = tf.reshape(self.input, shape4D)  # reshape to 4D tensor
        linout = tf.nn.conv2d(x_image, self.w, 
                  strides=[1, 1, 1, 1], padding='SAME') + self.b
        # Use a local name: assigning to self.output would overwrite this method
        if self.activation == 'relu':
            out = tf.nn.relu(linout)
        elif self.activation == 'sigmoid':
            out = tf.sigmoid(linout)
        else:
            out = linout

        return out

The tf.nn.xxx() functions serve as the underlying library, and I wrap them in classes that are easier to use. A home-made library is easy to customize, but in exchange you have to take care of the detailed maintenance yourself. (It is not a big library, though ...)

After I learned about **Keras**, I did ask myself "Should I use Keras?", but for reasons of familiarity, ease of detailed debugging, and flexibility, I mostly kept to a coding style that uses the TensorFlow library directly.

TensorFlow Slim vs. tf.layers

"Slim" drew attention as a "thin TensorFlow wrapper", and it seems to have been covered in several Qiita articles as well. Code that classifies MNIST with it looks like this.

import numpy as np
import tensorflow as tf
import tensorflow.contrib.slim as slim
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("../MNIST_data/", one_hot=True)

# Create the model
def my_nn(images, keep_prob):
   net = slim.layers.conv2d(images, 32, [5,5], scope='conv1')
   net = slim.layers.max_pool2d(net, [2,2], scope='pool1')
   net = slim.layers.conv2d(net, 64, [5,5], scope='conv2')
   net = slim.layers.max_pool2d(net, [2,2], scope='pool2')
   net = slim.layers.flatten(net, scope='flatten3')
   net = slim.layers.fully_connected(net, 1024, scope='fully_connected4')
   net = slim.layers.dropout(net, keep_prob)
   net = slim.layers.fully_connected(net, 10, activation_fn=None, 
                                        scope='fully_connected5')
   return net

def inference(x, y_, keep_prob):
    x_image = tf.reshape(x, [-1, 28, 28, 1])
    y_pred = my_nn(x_image, keep_prob)

    slim.losses.softmax_cross_entropy(y_pred, y_)
    total_loss = slim.losses.get_total_loss()
    correct_prediction = tf.equal(tf.argmax(y_pred, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    
    return total_loss, accuracy, y_pred

The neural network model can be written concisely and readably. Also, as the "inference" function shows, the loss function can be written with the slim API as well. My impression was that it is fairly easy to use.
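To make clear what the "inference" function returns, here is a minimal pure-Python sketch of a softmax cross-entropy loss and an argmax-based accuracy. It is not the slim implementation. The helper names and the toy data are made up for illustration, and averaging the loss over the batch is my assumption about the default behavior.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one row of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_cross_entropy(logits_batch, onehot_batch):
    """Batch mean of -sum(label * log(softmax(logits))) (averaging is an assumption)."""
    losses = []
    for logits, onehot in zip(logits_batch, onehot_batch):
        probs = softmax(logits)
        losses.append(-sum(t * math.log(p) for t, p in zip(onehot, probs) if t > 0))
    return sum(losses) / len(losses)

def accuracy(logits_batch, onehot_batch):
    """Fraction of rows where argmax(logits) equals argmax(labels)."""
    def argmax(row):
        return max(range(len(row)), key=row.__getitem__)
    hits = [argmax(l) == argmax(t) for l, t in zip(logits_batch, onehot_batch)]
    return sum(hits) / len(hits)

# Toy batch: 3 samples, 3 classes (hypothetical values)
logits = [[2.0, 0.5, 0.1], [0.2, 0.1, 3.0], [1.0, 1.5, 0.3]]
labels = [[1, 0, 0], [0, 0, 1], [1, 0, 0]]
loss = softmax_cross_entropy(logits, labels)
acc = accuracy(logits, labels)   # 2 of the 3 rows are classified correctly
```

The accuracy part corresponds to the tf.equal / tf.argmax / tf.reduce_mean lines in the code above.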

Next, I looked at the **tf.layers** module provided in TensorFlow 1.0. The API documentation explains it well, so I wrote my code with reference to it.

**Fig. From the TensorFlow API documentation (image excerpt)**

As the Google announcement excerpt above suggests, the documentation also lists **tf.contrib.layers**, but the **tf.layers** module used here is a separate module, so be careful not to confuse them.

Below is the CNN code using **tf.layers**.

import tensorflow as tf
from tensorflow.python.layers import layers
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("../MNIST_data/", one_hot=True)

# Create the model
def my_nn(images, drop_rate):
   net = tf.layers.conv2d(images, 32, [5,5], padding='same', 
                                activation=tf.nn.relu, name='conv1')
   net = tf.layers.max_pooling2d(net, pool_size=[2,2], strides=[2,2], 
                                name='pool1')
   net = tf.layers.conv2d(net, 64, [5,5], padding='same', 
                                activation=tf.nn.relu, name='conv2')
   net = tf.layers.max_pooling2d(net, pool_size=[2,2], strides=[2,2], 
                                name='pool2')
   net = tf.reshape(net, [-1, 7*7*64])
   net = tf.layers.dense(net, 1024, activation=tf.nn.relu, name='dense1')
   # training=True is needed here: with the default training=False, dropout is a no-op
   net = tf.layers.dropout(net, rate=drop_rate, training=True)
   net = tf.layers.dense(net, 10, activation=None, name='dense2')
   return net

def inference(x, y_, keep_prob):
    x_image = tf.reshape(x, [-1, 28, 28, 1])
    drop_rate = 1.0 - keep_prob
    y_pred = my_nn(x_image, drop_rate)
    
    loss = tf.losses.softmax_cross_entropy(y_, y_pred)
    correct_prediction = tf.equal(tf.argmax(y_pred, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    
    return loss, accuracy, y_pred

I started from the slim code, assuming tf.layers would be much the same, but I was thrown by how much the function specifications differ. (Compare the code inside the "my_nn()" functions.)
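One visible difference is that the slim version calls flatten while the tf.layers version reshapes to a hard-coded 7*7*64. The sketch below shows where that number comes from. It assumes 'same' padding with stride 1 for the convolutions (as written in the code) and 'valid' padding for the 2x2, stride-2 pooling (which I believe is the tf.layers default). The helper names are hypothetical.

```python
import math

def conv_same_size(size, stride=1):
    """Output length of a 'same'-padded convolution: ceil(size / stride)."""
    return math.ceil(size / stride)

def pool_valid_size(size, pool=2, stride=2):
    """Output length of a 'valid'-padded pooling window."""
    return (size - pool) // stride + 1

size = 28                      # MNIST images are 28 x 28
size = conv_same_size(size)    # conv1 keeps 28
size = pool_valid_size(size)   # pool1 halves it to 14
size = conv_same_size(size)    # conv2 keeps 14
size = pool_valid_size(size)   # pool2 halves it to 7
flat_dim = size * size * 64    # conv2 has 64 filters
print(flat_dim)                # 3136, i.e. 7*7*64
```

If the input size or a padding setting changes, the hard-coded reshape has to be updated by hand, which is one more detail to watch when porting from slim.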

Even if we accept different function names (max_pool2d <-> max_pooling2d, fully_connected <-> dense) as par for the course, small adjustments were still needed because argument keywords and argument defaults also differ. What bothered me most (I find it hard to forgive) is the Dropout parameter: the old specification takes "keep_prob", the ratio of units whose effect is kept, while the new one takes the ratio of units that are dropped. (As a workaround, I added the line drop_rate = 1.0 - keep_prob.) This may be a matter of programmer "preference", but I wish compatibility with the past had been considered ...
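The two Dropout conventions describe the same operation, only parameterized from opposite sides. Here is a minimal pure-Python sketch of inverted dropout under that assumption. The function names and the seed handling are hypothetical and are not the TensorFlow implementation.

```python
import random

def dropout_keep(values, keep_prob, seed=0):
    """Inverted dropout parameterized by the fraction of units kept."""
    rng = random.Random(seed)
    # Kept units are scaled by 1/keep_prob so the expected sum is unchanged
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]

def dropout_rate(values, rate, seed=0):
    """The same operation parameterized by the fraction of units dropped."""
    return dropout_keep(values, 1.0 - rate, seed)

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
kept = dropout_keep(x, keep_prob=0.75, seed=42)
dropped = dropout_rate(x, rate=0.25, seed=42)
print(kept == dropped)   # True: keep_prob=0.75 and rate=0.25 give the same mask
```

Passing the wrong one of the two values raises no error; it only changes how strongly the network is regularized, which is why the renaming is easy to miss.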

So version 1.0 did not leave the TensorFlow API neatly organized, and my impression is that the current specification is a little disappointing.

What should I do?

Given the situation above, here are some options.

- Wait for **Keras2**, which is in preparation. It has many users, so there is good reason to expect its API to keep being refined.
- Expect **tf.layers**, **tf.metrics**, and **tf.losses** to mature, given that they have only just appeared. (Since the project is open source, the best approach is to speak up on GitHub and say "this would be better".)
- Look into other APIs (tf.contrib.layers, TFLearn, etc.).
- Keep your own class library and go on maintaining it. (You can also adopt the good points of whichever API you like.)
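For the last option, one way to keep a personal library manageable is to collect the API differences in a single place. The sketch below is a hypothetical translation helper built only from the differences visible in the two listings in this article (function names, scope vs. name, activation_fn vs. activation, keep_prob vs. rate). It calls no TensorFlow function and is not an official mapping.

```python
# Hypothetical table: slim function name -> tf.layers function name
SLIM_TO_TF_LAYERS = {
    "conv2d": "conv2d",
    "max_pool2d": "max_pooling2d",
    "fully_connected": "dense",
    "dropout": "dropout",
}

def translate_call(slim_name, **slim_kwargs):
    """Return (tf.layers function name, keyword arguments) for a slim-style call."""
    name = SLIM_TO_TF_LAYERS[slim_name]
    kwargs = dict(slim_kwargs)
    if "scope" in kwargs:            # slim: scope=...          tf.layers: name=...
        kwargs["name"] = kwargs.pop("scope")
    if "activation_fn" in kwargs:    # slim: activation_fn=...  tf.layers: activation=...
        kwargs["activation"] = kwargs.pop("activation_fn")
    if "keep_prob" in kwargs:        # keep ratio -> drop ratio
        kwargs["rate"] = 1.0 - kwargs.pop("keep_prob")
    return name, kwargs

print(translate_call("fully_connected", activation_fn=None, scope="fully_connected5"))
print(translate_call("dropout", keep_prob=0.75))
```

Differences in default values (padding, default activation) are not covered here and would still have to be checked by hand.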

Since "preference" is often reflected in details such as the Dropout parameter (keep ratio or drop ratio), I feel there is little point in worrying too much about "which API is the best". This time I focused on CNN functions for handling images, but given the flexible modeling capabilities of deep learning (for example, RNNs and generative models), it may be more constructive not to fuss over API details and instead to follow a wide range of technical topics, using each API where it fits.

(If you have any opinions or advice, please comment.) (The programming environment at the time of writing is as follows: Python 3.5.2, TensorFlow 1.0.0)

References, Web site

TensorFlow API Document
https://www.tensorflow.org/api_docs/
Announcing TensorFlow 1.0 - Google Developers Blog
https://developers.googleblog.com/2017/02/announcing-tensorflow-10.html
TensorFlow Dev Summit Video - YouTube