[PYTHON] I tried porting the code written for TensorFlow to Theano

Originally I had been studying the deep learning framework "Theano", but I became interested in "TensorFlow", released last month (November 2015), and have recently been using it as my main framework. While working with TensorFlow I kept feeling "this part is close to Theano, this part is quite different", so to pin down the differences between the two I ported a simple piece of code from "TensorFlow" to "Theano". (Usually people port in the opposite direction, I suspect.)

Summary: "Theano" vs. "TensorFlow"

First, here is a side-by-side overview of the two frameworks.

| Item | Theano | TensorFlow |
|:---|:---|:---|
| Developer | Academia (University of Montreal) | Company (Google) |
| First released | Around 2010 (?) | 2015 |
| Tensor operations | supported | supported |
| numpy "basic"-level functions | supported | supported |
| Automatic differentiation (graph transformation) | supported | supported |
| GPU computation | supported | supported |
| Graph visualization | supported (it exists, at least) | supported (the well-known TensorBoard) |
| Optimizers | not supported (i.e., not built in) | various optimizers supported |
| Neural network functions | supported (various) | supported |

Functionally, the differences show up in the last rows of the table above. In Theano you have to build these details yourself, whereas TensorFlow gives the impression that a wide range of library functions is provided from the start. (For those who would rather not program Theano from the low-level details, there are high-level libraries built on Theano, such as Pylearn2.)

Code comparison (neural network modeling)

At this level, the code can be written in much the same way in "Theano" and "TensorFlow". The following is an excerpt from the MLP (multi-layer perceptron) code for multi-class classification introduced in the previous article.

**TensorFlow version**

# The excerpt assumes: import tensorflow as tf

# Hidden Layer
class HiddenLayer(object):
    def __init__(self, input, n_in, n_out):
        self.input = input

        # weights from a small-variance normal distribution, biases at zero
        w_h = tf.Variable(tf.random_normal([n_in, n_out], mean=0.0, stddev=0.05))
        b_h = tf.Variable(tf.zeros([n_out]))

        self.w = w_h
        self.b = b_h
        self.params = [self.w, self.b]

    def output(self):
        linarg = tf.matmul(self.input, self.w) + self.b
        return tf.nn.relu(linarg)   # switched sigmoid() to relu()

# Read-out Layer
class ReadOutLayer(object):
    def __init__(self, input, n_in, n_out):
        self.input = input

        w_o = tf.Variable(tf.random_normal([n_in, n_out], mean=0.0, stddev=0.05))
        b_o = tf.Variable(tf.zeros([n_out]))

        self.w = w_o
        self.b = b_o
        self.params = [self.w, self.b]

    def output(self):
        linarg = tf.matmul(self.input, self.w) + self.b
        return tf.nn.softmax(linarg)

**Theano version**

# The excerpt assumes: import numpy as np; import theano; import theano.tensor as T
# floatX() (defined elsewhere) casts an array to theano.config.floatX

# Hidden Layer
class HiddenLayer(object):
    def __init__(self, input, n_in, n_out):
        self.input = input

        # shared variables play the role of tf.Variable
        w_h = theano.shared(floatX(np.random.standard_normal([n_in, n_out]))
                            * 0.05)
        b_h = theano.shared(floatX(np.zeros(n_out)))

        self.w = w_h
        self.b = b_h
        self.params = [self.w, self.b]

    def output(self):
        linarg = T.dot(self.input, self.w) + self.b
        # T.nnet.relu(linarg) is available from ver.0.7.1
        return T.nnet.sigmoid(linarg)

# Read-out Layer
class ReadOutLayer(object):
    def __init__(self, input, n_in, n_out):
        self.input = input

        w_o = theano.shared(floatX(np.random.standard_normal([n_in, n_out]))
                            * 0.05)
        b_o = theano.shared(floatX(np.zeros(n_out)))

        self.w = w_o
        self.b = b_o
        self.params = [self.w, self.b]

    def output(self):
        linarg = T.dot(self.input, self.w) + self.b
        return T.nnet.softmax(linarg)

Unless you look closely, the difference amounts to little more than replacing "tf." with "T.". As for activation functions, I had assumed TensorFlow would be the more convenient thanks to tf.nn.relu() and friends, but apparently Theano also supports relu() (Rectified Linear Unit) from ver.0.7.1. (The versions used in this article are Python ver.2.7.10, TensorFlow ver.0.5.0, and Theano ver.0.7.0.)
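
Incidentally, since this article uses Theano ver.0.7.0, which predates the built-in relu(), a common workaround is to express relu with T.maximum(); a minimal sketch:

    x = T.matrix('x')
    relu_x = T.maximum(0., x)    # relu for Theano < 0.7.1: elementwise max with 0
    # from ver.0.7.1 this can simply be:
    # relu_x = T.nnet.relu(x)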

The Softmax function is, of course, supported by both.

Code comparison (optimization process)

Here a difference between the two (TensorFlow vs. Theano) appears. TensorFlow ships with an extensive library of Optimizer classes, whereas in Theano you have to write the optimizer yourself.
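
For instance, swapping one built-in optimizer for another in TensorFlow is a one-line change; a minimal sketch, reusing the model's `loss`:

    # plain gradient descent
    train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)
    # or Adagrad, with no other changes to the training loop
    train_op = tf.train.AdagradOptimizer(0.01).minimize(loss)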

**TensorFlow version (AdagradOptimizer usage example)**

    # Train
    optimizer = tf.train.AdagradOptimizer(0.01)
    train_op = optimizer.minimize(loss)
    
    init = tf.initialize_all_variables()

    with tf.Session() as sess:
        sess.run(init)
        print('Training...')
        for i in range(10001):
            train_op.run({x: train_x, y_: train_y})
            if i % 1000 == 0:                # echo status on screen
                train_accuracy = accuracy.eval({x: train_x, y_: train_y})
                print(' step, accuracy = %6d: %8.3f' % (i, train_accuracy))

In TensorFlow, you specify the optimizer and run a Session; the training data is supplied within the Session in the form of a feed_dict.
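
For reference, the `loss` and `accuracy` used above are not defined in the excerpt; here is a minimal sketch of how they might look, where the layer sizes n_in, n_hidden, and n_out are assumptions:

    x  = tf.placeholder("float", [None, n_in])     # input features
    y_ = tf.placeholder("float", [None, n_out])    # one-hot labels

    h_layer = HiddenLayer(x, n_in, n_hidden)
    o_layer = ReadOutLayer(h_layer.output(), n_hidden, n_out)
    y_pred = o_layer.output()                      # softmax probabilities

    # cross-entropy loss and classification accuracy
    loss = -tf.reduce_mean(tf.reduce_sum(y_ * tf.log(y_pred),
                                         reduction_indices=1))
    correct = tf.equal(tf.argmax(y_pred, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct, "float"))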

**Theano version (Adagrad implementation example)** I tried to keep the way the functions are called as close to the TensorFlow code as possible, and it came out as follows.

# Optimizers - (GradientDescent), AdaGrad
# assumes: from collections import OrderedDict
class Optimizer(object):
    def __init__(self, params, learning_rate=0.01):
        self.lr = learning_rate
        self.params = params

    def minimize(self, loss):
        # gradients of the loss w.r.t. each parameter, by automatic differentiation
        self.gradparams = [T.grad(loss, param) for param in self.params]

class AdagradOptimizer(Optimizer):
    def __init__(self, params, learning_rate=0.01, eps=1.e-6):
        super(AdagradOptimizer, self).__init__(params, learning_rate)
        self.eps = eps
        # per-parameter accumulators for the squared gradients
        self.accugrads = [theano.shared(floatX(np.zeros(t.shape.eval())),
                                        'accugrad') for t in self.params]

    def minimize(self, loss):
        super(AdagradOptimizer, self).minimize(loss)
        self.updates = OrderedDict()    # Theano expects an ordered mapping

        for accugrad, param, gparam in zip(
                self.accugrads, self.params, self.gradparams):
            agrad = accugrad + gparam * gparam
            dx = - (self.lr / T.sqrt(agrad + self.eps)) * gparam
            self.updates[param] = param + dx
            self.updates[accugrad] = agrad

        return self.updates

This class is then used to run the optimization (learning) process.

    # Train
    myoptimizer = AdagradOptimizer(params, learning_rate=0.01, eps=1.e-8)
    one_update = myoptimizer.minimize(loss)
    
    # Compile ... define theano.function
    train_model = theano.function(
        inputs=[],
        outputs=[loss, accuracy],
        updates=one_update,
        givens=[(x, strain_x), (y_, strain_y)],
        allow_input_downcast=True
    )

    n_epochs = 10001
    epoch = 0
    
    while epoch < n_epochs:
        epoch += 1
        loss_val, accu = train_model()   # numeric results; the symbolic `loss` stays intact
        if epoch % 1000 == 0:
            print('epoch[%5d] : cost =%8.4f, accuracy =%8.4f' % (epoch, loss_val, accu))
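
Here, `strain_x` and `strain_y` in the givens= clause are the training data held in Theano shared variables, so that train_model() can read them without any per-call feeding. A minimal sketch of how they might be prepared (not shown in the original code), assuming the same floatX() helper:

    strain_x = theano.shared(floatX(train_x))   # training inputs on the device
    strain_y = theano.shared(floatX(train_y))   # training labels on the device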

To sum up the optimization step:

**In TensorFlow ...** after initializing the variables, you run a Session; the training data is supplied during the Session in the form of op.run({feed_dict}).

**In Theano ...** the whole flow of the learning step, including the supply of the training data, is defined in theano.function(), and the iterative learning computation is then carried out by calling the compiled function.

This is where the difference between the two is most visible.

When I first started learning "Theano", I remember struggling with this theano.function(), but comparing it against "TensorFlow" as above has deepened my understanding of it. (Seen the other way around, once you understand theano.function() well, you can use Theano well.)
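
As a minimal, self-contained illustration of theano.function() (separate from the MLP code above), here is the classic accumulator example: a graph compiled into a callable that also updates a shared variable on every call:

    import numpy as np
    import theano
    import theano.tensor as T

    state = theano.shared(np.float64(0.0), name='state')   # persistent state
    inc = T.dscalar('inc')

    # outputs are evaluated first, then the update state <- state + inc is applied
    accumulate = theano.function(inputs=[inc],
                                 outputs=state,
                                 updates=[(state, state + inc)])

    print(accumulate(1.0))   # prints 0.0 (the value before the update)
    print(accumulate(2.0))   # prints 1.0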

Summary and impressions

Since they are frameworks for the same purpose, the two have much in common. Functionally, TensorFlow is the more complete, so porting Theano code to TensorFlow would probably be the easier direction. That said, once you have implemented an Optimizer or a new function in Theano, it can be reused, so this is not a real disadvantage for Theano. (Code from those who came before is also a great help, and there are many add-on libraries.)

I have only looked at the MLP code so far and have not yet tried more complex network models, but both look like very promising tools. (I also installed Chainer, but have not been able to get to it yet ...)

(Addendum) Information about Keras

While checking the site of the neural network library "Keras" as I wrote this article, I found that in addition to the version that runs on top of "Theano", a version that works with "TensorFlow" has been released. (I would like to look into it later.)

Keras: Deep Learning library for Theano and TensorFlow
