[PYTHON] TensorFlow tips: Porting Lasagne's NIN Layer

Introduction

(I'm sorry for the very niche content.)

While porting some GAN (Generative Adversarial Network) code written in **Theano** + **Lasagne** to **TensorFlow**, I came across something called "lasagne.layers.NINLayer". The Lasagne documentation describes it as follows:

Network-in-network layer. Like DenseLayer, but broadcasting across all trailing dimensions beyond the 2nd. This results in a convolution operation with filter size 1 on all trailing dimensions. Any number of trailing dimensions is supported, so NINLayer can be used to implement 1D, 2D, 3D, ... convolutions.

"A layer like a tightly coupled layer that" broadcasts "to orders beyond the second dimension." From the name of Network-in-network layer, I got ready at first, but I would like to introduce it because it was very easy to write in TensorFlow.

Programming environment:

- Python: 3.5.2
- Theano: 0.9.0
- Lasagne: 0.2.dev1
- TensorFlow: 1.2.0

Behavior in Theano + Lasagne

Since the behavior was not clear to me from the documentation alone, I tested it with a small piece of data.

import numpy as np
import theano
import theano.tensor as T

import lasagne
from lasagne.layers import InputLayer, NINLayer

# variables
x = T.tensor4('x')
y = T.matrix('y')

# shared variable
w_np = np.array([[1., 2., 3.], [2., 2., 2.]], dtype=np.float32)
ws = theano.shared(w_np, name='w')

# layers
l_in = InputLayer((1, 2, 5, 5))  # batch=1, 2 channels, 5x5 image
l1 = NINLayer(l_in, num_units=3, W=ws, b=None,
              nonlinearity=None)
l_in.input_var = x
y = lasagne.layers.get_output(l1)

# theano function
mylayer = theano.function(
    inputs=[x],
    outputs=y,
    allow_input_downcast=True
)

x_np = np.ones([1, 2, 5, 5]) * 0.1
y_np = mylayer(x_np)
y_np = np.asarray(y_np)

print('x_np.shape = ', x_np.shape)
print('x_np = \n', x_np)
print(' ')
print('y_np.shape = ', y_np.shape)
print('y_np = \n', y_np)

The output of the calculation is shown below.

x_np.shape =  (1, 2, 5, 5)
x_np = 
 [[[[ 0.1  0.1  0.1  0.1  0.1]
   [ 0.1  0.1  0.1  0.1  0.1]
   [ 0.1  0.1  0.1  0.1  0.1]
   [ 0.1  0.1  0.1  0.1  0.1]
   [ 0.1  0.1  0.1  0.1  0.1]]

  [[ 0.1  0.1  0.1  0.1  0.1]
   [ 0.1  0.1  0.1  0.1  0.1]
   [ 0.1  0.1  0.1  0.1  0.1]
   [ 0.1  0.1  0.1  0.1  0.1]
   [ 0.1  0.1  0.1  0.1  0.1]]]]
 
y_np.shape =  (1, 3, 5, 5)
y_np = 
 [[[[ 0.30000001  0.30000001  0.30000001  0.30000001  0.30000001]
   [ 0.30000001  0.30000001  0.30000001  0.30000001  0.30000001]
   [ 0.30000001  0.30000001  0.30000001  0.30000001  0.30000001]
   [ 0.30000001  0.30000001  0.30000001  0.30000001  0.30000001]
   [ 0.30000001  0.30000001  0.30000001  0.30000001  0.30000001]]

  [[ 0.40000001  0.40000001  0.40000001  0.40000001  0.40000001]
   [ 0.40000001  0.40000001  0.40000001  0.40000001  0.40000001]
   [ 0.40000001  0.40000001  0.40000001  0.40000001  0.40000001]
   [ 0.40000001  0.40000001  0.40000001  0.40000001  0.40000001]
   [ 0.40000001  0.40000001  0.40000001  0.40000001  0.40000001]]

  [[ 0.5         0.5         0.5         0.5         0.5       ]
   [ 0.5         0.5         0.5         0.5         0.5       ]
   [ 0.5         0.5         0.5         0.5         0.5       ]
   [ 0.5         0.5         0.5         0.5         0.5       ]
   [ 0.5         0.5         0.5         0.5         0.5       ]]]]

Looking at the input/output data above, the function of the layer becomes clear: each pixel of the 2-D feature map is taken as a vector along the channel axis (depth-wise) and multiplied by the weight matrix. In this example the multiplication is by w = [[1., 2., 3.], [2., 2., 2.]]; since the input has 2 channels and the weight has shape [2 x 3], the output has 3 channels. In the code above the input data is all 0.1, but since it would normally be image data, any values can be used.
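
To make the calculation concrete, the same per-pixel operation can be reproduced in plain NumPy (a minimal sketch using the values from the example above):

import numpy as np

# weight matrix from the example: 2 input channels -> 3 output channels
w = np.array([[1., 2., 3.],
              [2., 2., 2.]], dtype=np.float32)

# one pixel of the input: its 2-channel value is [0.1, 0.1]
pixel = np.array([0.1, 0.1], dtype=np.float32)

# NINLayer applies the same matrix product to every pixel independently
print(np.dot(pixel, w))   # -> approx. [ 0.3  0.4  0.5 ], matching the three output channels above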

Code ported to TensorFlow

Note that in Theano, image data is treated as "channels-first" (strictly speaking, the channel axis comes second, after the batch axis), whereas TensorFlow uses "channels-last". With that in mind, I ported the layer as follows.

# NIN (network in network) like function
def lasagne_nin_like(x, w):
    '''
      args.:
        input shape:    [None, pixel, pixel, input_channel]
        output shape:   [None, pixel, pixel, output_channel]
    '''
    input_ch = tf.shape(x)[-1]      # number of input channels (= 2 in this example)
    output_ch = tf.shape(w)[-1]     # number of output channels (= 3 in this example)
    net = tf.reshape(x, [-1, input_ch])    # flatten all pixels, keeping only the channel axis
    net = tf.matmul(net, w)                # per-pixel matrix multiplication
    y = tf.reshape(net, [-1, 5, 5, output_ch])   # restore the spatial shape (5x5 is hard-coded for this example)

    return y

y = lasagne_nin_like(x, W)

All you have to do is tf.reshape() the input into a flat two-dimensional shape, keeping only the channel axis, apply the matrix multiplication, and then restore the original shape. Porting it was easier than I expected.
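
Incidentally, as the Lasagne documentation says, this is nothing more than a convolution with filter size 1, so the same result should also be obtainable with tf.nn.conv2d. A minimal sketch (reshaping W into a [1, 1, in_ch, out_ch] kernel is my own illustration, not part of the ported code):

# the same computation expressed as a 1x1 convolution (illustrative sketch)
w_kernel = tf.reshape(W, [1, 1, 2, 3])   # [filter_h, filter_w, in_ch, out_ch]
y_conv = tf.nn.conv2d(x, w_kernel, strides=[1, 1, 1, 1], padding='SAME')
# y_conv should match y = lasagne_nin_like(x, W) for any [None, 5, 5, 2] input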

The entire TensorFlow code is as follows.

import numpy as np
import tensorflow as tf


# tensorflow placeholders
x = tf.placeholder(tf.float32, [None, 5, 5, 2])


# shared variable
w_np = np.array([[1., 2., 3.], [2., 2., 2.]])
W = tf.Variable(w_np, dtype=tf.float32)

# NIN (network in network) like function
def lasagne_nin_like(x, w):
    '''
      args.:
        input shape:    [None, pixel, pixel, input_channel]
        output shape:   [None, pixel, pixel, output_channel]
    '''
    input_ch = tf.shape(x)[-1]      # number of input channels (= 2 in this example)
    output_ch = tf.shape(w)[-1]     # number of output channels (= 3 in this example)
    net = tf.reshape(x, [-1, input_ch])    # flatten all pixels, keeping only the channel axis
    net = tf.matmul(net, w)                # per-pixel matrix multiplication
    y = tf.reshape(net, [-1, 5, 5, output_ch])   # restore the spatial shape (5x5 is hard-coded for this example)

    return y

y = lasagne_nin_like(x, W)

# tensorflow session
init = tf.global_variables_initializer()
sess = tf.InteractiveSession()
sess.run(init)


x_np = np.ones([1, 5, 5, 2], dtype=np.float32) * 0.1
y_np = sess.run(y, feed_dict={x: x_np})

print('x_np.shape = ', x_np.shape)

# transpose the 4-D tensor (channels-last -> channels-first) to compare with the Theano result
y_np = np.transpose(y_np, (0, 3, 1, 2))

print('y_np.shape = ', y_np.shape)
print('y_np = ', y_np)
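
If everything is consistent, the transposed TensorFlow result should match the Theano output shown earlier: every pixel becomes [0.3, 0.4, 0.5] along the three output channels. A quick sanity check along those lines (my own addition, not part of the original script):

# compare with the expected result derived from w and the constant 0.1 input
expected = np.array([0.3, 0.4, 0.5], dtype=np.float32).reshape(1, 3, 1, 1) \
           * np.ones([1, 3, 5, 5], dtype=np.float32)
print(np.allclose(y_np, expected))   # should print True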

Impressions

As described above, it really was "easier than you would think." The reason the port was easy is presumably that TensorFlow follows the "channels-last" convention, so the data can be flattened per pixel with a single reshape. In Theano + Lasagne, by contrast, the channels-first layout means the data cannot be flattened along the channel axis in one line, which is probably why lasagne.layers.NINLayer is provided (a rough comparison of the two layouts is sketched below).
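
To make that point concrete, here is a rough comparison of the two layouts in NumPy terms (x_nhwc, x_nchw, and in_ch are hypothetical names for this sketch, not taken from the code above):

# channels-last (TensorFlow): a single reshape flattens all pixels with channels kept last
flat = x_nhwc.reshape(-1, in_ch)                        # -> [N*H*W, in_ch]

# channels-first (Theano): the channel axis must be moved last before flattening
flat = x_nchw.transpose(0, 2, 3, 1).reshape(-1, in_ch)  # -> [N*H*W, in_ch]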

(Calling this "porting the NIN Layer" in the title was a bit of an exaggeration ...)

