Here is a summary of what I learned while doing tensor calculations and the like directly with the Keras backend.
When building a network with Keras, most operations are covered by the predefined layers, but for a special operation you need to write your own function and wrap it in a Lambda layer or a Merge layer. For example, the following builds a model with a layer that returns the absolute value of its input.
lambda_layer_exp.py
from keras.models import Model
from keras.layers import Input, Lambda
import keras.backend as K
x_in = Input(shape=(3, 3))
x = Lambda(lambda x: K.abs(x))(x_in)
model = Model(input=x_in, output=x)
Let's feed a value into this model.
>>> import numpy as np
>>> model.predict([np.array([[[-1,2,3],[4,-5,6],[7,8,-9]]])])
array([[[ 1.,  2.,  3.],
        [ 4.,  5.,  6.],
        [ 7.,  8.,  9.]]], dtype=float32)
If there are two or more inputs, the Merge layer handles them. The example below adds a layer that takes the sum of the absolute values of two inputs. When passing a function to the Merge layer, you also have to specify output_shape.
merge_layer_exp.py
from keras.models import Model
from keras.layers import Input, merge
import keras.backend as K
x_in1 = Input(shape=(3,))
x_in2 = Input(shape=(3,))
x = merge([x_in1, x_in2], mode=lambda x: K.abs(x[0]) + K.abs(x[1]), output_shape=(3,))
model = Model(input=[x_in1, x_in2], output=x)
Running this model gives the following.
>>> import numpy as np
>>> model.predict([np.array([[-1,-2,3]]), np.array([[4,-5,-6]])])
array([[ 5., 7., 9.]], dtype=float32)
Many of the backend functions are almost the same as their counterparts in NumPy, TensorFlow, Theano, and so on, but some are tricky to use, so I will focus on those.
dot, batch_dot
One thing to keep in mind when working with the Keras backend, as opposed to the regular layer API, is that you have to account for the batch dimension yourself, just as you would in TensorFlow. Keras layers that take a shape argument generally exclude the batch dimension; an RGB image, for example, is given as shape=(3, 32, 32). Backend functions, on the other hand, see the batch dimension, i.e. shape=(None, 3, 32, 32), and you have to write your calculations with that in mind.
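For example (a minimal check; K.int_shape is the backend helper that returns a tensor's static shape, assuming your Keras version provides it), the batch axis shows up as None on the tensor a layer produces:
from keras.layers import Input
import keras.backend as K
x = Input(shape=(3, 32, 32))  # the shape argument excludes the batch dimension
print K.int_shape(x)          # (None, 3, 32, 32): the backend sees the batch axis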
The dot-product functions introduced here, dot and batch_dot, ignore and take into account the batch dimension, respectively.
An example is shown below.
import keras.backend as K
a = K.variable([[1,2],[3,4]])
b = K.variable([[5,6],[7,8]])
print K.eval(K.dot(a, b))          # matrix product of a and b
print K.eval(K.batch_dot(a, b, 1)) # array of the dot products of a[i] and b[i]
print K.eval(a * b)                # element-wise multiplication
Doing this will result in the following:
[[ 19.,  22.],
 [ 43.,  50.]]
[[ 17.],
 [ 53.]]
[[  5.,  12.],
 [ 21.,  32.]]
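In practice, batch_dot is most often used on 3-D tensors, where it computes one matrix product per sample in the batch. A minimal sketch (the shapes here are arbitrary, chosen just for illustration):
import numpy as np
a3 = K.variable(np.arange(8).reshape(2, 2, 2))
b3 = K.variable(np.arange(8).reshape(2, 2, 2))
print K.eval(K.batch_dot(a3, b3)).shape  # (2, 2, 2): a3[i] is matrix-multiplied with b3[i]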
You also have to give output_shape explicitly when pushing such a shape-changing calculation into a Lambda layer.
import numpy as np
from keras.models import Model
from keras.layers import Input, Lambda
import keras.backend as K
x_in = Input(shape=(2, 2))
x = Lambda(lambda x: K.dot(K.variable([0, 1]), x), output_shape=(2,))(x_in)
model = Model(input=x_in, output=x)
print model.predict([np.array([[[1,2],[3,4]]])])
# [[ 3. 4.]]
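Incidentally, output_shape can also be given as a function of the input shape rather than a tuple, which helps when the output shape varies with the input. As I understand the Keras 1 Lambda API, this function receives and returns the full shape including the batch axis:
x = Lambda(lambda x: K.dot(K.variable([0, 1]), x),
           output_shape=lambda s: (s[0], s[2]))(x_in)  # s == (None, 2, 2)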
one_hot
As anyone who works with natural language processing or TensorFlow probably knows, this generates, in the words of Wikipedia, a bit string in which "only one bit is High (1) and all the others are Low (0)". An example is shown below.
print K.eval(K.one_hot(K.variable([0,2,1,0], dtype='int32'), 3))
# [[ 1. 0. 0.]
# [ 0. 0. 1.]
# [ 0. 1. 0.]
# [ 1. 0. 0.]]
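To apply one_hot inside a model, e.g. to encode integer word indices on the fly, you can combine an integer-typed Input with a Lambda layer. A sketch (the vocabulary size of 3 and sequence length of 4 are arbitrary choices for illustration):
import numpy as np
from keras.models import Model
from keras.layers import Input, Lambda
import keras.backend as K
x_in = Input(shape=(4,), dtype='int32')
x = Lambda(lambda x: K.one_hot(x, 3), output_shape=(4, 3))(x_in)
model = Model(input=x_in, output=x)
print model.predict([np.array([[0, 2, 1, 0]])])
# same 4x3 one-hot matrix as above, with a leading batch axis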
permute_dimensions, expand_dims, squeeze
You can permute, add, and remove dimensions with permute_dimensions, expand_dims, and squeeze, respectively.
a = K.variable([[[1,2],[3,4]]])
print K.eval(K.shape(a))
# [1, 2, 2]
print K.eval(K.permute_dimensions(a, [1, 2, 0]))
# [[[ 1.],
# [ 2.]],
#
# [[ 3.],
# [ 4.]]]
print K.eval(K.expand_dims(a, 2))
# [[[[ 1., 2.]],
#
# [[ 3., 4.]]]]
print K.eval(K.squeeze(a, 0))
# [[ 1., 2.],
# [ 3., 4.]]
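One common use worth noting: permuting only the last two axes gives a per-sample (batched) transpose:
print K.eval(K.permute_dimensions(a, [0, 2, 1]))
# [[[ 1., 3.],
#   [ 2., 4.]]]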
gather
This is a so-called slicing operation, but since indices can only be given along the first axis, you have to combine it with permute_dimensions and the like when you want to index along an arbitrary axis.
a = K.variable([[1,2],[3,4],[5,6]])
print K.eval(K.gather(a, 0))
# [ 1., 2.]
print K.eval(K.gather(K.permute_dimensions(a, [1, 0]), 0))  # equivalent to K.eval(K.gather(K.transpose(a), 0))
# [ 1., 3., 5.]
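Note that gather also accepts an array of indices, so several rows can be pulled out at once (the values follow from the a defined above):
print K.eval(K.gather(a, [0, 2]))
# [[ 1., 2.],
#  [ 5., 6.]]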
Finally, putting this knowledge to use, I reimplemented in Keras a model originally written in Chainer: Value Iteration Networks (based on @peisuke's Chainer implementation). It was the Best Paper of NIPS 2016. My implementation is posted on GitHub.