[PYTHON] The recommended way to write tf.keras custom layers, and variable name behavior

Introduction

I've run into some seemingly undocumented behavior around variable names in tf.keras custom layers, so I'm writing it up here. The "variable name" discussed here is not a variable name in the Python sense, but the name (passed as an argument) given to the TensorFlow variable (tf.Variable).

Before getting to the recommended way of writing a custom layer, let me briefly explain variable names.

Specific example of variable name

In the sample code below, the variable names are not self.v1 and self.v2 but my_variable1 and my_variable2.

import tensorflow as tf

# Custom layer sample code:
# a hand-rolled fully connected layer
class MyLayer(tf.keras.layers.Layer):
    def __init__(self, output_dim):
        super().__init__()
        self.output_dim = output_dim

        # Bias term
        # (does not depend on the size of the input data)
        self.v1 = self.add_weight(name='my_variable1', shape=[output_dim])

    def build(self, input_shape):
        # Affine matrix
        # (depends on the size of the input data)
        self.v2 = self.add_weight(name='my_variable2', shape=[input_shape[1], self.output_dim])
        self.built = True

    def call(self, inputs, **kwargs):
        return tf.matmul(inputs, self.v2) + self.v1

This is roughly what the official tutorial shows.

Does anything look wrong?

Running it for now

Let's actually run it and check.

model = MyLayer(output_dim=3)
# The build method runs the first time data is fed in, so feed in some suitable data
x = tf.random.normal(shape=(3, 5))
y = model(x)

print(model.trainable_variables)
↓ This is the name
[<tf.Variable 'my_variable1:0' shape=(3,) dtype=float32, numpy=array([-0.56484747,  0.00200152,  0.42238712], dtype=float32)>, 
↓ This is the name
<tf.Variable 'my_layer/my_variable2:0' shape=(5, 3) dtype=float32, numpy=
array([[ 0.47857696, -0.04394728,  0.31904382],
       [ 0.37552172,  0.22522384,  0.07408607],
       [-0.74956644, -0.61549807, -0.41261673],
       [ 0.4850598 , -0.45188528,  0.56900233],
       [-0.39462167,  0.40858668, -0.5422235 ]], dtype=float32)>]

We get my_variable1:0 and my_layer/my_variable2:0. There's something extra attached, but the variable names are indeed my_variable1 and my_variable2, so it looks OK.

Or is it?

When layers are stacked

Let's continue from the previous example, in the same session.

# Stacking the custom layer
model = tf.keras.Sequential([
    MyLayer(3),
    MyLayer(3),
    MyLayer(3)
])
y = model(x)
print(model.trainable_variables)

[<tf.Variable 'my_variable1:0' shape=(3,) dtype=float32, (omitted)>,
 <tf.Variable 'sequential/my_layer_1/my_variable2:0' shape=(5, 3) dtype=float32, (omitted)>,
 <tf.Variable 'my_variable1:0' shape=(3,) dtype=float32, (omitted)>,
 <tf.Variable 'sequential/my_layer_2/my_variable2:0' shape=(3, 3) dtype=float32, (omitted)>,
 <tf.Variable 'my_variable1:0' shape=(3,) dtype=float32, (omitted)>,
 <tf.Variable 'sequential/my_layer_3/my_variable2:0' shape=(3, 3) dtype=float32, (omitted)>]

my_variable1 everywhere (sob). They're indistinguishable.

Even when I drew histograms of the variables in TensorBoard, the names collided and I couldn't tell which was which.
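The post doesn't show the logging code, but a minimal sketch of what such logging might look like (the log directory is arbitrary, and model is the stacked Sequential from above) is:

writer = tf.summary.create_file_writer('logs/demo')  # arbitrary log directory

with writer.as_default():
    for var in model.trainable_variables:
        # The tag is derived from the variable name, so all three
        # 'my_variable1' variables collapse onto the same tag.
        tf.summary.histogram(var.name.replace(':', '_'), var, step=0)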

The recommended way to write a custom layer

class MyLayer(tf.keras.layers.Layer):
    def __init__(self, output_dim):
        super().__init__()
        self.output_dim = output_dim

    def build(self, input_shape):
        # Bias term
        # (does not depend on the size of the input data)
        self.v1 = self.add_weight(name='my_variable1', shape=[self.output_dim])

        # Affine matrix
        # (depends on the size of the input data)
        self.v2 = self.add_weight(name='my_variable2', shape=[input_shape[1], self.output_dim])
        self.built = True

    def call(self, inputs, **kwargs):
        return tf.matmul(inputs, self.v2) + self.v1

Simply declare all the variables in the build method.

Since TensorFlow is define-by-run from version 2 onward, the layer hierarchy presumably cannot be resolved until the model and layers have actually been executed once. I think that is why it makes such a big difference whether variables are declared in __init__ or in build.
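As a quick check (a sketch, assuming a fresh session; the exact name suffixes depend on what has already run), stacking the fixed layer now gives both variables the full scope:

model = tf.keras.Sequential([
    MyLayer(3),
    MyLayer(3),
    MyLayer(3)
])
y = model(tf.random.normal(shape=(3, 5)))

for var in model.trainable_variables:
    print(var.name)
# Expected to look along the lines of:
# sequential/my_layer/my_variable1:0
# sequential/my_layer/my_variable2:0
# sequential/my_layer_1/my_variable1:0
# ... and so on for the remaining layers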

Incidentally, built-in layers such as tf.keras.layers.Dense declare all of their variables in the build method, so you can use them with confidence.
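You can see this for yourself: a Dense layer owns no weights until it is first called, and the weights created then carry the layer scope (the exact suffix depends on earlier runs):

d = tf.keras.layers.Dense(3)
print(d.weights)  # [] -- nothing is created in __init__

_ = d(tf.random.normal(shape=(2, 5)))  # the first call triggers build
print([w.name for w in d.weights])
# e.g. ['dense/kernel:0', 'dense/bias:0']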

Summary

When declaring a variable in a custom layer, be sure to declare it in the build method. Do not declare it in the __init__ method.

Digression

How the name processing behaves

What is the :0 at the end?

It is added automatically per the TensorFlow specification. When running on multiple GPUs, variables are copied to each GPU and numbered 0, 1, 2, ... in order. This part of the specification is the same as in version 1.
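Note that even a single variable on a single device gets the suffix:

v = tf.Variable(tf.zeros([2]), name='x')
print(v.name)  # x:0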

In version 2, you can check this by running the same code as above on multiple GPUs using tf.distribute.MirroredStrategy or similar.
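A sketch of that check (it only shows multiple per-replica copies on a machine with two or more GPUs; experimental_local_results returns the per-device components):

strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    mv = tf.Variable(tf.zeros([2]), name='mirrored_variable')

# On a multi-GPU machine each replica holds its own copy; the exact
# component names vary by TensorFlow version.
for component in strategy.experimental_local_results(mv):
    print(component.name, component.device)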

What is the leading my_layer?

my_layer is the default name used when you don't explicitly name the MyLayer instance: the class name is automatically converted to snake_case.

Also, in the second example with tf.keras.Sequential, the layers become my_layer_1, my_layer_2, and my_layer_3. These suffixes are appended automatically to avoid name conflicts: the first example had already created my_layer, and the second example was executed right after it in the same session.
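Both behaviors are easy to confirm in a fresh session (the names shown assume nothing else has been created yet):

print(MyLayer(3).name)  # my_layer   (class name converted to snake_case)
print(MyLayer(3).name)  # my_layer_1 (suffix appended to avoid a collision)
print(tf.keras.layers.Dense(3).name)  # dense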

I believe this is the same behavior as in version 1. At least dm-sonnet, a wrapper library for TensorFlow, does the same thing.
