I'm someone who got burned implementing a method with TF 2.x. It's been about a year since TF 2.x came out, so I'll summarize what TF 2.x can do, what it still can't do, and the workarounds that have recently come to light.
"You can actually do this", "isn't this approach wrong?": if you notice anything like that, please point it out and correct me. It will save everyone who struggles with this ~~shit~~ framework in the future.
With Eager Execution now available, tf.Session has been deprecated and TF operations can be executed directly by the Python interpreter. This makes debugging much easier, and simply running code easier as well.
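A minimal sketch of what that looks like in practice (nothing here is specific to this article, just plain TF 2.x):

import tensorflow as tf

# Ops run immediately under eager execution; no tf.Session / sess.run() needed.
x = tf.constant([[1.0, 2.0], [3.0, 4.0]])
y = tf.matmul(x, x)
print(y.numpy())               # the result is available right away
print(tf.executing_eagerly())  # True by default in TF 2.x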
Also, as part of the integration with Keras, implementations built on tf.keras.layers (the counterpart of nn.Module in PyTorch?) are now the promoted style. The existing functional implementations, on the other hand, have been gathered under tf.nn; for example, the functional form of BatchNormalization is tf.nn.batch_normalization.
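Roughly, the two styles look like this (a minimal sketch; the shapes and epsilon are made up for illustration):

import tensorflow as tf

# Layer style (stateful object, analogous to PyTorch's nn.Module):
bn_layer = tf.keras.layers.BatchNormalization()
x = tf.random.normal([8, 16])
y = bn_layer(x, training=True)

# Functional style under tf.nn (stateless; you manage the statistics yourself):
mean, variance = tf.nn.moments(x, axes=[0])
y2 = tf.nn.batch_normalization(x, mean, variance,
                               offset=None, scale=None,
                               variance_epsilon=1e-3)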
Using TPUs also seems to be getting more convenient. The articles around here are very easy to read, so please have a look.
~~BatchNormalization~~ You might not expect that Batch Normalization, of all things, could be a problem, but this layer unexpectedly has a lot of holes, and it is hard to find a proper implementation (I suspect there still isn't one...).
What's particularly problematic is that it cannot synchronize its statistics across multiple GPUs / TPUs. This has already been implemented in PyTorch (SyncBatchNorm) and MXNet, but not yet in TF 2.x. Back when we relied on horovod for multi-GPU / TPU training (the TF 1.x days) it was covered by horovod's support, but strangely, implementing it inside TF itself keeps getting postponed (https://github.com/tensorflow/community/blob/master/rfcs/20181016-replicator.md#global-batch-normalization).
This problem seems to be a headache across the whole TF 2.x family, and it is one reason many researchers flee to PyTorch. It's a tough problem.
The latest issue currently open is probably here, so please join in (I'm digging into it as well).
--2020/2/3: The layer itself has been added. However, it is doubtful how thoroughly it has been tested, so don't expect too much.
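If you want to try it anyway, my understanding (unverified; treat the exact location as an assumption) is that it is exposed in recent nightlies as tf.keras.layers.experimental.SyncBatchNormalization and is used like an ordinary BatchNormalization layer inside a distribution strategy scope:

import tensorflow as tf

# Assumed name/location in recent tf-nightly builds; synchronizes batch
# statistics across replicas instead of normalizing per device.
sync_bn = tf.keras.layers.experimental.SyncBatchNormalization()
x = tf.random.normal([8, 32, 32, 64])
y = sync_bn(x, training=True)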
For the time being, TF 2.x can now train on multiple GPUs / TPUs! That said, the reality is that, as you can see in the official docs, it is full of "Experimental" support. The relevant APIs are therefore quite likely to change, and backward compatibility is hard to count on.
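For reference, the basic multi-GPU pattern looks like this (a minimal sketch; the TPU lines are commented out because the TPU address is environment specific, and in 2.1 the TPU strategy still lives under tf.distribute.experimental):

import tensorflow as tf

# Multi-GPU: data-parallel training with MirroredStrategy.
strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    model = tf.keras.Sequential([tf.keras.layers.Dense(10)])
    model.compile(optimizer='adam', loss='mse')
# model.fit(...) then runs on every visible GPU.

# TPU: still experimental in 2.1.
# resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='...')
# tf.config.experimental_connect_to_cluster(resolver)
# tf.tpu.experimental.initialize_tpu_system(resolver)
# strategy = tf.distribute.experimental.TPUStrategy(resolver)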
Initialization from an initial batch of data (data-dependent initialization), which is admittedly rarely used in deep learning, also cannot be done in the usual way as far as I can see. The way to do it is to rewrite the layer weights directly from outside according to the first batch, which means designing an external initialization function plus a mechanism to call it during training. This makes it hard to use the usual tf.keras.Model and tf.keras.Sequential workflow (at least fit() cannot be used, and you have to write a custom training loop). (Example)
import tensorflow as tf


class IdentityWithInit(tf.keras.layers.Layer):
    def __init__(self):
        super().__init__()

    def build(self, input_shape: tf.TensorShape):
        # Non-trainable flag recording whether the data-dependent initialization has run.
        self.initialized = self.add_weight(
            name="initialized",
            dtype=tf.bool,
            trainable=False,
        )
        self.initialized.assign(False)
        self.built = True

    def initialize_parameter(self, x: tf.Tensor):
        # Rewrite the layer's weights here based on the first batch.
        tf.print("initialized {}".format(self.name))

    def call(self, x: tf.Tensor):
        if not self.initialized:
            self.initialize_parameter(x)
            self.initialized.assign(True)
        return x
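Used from a custom training loop, it would look roughly like this (a sketch; dataset and the elided loop body are placeholders):

layer = IdentityWithInit()

for step, x in enumerate(dataset):
    # The first call runs initialize_parameter(); later calls skip it.
    y = layer(x)
    ...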
In 2.0.0 the gradient computation apparently breaks in a particular corner case. (I assumed my own code was at fault, but it really does seem to be a bug.)
In 2.1.0 it has been fixed. (I also tested it locally and confirmed the fix.)
Backslash line continuations, which you only occasionally need when writing very long expressions in Python, cannot be used: TensorFlow apparently doesn't go through Python's parse tree (?), so it chokes on them. https://github.com/tensorflow/tensorflow/issues/35765
Bad example
variable * decay * \
lr
Good example
(variable * decay *
lr)
Support for Python 3.8 seems to have arrived around mid-February 2020. (Apparently you can try it by building from source.)
https://github.com/tensorflow/tensorflow/issues/33374
I only noticed this recently, so I don't know the whole picture, but looking at [some of the official implementations](https://github.com/tensorflow/models/blob/master/official/transformer/v2/transformer_main.py), the recommended pattern seems to be to create a class for training and inference that is separate from the model class.
Basically, the methods to implement are train, test (eval), and xxx_step. Each xxx_step is a function that processes a single batch, and the pattern of wrapping it in the tf.function decorator is the one you see most often.
import tensorflow as tf
from tqdm import tqdm


class MyTask:
    def __init__(self, args):
        ...
        self.loss = tf.metrics.Mean(name='loss', dtype=tf.float32)
        self.val_loss = tf.metrics.Mean(name='val_loss', dtype=tf.float32)

    def train(self):
        # Each *_step processes exactly one batch and is compiled with tf.function.
        @tf.function
        def train_step(x: tf.Tensor, y: tf.Tensor):
            ...
            _y = self.model(x)
            loss = loss_fn(_y, y)
            self.loss(loss)

        @tf.function
        def val_step(x: tf.Tensor, y: tf.Tensor):
            ...
            _y = self.model(x)
            self.val_loss(loss_fn(_y, y))

        for epoch in range(self.epochs):
            for x, y in tqdm(self.train_dataset):
                train_step(x, y)
            for x, y in self.val_dataset:
                val_step(x, y)
            print('EPOCH {} train: loss {} / val: loss {}'.format(
                epoch + 1, self.loss.result(), self.val_loss.result()))
            self.loss.reset_states()
            self.val_loss.reset_states()

    def test(self):
        @tf.function
        def test_step(x: tf.Tensor, y: tf.Tensor):
            ...
When writing a TensorFlow layer class I wondered "what is this argument for?", but the training argument (passed through call, here via **kwargs) is more important than it looks: it is what switches layers such as BatchNormalization and Dropout between training and inference behaviour.
For example
import tensorflow as tf
from tensorflow.keras.layers import Layer, Conv2D, BatchNormalization


class CustomLayer(Layer):
    def __init__(self):
        super().__init__()
        self.conv = Conv2D(...)
        self.bn = BatchNormalization(...)

    def build(self, input_shape):
        super().build(input_shape)

    def call(self, x: tf.Tensor, **kwargs):
        # **kwargs forwards the training flag to the sub-layers.
        y = self.conv(x, **kwargs)
        y = self.bn(y, **kwargs)
        return y

cl = CustomLayer()
BatchNormalization can then be run in inference mode by calling cl(x, training=False).
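The same flag drives Dropout, which makes the effect easy to see (a minimal sketch):

import tensorflow as tf

drop = tf.keras.layers.Dropout(0.5)
x = tf.ones([1, 10])
print(drop(x, training=True))   # about half the entries are zeroed, the rest scaled up
print(drop(x, training=False))  # identity: all ones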