I'm someone who got burned implementing a method with TF 2.x. It's been about a year since TF 2.x came out, so I'll summarize what TF 2.x can do, what it still can't do, and the workarounds that have recently come to light.
"You can actually do this", "isn't this approach wrong?": if you notice anything like that, please point it out and correct me. It will save everyone who struggles with this ~~shit~~ framework in the future.
With Eager Execution now available, tf.Session has been deprecated and TF operations can be executed directly by the Python interpreter. This makes debugging much easier, and simply running code easier as well.
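A minimal sketch of what that looks like in practice (nothing here is specific to this article, just plain TF 2.x):

import tensorflow as tf

# Ops run immediately under eager execution; no tf.Session / sess.run() needed.
x = tf.constant([[1.0, 2.0], [3.0, 4.0]])
y = tf.matmul(x, x)
print(y.numpy())               # the result is available right away
print(tf.executing_eagerly())  # True by default in TF 2.x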
Also, as part of the integration with Keras, implementations built on tf.keras.layers (the counterpart of nn.Module in PyTorch?) are now the promoted style. The existing functional implementations, on the other hand, have been gathered under tf.nn; for example, the functional form of BatchNormalization is tf.nn.batch_normalization.
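Roughly, the two styles look like this (a minimal sketch; the shapes and epsilon are made up for illustration):

import tensorflow as tf

# Layer style (stateful object, analogous to PyTorch's nn.Module):
bn_layer = tf.keras.layers.BatchNormalization()
x = tf.random.normal([8, 16])
y = bn_layer(x, training=True)

# Functional style under tf.nn (stateless; you manage the statistics yourself):
mean, variance = tf.nn.moments(x, axes=[0])
y2 = tf.nn.batch_normalization(x, mean, variance,
                               offset=None, scale=None,
                               variance_epsilon=1e-3)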
Using TPUs also seems to be getting more convenient. The articles around here are very easy to read, so please have a look.
~~BatchNormalization~~ You might not expect that Batch Normalization, of all things, could be a problem, but this layer unexpectedly has a lot of holes, and it is hard to find a proper implementation (I suspect there still isn't one...).
What's particularly problematic is that it cannot synchronize its statistics across multiple GPUs / TPUs. This has already been implemented in PyTorch (SyncBatchNorm) and MXNet, but not yet in TF 2.x. Back when we relied on horovod for multi-GPU / TPU training (the TF 1.x days) it was covered by horovod's support, but strangely, implementing it inside TF itself keeps getting postponed (https://github.com/tensorflow/community/blob/master/rfcs/20181016-replicator.md#global-batch-normalization).
This problem seems to be a headache across the whole TF 2.x family, and it is one reason many researchers flee to PyTorch. It's a tough problem.
The latest issue currently open is probably here, so please join in (I'm digging into it as well).
--2020/2/3: The layer itself has been added. However, it is doubtful how thoroughly it has been tested, so don't expect too much.
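If you want to try it anyway, my understanding (unverified; treat the exact location as an assumption) is that it is exposed in recent nightlies as tf.keras.layers.experimental.SyncBatchNormalization and is used like an ordinary BatchNormalization layer inside a distribution strategy scope:

import tensorflow as tf

# Assumed name/location in recent tf-nightly builds; synchronizes batch
# statistics across replicas instead of normalizing per device.
sync_bn = tf.keras.layers.experimental.SyncBatchNormalization()
x = tf.random.normal([8, 32, 32, 64])
y = sync_bn(x, training=True)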
For the time being, TF 2.x can now train on multiple GPUs / TPUs! That said, the reality is that, as you can see in the official docs, it is full of "Experimental" support. The relevant APIs are therefore quite likely to change, and backward compatibility is hard to count on.
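For reference, the basic multi-GPU pattern looks like this (a minimal sketch; the TPU lines are commented out because the TPU address is environment specific, and in 2.1 the TPU strategy still lives under tf.distribute.experimental):

import tensorflow as tf

# Multi-GPU: data-parallel training with MirroredStrategy.
strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    model = tf.keras.Sequential([tf.keras.layers.Dense(10)])
    model.compile(optimizer='adam', loss='mse')
# model.fit(...) then runs on every visible GPU.

# TPU: still experimental in 2.1.
# resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='...')
# tf.config.experimental_connect_to_cluster(resolver)
# tf.tpu.experimental.initialize_tpu_system(resolver)
# strategy = tf.distribute.experimental.TPUStrategy(resolver)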
Initialization from an initial batch of data (data-dependent initialization), which is admittedly rarely used in deep learning, also cannot be done in the usual way as far as I can see. The way to do it is to rewrite the layer weights directly from outside according to the first batch, which means designing an external initialization function plus a mechanism to call it during training. This makes it hard to use the usual tf.keras.Model and tf.keras.Sequential workflow (at least fit() cannot be used, and you have to write a custom training loop). (Example)
import tensorflow as tf


class IdentityWithInit(tf.keras.layers.Layer):
    def __init__(self):
        super().__init__()

    def build(self, input_shape: tf.TensorShape):
        # Non-trainable flag recording whether the data-dependent initialization has run.
        self.initialized = self.add_weight(
            name="initialized",
            dtype=tf.bool,
            trainable=False,
        )
        self.initialized.assign(False)
        self.built = True

    def initialize_parameter(self, x: tf.Tensor):
        # Rewrite the layer's weights here based on the first batch.
        tf.print("initialized {}".format(self.name))

    def call(self, x: tf.Tensor):
        if not self.initialized:
            self.initialize_parameter(x)
            self.initialized.assign(True)
        return x
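Used from a custom training loop, it would look roughly like this (a sketch; dataset and the elided loop body are placeholders):

layer = IdentityWithInit()

for step, x in enumerate(dataset):
    # The first call runs initialize_parameter(); later calls skip it.
    y = layer(x)
    ...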
In 2.0.0 the gradient computation apparently breaks in a particular corner case. (I assumed my own code was at fault, but it really does seem to be a bug.)
In 2.1.0 it has been fixed. (I also tested it locally and confirmed the fix.)
Backslash line continuations, which you only occasionally need when writing very long expressions in Python, cannot be used: TensorFlow apparently doesn't go through Python's parse tree (?), so it chokes on them. https://github.com/tensorflow/tensorflow/issues/35765
Bad example
variable * decay * \
lr
Good example
(variable * decay *
lr)
Support for Python 3.8 seems to have arrived around mid-February 2020. (Apparently you can try it by building from source.)
https://github.com/tensorflow/tensorflow/issues/33374
I only noticed this recently, so I don't know the whole picture, but looking at [some of the official implementations](https://github.com/tensorflow/models/blob/master/official/transformer/v2/transformer_main.py), the recommended pattern seems to be to create a class for training and inference that is separate from the model class.
Basically, the methods to implement are train, test (eval), and xxx_step. Each xxx_step is a function that processes a single batch, and the pattern of wrapping it in the tf.function decorator is the one you see most often.
import tensorflow as tf
from tqdm import tqdm


class MyTask:
    def __init__(self, args):
        ...
        self.loss = tf.metrics.Mean(name='loss', dtype=tf.float32)
        self.val_loss = tf.metrics.Mean(name='val_loss', dtype=tf.float32)

    def train(self):
        # Each *_step processes exactly one batch and is compiled with tf.function.
        @tf.function
        def train_step(x: tf.Tensor, y: tf.Tensor):
            ...
            _y = self.model(x)
            loss = loss_fn(_y, y)
            self.loss(loss)

        @tf.function
        def val_step(x: tf.Tensor, y: tf.Tensor):
            ...
            _y = self.model(x)
            self.val_loss(loss_fn(_y, y))

        for epoch in range(self.epochs):
            for x, y in tqdm(self.train_dataset):
                train_step(x, y)
            for x, y in self.val_dataset:
                val_step(x, y)
            print('EPOCH {} train: loss {} / val: loss {}'.format(
                epoch + 1, self.loss.result(), self.val_loss.result()))
            self.loss.reset_states()
            self.val_loss.reset_states()

    def test(self):
        @tf.function
        def test_step(x: tf.Tensor, y: tf.Tensor):
            ...
When writing a TensorFlow layer class I wondered "what is this argument for?", but the training argument (passed through call, here via **kwargs) is more important than it looks: it is what switches layers such as BatchNormalization and Dropout between training and inference behaviour.
For example
import tensorflow as tf
from tensorflow.keras.layers import Layer, Conv2D, BatchNormalization


class CustomLayer(Layer):
    def __init__(self):
        super().__init__()
        self.conv = Conv2D(...)
        self.bn = BatchNormalization(...)

    def build(self, input_shape):
        super().build(input_shape)

    def call(self, x: tf.Tensor, **kwargs):
        # **kwargs forwards the training flag to the sub-layers.
        y = self.conv(x, **kwargs)
        y = self.bn(y, **kwargs)
        return y

cl = CustomLayer()
BatchNormalization can then be run in inference mode by calling cl(x, training=False).
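The same flag drives Dropout, which makes the effect easy to see (a minimal sketch):

import tensorflow as tf

drop = tf.keras.layers.Dropout(0.5)
x = tf.ones([1, 10])
print(drop(x, training=True))   # about half the entries are zeroed, the rest scaled up
print(drop(x, training=False))  # identity: all ones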