TensorFlow, the deep learning library developed by Google, lets you build models and write training loops in many different ways. That flexibility is useful for experts, but it can get in the way of beginners' understanding.
This article gives a comprehensive introduction to the coding styles recommended for TensorFlow 2.x. It explains how to use them by implementing VGG16 and ResNet50, two well-known models in image recognition.
This article is for:
- People who have tried the TensorFlow tutorials but cannot build a model on their own
- People who read TensorFlow source code and run into unfamiliar ways of writing it
- People who can write models in Chainer or PyTorch but not in TensorFlow
First, let's take a look at TensorFlow's four model building APIs. After that, I will explain two training methods. Finally, we will implement VGG16 and ResNet50 using these writing styles.
>>> import sys
>>> sys.version
'3.7.7 (default, Mar 10 2020, 15:43:33) \n[Clang 11.0.0 (clang-1100.0.33.17)]'
pip list | grep tensorflow
tensorflow 2.2.0
tensorflow-estimator 2.2.0
TensorFlow provides two broad families of model-building APIs, which divide into four concrete APIs:
- Symbolic (declarative) API
  - Sequential API
  - Functional API
  - Primitive API (the TensorFlow 1.x style; not recommended)
- Imperative (model subclassing) API
  - Subclassing API
First, a brief look at the two broad families.
The symbolic API declares (roughly, compiles) the structure of the model before training runs.
A model written with this API cannot change its structure during training, so some dynamically changing models (such as Tree-RNNs) cannot be implemented. In exchange, you can inspect the model's shapes before feeding it any data.
The imperative API, unlike the symbolic API, has no up-front declaration: the model is written imperatively (roughly, intuitively).
This style was first adopted by Chainer, a deep learning library from Japan (Preferred Networks), and PyTorch adopted it as well. You implement the model as you would an ordinary Python class, so it is easy to customize, for example by swapping or extending layers. In exchange, the program does not know what the model looks like until data has been passed through it once.
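The following minimal sketch makes the contrast concrete. It assumes TensorFlow 2.x with tf.keras, and the tiny network and the TinyNet class are made up for illustration. The Functional model knows its output shape as soon as it is declared. The subclassed model is only built when data first passes through call().

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

# Symbolic (Functional) style: the graph is declared before any data exists.
inputs = layers.Input(shape=(32, 32, 3))
x = layers.Conv2D(8, 3, padding="same", activation="relu")(inputs)
x = layers.GlobalAveragePooling2D()(x)
outputs = layers.Dense(10, activation="softmax")(x)
functional_model = Model(inputs=inputs, outputs=outputs)
print(functional_model.outputs[0].shape)  # output shape is already known


# Imperative (Subclassing) style: layers are wired together only when call() runs.
class TinyNet(Model):  # hypothetical example model
    def __init__(self):
        super().__init__()
        self.conv = layers.Conv2D(8, 3, padding="same", activation="relu")
        self.pool = layers.GlobalAveragePooling2D()
        self.dense = layers.Dense(10, activation="softmax")

    def call(self, inputs):
        return self.dense(self.pool(self.conv(inputs)))


subclassed_model = TinyNet()
built_before_call = subclassed_model.built  # False: no shapes or weights yet
y = subclassed_model(tf.zeros((2, 32, 32, 3)))  # the first call builds every layer
print(built_before_call, subclassed_model.built, y.shape)
```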
Next, I will introduce a concrete writing method with a simple example.
Sequential API
As the name implies, this API builds a model by adding layers to a Sequential object. It appears often in the Keras and TensorFlow tutorials, so you may have seen it before.
As shown below, there are two common approaches. You can instantiate an empty tensorflow.keras.Sequential and add layers with its add method. Alternatively, you can pass the layers as a list to the tensorflow.keras.Sequential constructor.
import tensorflow as tf
from tensorflow.keras import layers
def sequential_vgg16_a(input_shape, output_size):
    model = tf.keras.Sequential()
    model.add(layers.Conv2D(64, 3, 1, padding="same", batch_input_shape=input_shape))
    model.add(layers.BatchNormalization())
    # ...(Omission)...
    model.add(layers.Dense(output_size, activation="softmax"))
    return model
def sequential_vgg16_b(input_shape, output_size):
    model = tf.keras.Sequential([
        layers.Conv2D(64, 3, 1, padding="same", batch_input_shape=input_shape),
        layers.BatchNormalization(),
        # ...(Omission)...
        layers.Dense(output_size, activation="softmax"),
    ])
    return model
Because this API can only stack layers one after another, it cannot express networks with multiple inputs, intermediate feature outputs, multiple outputs, or conditional branches. It is suited to simple networks (like VGG) in which data passes through the layers in order.
Functional API
This API implements complex models that the Sequential API cannot describe.
First, instantiate tensorflow.keras.layers.Input and pass it to the first layer.
Then define the model's data flow by passing the output of each layer to the next layer.
Finally, build the model by passing the first input and the final output as arguments to tensorflow.keras.Model.
from tensorflow.keras import layers, Model
def functional_vgg16(input_shape, output_size, batch_norm=False):
    inputs = layers.Input(batch_input_shape=input_shape)
    x = layers.Conv2D(64, 3, 1, padding="same")(inputs)
    if batch_norm:
        x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    # ...(Omission)...
    outputs = layers.Dense(output_size, activation="softmax")(x)
    return Model(inputs=inputs, outputs=outputs)
In the example above, the value of the batch_norm argument switches the Batch Normalization layer on or off.
Because the model is assembled by ordinary Python code, conditions like this are easy to express. A simple on/off switch could also be written with Sequential's add method. However, once the data flow itself branches or merges, you need the Functional API rather than the Sequential API.
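The following minimal sketch shows such a build-time switch. It assumes TensorFlow 2.x. tiny_functional and has_batch_norm are hypothetical helpers, and the small network stands in for the VGG16 above. The if statement is plain Python that runs once while the graph is assembled, so each call produces a model with or without the BatchNormalization layer.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model


def tiny_functional(input_shape=(32, 32, 3), output_size=10, batch_norm=False):
    inputs = layers.Input(shape=input_shape)
    x = layers.Conv2D(8, 3, 1, padding="same")(inputs)
    if batch_norm:  # ordinary Python branch, evaluated once while the model is built
        x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    x = layers.GlobalAveragePooling2D()(x)
    outputs = layers.Dense(output_size, activation="softmax")(x)
    return Model(inputs=inputs, outputs=outputs)


def has_batch_norm(model):
    # True if any layer in the built model is a BatchNormalization layer
    return any(isinstance(layer, layers.BatchNormalization) for layer in model.layers)


with_bn = tiny_functional(batch_norm=True)
without_bn = tiny_functional(batch_norm=False)
print(has_batch_norm(with_bn), has_batch_norm(without_bn))  # True False
```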
The pattern of parentheses followed by more parentheses may look odd, but it is not specific to TensorFlow. It is ordinary Python: the first pair constructs the layer object and the second pair calls that object on x. The following two snippets do the same thing:
# Style 1: create the layer and call it in one expression
x = layers.BatchNormalization()(x)
# Style 2: create the layer first, then call it
layer = layers.BatchNormalization()
x = layer(x)
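The same pattern works for any Python object that defines __call__. Below is a minimal pure-Python sketch. Scale is a made-up class standing in for a Keras layer, not a TensorFlow API.

```python
class Scale:
    """Toy stand-in for a layer: configured in __init__, applied in __call__."""

    def __init__(self, factor):
        self.factor = factor  # configuration happens at construction time

    def __call__(self, values):
        # calling the instance runs this method, just like layer(x)
        return [v * self.factor for v in values]


x = [1, 2, 3]
y1 = Scale(10)(x)  # style 1: construct, then immediately call
layer = Scale(10)  # style 2: construct and keep a reference...
y2 = layer(x)      # ...then call it
print(y1, y2)  # [10, 20, 30] [10, 20, 30]
```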
Primitive API
This API was mainly used in TensorFlow 1.x and is deprecated in 2.x.
The Sequential and Functional APIs define a model by describing how data flows through it. The Primitive API instead declaratively describes the entire computation, including processing outside the model.
There is little benefit in learning this style now, so I will skip the details. If you see code that trains through tensorflow.Session, it is written in this style.
import tensorflow as tf
# TensorFlow 1.x style; in 2.x this only survives as tf.compat.v1.Session()
sess = tf.Session()
Subclassing API
This API came into widespread use with TensorFlow 2.x. It is written in much the same way as Chainer and PyTorch. Because you implement the model as an ordinary Python class, it is intuitive and easy to customize.
First, create a class that inherits from tensorflow.keras.Model.
Then build the model by implementing the __init__ and call methods.
The __init__ method calls the parent class's __init__ and registers, as attributes, the layers you want to train. Layers not registered here are not tracked, so their weights are not trained by default.
The call method describes the forward pass through the layers (similar to Chainer's __call__ and PyTorch's forward).
from tensorflow.keras import layers, Model
class VGG16(Model):
    def __init__(self, output_size=1000):
        super().__init__()
        self.layers_ = [
            layers.Conv2D(64, 3, 1, padding="same"),
            layers.BatchNormalization(),
            # ...(Omission)...
            layers.Dense(output_size, activation="softmax"),
        ]
    def call(self, inputs):
        for layer in self.layers_:
            inputs = layer(inputs)
        return inputs
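Because the VGG16 above elides most of its layers, here is a small complete sketch of the same pattern. It assumes TensorFlow 2.x, and TinyVGG and its layer sizes are hypothetical. The sketch shows that layers stored in a list attribute in __init__ are tracked, so their weights appear in trainable_variables once the first call builds them.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model


class TinyVGG(Model):  # hypothetical, much smaller than VGG16
    def __init__(self, output_size=10):
        super().__init__()
        # Layers stored in an attribute (a list here) are tracked by Keras,
        # so their weights are included in trainable_variables.
        self.layers_ = [
            layers.Conv2D(16, 3, 1, padding="same"),
            layers.BatchNormalization(),
            layers.ReLU(),
            layers.MaxPool2D(2, padding="same"),
            layers.Flatten(),
            layers.Dense(output_size, activation="softmax"),
        ]

    def call(self, inputs):
        for layer in self.layers_:
            inputs = layer(inputs)
        return inputs


model = TinyVGG()
y = model(tf.zeros((4, 32, 32, 3)))  # the first call creates the weights
print(y.shape)                             # (4, 10)
print(len(model.trainable_variables))      # conv kernel/bias, BN gamma/beta, dense kernel/bias
print(len(model.non_trainable_variables))  # BN moving mean/variance
```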
It looks a bit verbose compared to other writing styles, but you can see that you can implement the model as you would normally write a class.
Note that super, which initializes the parent class, is sometimes called with arguments. That form is required in Python 2. In Python 3, calling it without arguments does the same thing.
from tensorflow.keras import Model
# Python 3 style
class VGG16_PY3(Model):
    def __init__(self, output_size=1000):
        super().__init__()
# Python 2 style (still valid in Python 3)
class VGG16_PY2(Model):
    def __init__(self, output_size=1000):
        super(VGG16_PY2, self).__init__()
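The following pure-Python sketch, with made-up class names, shows that the two forms of super call the same parent __init__:

```python
class Base:
    def __init__(self):
        self.initialized_by = "Base.__init__"


class Py3Style(Base):
    def __init__(self):
        super().__init__()  # Python 3: class and instance are filled in automatically


class Py2Style(Base):
    def __init__(self):
        super(Py2Style, self).__init__()  # explicit form, required in Python 2


print(Py3Style().initialized_by, Py2Style().initialized_by)
```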
This concludes the explanation of how to build models. In summary, a reasonable way to choose is:
- To write a model that simply passes data through layers in order: Sequential API
- To write a complex model whose shapes can be checked before training: Functional API
- To write in Chainer or PyTorch style, or to write a dynamic model: Subclassing API
There are two ways to train:
- Built-in training
- Custom training
Built-in training uses the methods built into tensorflow.keras.Model.
Many of you may already know it from the Keras and TensorFlow tutorials. scikit-learn, a different library, uses a similar fit-style interface.
First, instantiate a model implemented with one of the APIs above (tensorflow.keras.Model or an object that inherits from it).
This instance has compile and fit as built-in methods.
Call compile to register the loss function, the optimizer, and the evaluation metrics.
Then train by calling fit.
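import tensorflow as tf
(train_images, train_labels), _ = tf.keras.datasets.cifar10.load_data()
train_images = train_images.astype("float32") / 255.0  # scale pixels to [0, 1]
# An off-the-shelf architecture is used for illustration, trained from scratch.
# With the classifier head, the ImageNet weights require 224x224 inputs,
# so weights=None is used and the input shape and classes match CIFAR-10.
model = tf.keras.applications.VGG16(weights=None, input_shape=(32, 32, 3), classes=10)
model.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
model.fit(train_images, train_labels)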
You can now run training.
The batch size, the number of epochs, callbacks, evaluation on validation data, and so on can be passed as keyword arguments to fit, so some customization is possible.
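The following minimal sketch shows those keyword arguments. It assumes TensorFlow 2.x, and the random images and labels are synthetic placeholders while the small model is hypothetical. fit returns a History object whose history dictionary holds one value per epoch for each loss and metric. Entries for the validation data carry a val_ prefix.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Synthetic placeholder data: 64 tiny RGB images with integer labels 0-9
rng = np.random.default_rng(0)
x_train = rng.random((64, 8, 8, 3), dtype=np.float32)
y_train = rng.integers(0, 10, size=(64, 1))

model = tf.keras.Sequential([
    layers.Input(shape=(8, 8, 3)),
    layers.Conv2D(8, 3, padding="same", activation="relu"),
    layers.GlobalAveragePooling2D(),
    layers.Dense(10, activation="softmax"),
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
history = model.fit(
    x_train,
    y_train,
    batch_size=16,          # samples per gradient step
    epochs=2,               # passes over the training data
    validation_split=0.25,  # hold out the last 25% of samples for validation
    verbose=0,
)
print(sorted(history.history))
```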
In many cases this is enough. Cases that do not fit this framework need the custom training described next. GANs are one example, because several models are trained at the same time.
Custom training has no special API. It is simply training in ordinary Python for loops.
First, instantiate a model implemented with one of the APIs above (tensorflow.keras.Model or an object that inherits from it).
Next, define the loss function and the optimizer, and batch the dataset. Then iterate over epochs and batches in for loops.
Inside the loop, first write the forward pass within a tf.GradientTape scope.
Then call the tape's gradient method to compute the gradients, and the optimizer's apply_gradients method to update the weights.
import tensorflow as tf
batch_size = 32
epochs = 10
(train_images, train_labels), _ = tf.keras.datasets.cifar10.load_data()
train_images = train_images.astype("float32") / 255.0  # scale pixels to [0, 1]
# An off-the-shelf architecture is used for illustration, trained from scratch
# with the input shape and number of classes set to match CIFAR-10
model = tf.keras.applications.VGG16(weights=None, input_shape=(32, 32, 3), classes=10)
buffer_size = len(train_images)
train_ds = tf.data.Dataset.from_tensor_slices((train_images, train_labels))
train_ds = train_ds.shuffle(buffer_size=buffer_size).batch(batch_size)
criterion = tf.keras.losses.SparseCategoricalCrossentropy()
optimizer = tf.keras.optimizers.Adam()
for epoch in range(epochs):
    for x, y_true in train_ds:
        # forward pass, recorded by the tape
        with tf.GradientTape() as tape:
            y_pred = model(x, training=True)
            loss = criterion(y_true=y_true, y_pred=y_pred)
        # backward pass and weight update
        gradients = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(gradients, model.trainable_variables))
You can now run training.
The example above does not evaluate validation data or write anything to TensorBoard. Because it is an ordinary for loop, though, you can add whatever processing you like.
On the other hand, the code inevitably gets longer, which makes its quality harder to guarantee. Training loops in Chainer and PyTorch are written in almost the same way, with minor differences.
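The following minimal, self-contained sketch runs the same loop. It assumes TensorFlow 2.x. The data is a hypothetical noiseless line y = 3x + 2 and the model is a single Dense unit, so it runs in seconds. It also shows one easy addition that the loop above leaves out: a per-epoch average loss computed with tf.keras.metrics.Mean.

```python
import tensorflow as tf

tf.random.set_seed(0)
# Hypothetical synthetic data: points on the line y = 3x + 2
x_all = tf.random.uniform((256, 1))
y_all = 3.0 * x_all + 2.0

model = tf.keras.Sequential([tf.keras.Input(shape=(1,)), tf.keras.layers.Dense(1)])
criterion = tf.keras.losses.MeanSquaredError()
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)
train_ds = tf.data.Dataset.from_tensor_slices((x_all, y_all)).shuffle(256).batch(32)

epoch_losses = []
for epoch in range(20):
    loss_avg = tf.keras.metrics.Mean()  # running mean of the batch losses
    for x, y_true in train_ds:
        with tf.GradientTape() as tape:
            y_pred = model(x, training=True)
            loss = criterion(y_true, y_pred)
        gradients = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(gradients, model.trainable_variables))
        loss_avg.update_state(loss)
    epoch_losses.append(float(loss_avg.result()))

print(epoch_losses[0], epoch_losses[-1])
```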
This concludes the explanation of training methods. In summary:
- If you need no special processing during training: built-in training
- If you don't want to be confined to the built-in framework, want to add various processing during training by trial and error, or simply prefer to write the loop yourself: custom training
I think there are some parts that cannot be understood from the explanation alone, so I will deepen my understanding through implementation.
Let's start with a brief introduction to the two models.
VGG16 is a high-performance model with a very simple structure: thirteen 3x3 convolution layers followed by three fully connected layers. It is used to extract image features in many image recognition tasks. The original paper has over 37,000 citations and is very well known.
It can be implemented with the Sequential, Functional, and Subclassing APIs.
The original paper is here. https://arxiv.org/abs/1409.1556
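The layer counts and the well-known parameter total can be checked with a little arithmetic. This pure-Python sketch assumes the paper's configuration D (no Batch Normalization), 224x224 RGB inputs, and 1,000 classes. The implementations later in this article add Batch Normalization, so their parameter counts differ slightly.

```python
# VGG16 (configuration D): numbers are conv output channels, "M" is 2x2 max pooling
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]


def vgg16_summary(cfg, image_size=224, in_channels=3, num_classes=1000):
    params, conv_layers = 0, 0
    size, ch = image_size, in_channels
    for v in cfg:
        if v == "M":
            size //= 2  # each pooling halves the spatial size
        else:
            params += 3 * 3 * ch * v + v  # 3x3 kernel weights + biases
            ch = v
            conv_layers += 1
    # flattened features -> 4096 -> 4096 -> classes
    fc_dims = [size * size * ch, 4096, 4096, num_classes]
    for fan_in, fan_out in zip(fc_dims, fc_dims[1:]):
        params += fan_in * fan_out + fan_out
    return conv_layers, len(fc_dims) - 1, size, params


print(vgg16_summary(VGG16_CFG))  # (13, 3, 7, 138357544)
```

Most of the roughly 138 million parameters sit in the first fully connected layer (7 x 7 x 512 inputs to 4,096 units).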
ResNet50 is a deep model built from residual blocks (49 convolution layers and 1 fully connected layer). As of 2020, ResNet variants remain among the most accurate image classification models, and ResNet50 itself performs well. Like VGG16, it is used to extract image features in many image recognition tasks. The original paper has more than 45,000 citations (about 10 times as many as BERT), so it is also very well known.
It cannot be implemented by the Sequential API alone. It can be implemented with Functional API and Subclassing API.
The original paper is here. https://arxiv.org/abs/1512.03385
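The "50" can be reproduced by counting, as in the pure-Python sketch below. The projection count reflects the implementation in this article, where the first block of each stage changes the channel count.

```python
BLOCKS_PER_STAGE = [3, 4, 6, 3]  # conv2_x ... conv5_x in the paper
CONVS_PER_BLOCK = 3              # 1x1 -> 3x3 -> 1x1 bottleneck

stem_convs = 1                   # initial 7x7, stride-2 convolution
block_convs = CONVS_PER_BLOCK * sum(BLOCKS_PER_STAGE)
dense_layers = 1                 # final classifier
# 1x1 shortcut projections (one per stage here); by convention they are not counted in "50"
projection_convs = len(BLOCKS_PER_STAGE)

conv_layers = stem_convs + block_convs
print(conv_layers, conv_layers + dense_layers, projection_convs)  # 49 50 4
```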
Let's implement it in each writing style.
VGG16 Sequential API
There is nothing special to consider here, so we simply stack the layers.
from tensorflow.keras import layers, Sequential
def sequential_vgg16(input_shape, output_size):
    params = {
        "padding": "same",
        "use_bias": True,
        "kernel_initializer": "he_normal",
    }
    model = Sequential()
    model.add(layers.Conv2D(64, 3, 1, **params, batch_input_shape=input_shape))
    model.add(layers.BatchNormalization())
    model.add(layers.ReLU())
    model.add(layers.Conv2D(64, 3, 1, **params))
    model.add(layers.BatchNormalization())
    model.add(layers.ReLU())
    model.add(layers.MaxPool2D(2, padding="same"))
    model.add(layers.Conv2D(128, 3, 1, **params))
    model.add(layers.BatchNormalization())
    model.add(layers.ReLU())
    model.add(layers.Conv2D(128, 3, 1, **params))
    model.add(layers.BatchNormalization())
    model.add(layers.ReLU())
    model.add(layers.MaxPool2D(2, padding="same"))
    model.add(layers.Conv2D(256, 3, 1, **params))
    model.add(layers.BatchNormalization())
    model.add(layers.ReLU())
    model.add(layers.Conv2D(256, 3, 1, **params))
    model.add(layers.BatchNormalization())
    model.add(layers.ReLU())
    model.add(layers.Conv2D(256, 3, 1, **params))
    model.add(layers.BatchNormalization())
    model.add(layers.ReLU())
    model.add(layers.MaxPool2D(2, padding="same"))
    model.add(layers.Conv2D(512, 3, 1, **params))
    model.add(layers.BatchNormalization())
    model.add(layers.ReLU())
    model.add(layers.Conv2D(512, 3, 1, **params))
    model.add(layers.BatchNormalization())
    model.add(layers.ReLU())
    model.add(layers.Conv2D(512, 3, 1, **params))
    model.add(layers.BatchNormalization())
    model.add(layers.ReLU())
    model.add(layers.MaxPool2D(2, padding="same"))
    model.add(layers.Conv2D(512, 3, 1, **params))
    model.add(layers.BatchNormalization())
    model.add(layers.ReLU())
    model.add(layers.Conv2D(512, 3, 1, **params))
    model.add(layers.BatchNormalization())
    model.add(layers.ReLU())
    model.add(layers.Conv2D(512, 3, 1, **params))
    model.add(layers.BatchNormalization())
    model.add(layers.ReLU())
    model.add(layers.MaxPool2D(2, padding="same"))
    model.add(layers.Flatten())
    # VGG16 applies ReLU after each of the two 4096-unit layers
    model.add(layers.Dense(4096, activation="relu"))
    model.add(layers.Dense(4096, activation="relu"))
    model.add(layers.Dense(output_size, activation="softmax"))
    return model
It is simple to write, but with so many layers the code is hard to read.
For example, you might not notice if a ReLU were missing somewhere.
Also, to remove Batch Normalization, you would have to comment it out line by line, so the code is hard to reuse and customize.
VGG16 Functional API
This is more flexible than the Sequential API. This time, let's turn the reused group of layers (Convolution - Batch Normalization - ReLU) into a function.
from tensorflow.keras import layers, Model
def functional_cbr(x, filters, kernel_size, strides):
    params = {
        "filters": filters,
        "kernel_size": kernel_size,
        "strides": strides,
        "padding": "same",
        "use_bias": True,
        "kernel_initializer": "he_normal",
    }
    x = layers.Conv2D(**params)(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    return x
def functional_vgg16(input_shape, output_size):
    inputs = layers.Input(batch_input_shape=input_shape)
    x = functional_cbr(inputs, 64, 3, 1)
    x = functional_cbr(x, 64, 3, 1)
    x = layers.MaxPool2D(2, padding="same")(x)
    x = functional_cbr(x, 128, 3, 1)
    x = functional_cbr(x, 128, 3, 1)
    x = layers.MaxPool2D(2, padding="same").__call__(x)  # same as (x): calling a layer invokes __call__
    x = functional_cbr(x, 256, 3, 1)
    x = functional_cbr(x, 256, 3, 1)
    x = functional_cbr(x, 256, 3, 1)
    x = layers.MaxPool2D(2, padding="same")(x)  # avoid layer.call(x): it skips Keras's __call__ bookkeeping
    x = functional_cbr(x, 512, 3, 1)
    x = functional_cbr(x, 512, 3, 1)
    x = functional_cbr(x, 512, 3, 1)
    x = layers.MaxPool2D(2, padding="same")(x)
    x = functional_cbr(x, 512, 3, 1)
    x = functional_cbr(x, 512, 3, 1)
    x = functional_cbr(x, 512, 3, 1)
    x = layers.MaxPool2D(2, padding="same")(x)
    x = layers.Flatten()(x)
    x = layers.Dense(4096, activation="relu")(x)
    x = layers.Dense(4096, activation="relu")(x)
    outputs = layers.Dense(output_size, activation="softmax")(x)
    return Model(inputs=inputs, outputs=outputs)
The code is now much clearer.
To remove BatchNormalization or change ReLU to LeakyReLU, you only need to edit a few lines.
VGG16 Subclassing API
As with the Functional API, let's factor out the reused group of layers (Convolution - BatchNormalization - ReLU), this time as a class.
from tensorflow.keras import layers, Model
class CBR(Model):
    def __init__(self, filters, kernel_size, strides):
        super().__init__()
        params = {
            "filters": filters,
            "kernel_size": kernel_size,
            "strides": strides,
            "padding": "same",
            "use_bias": True,
            "kernel_initializer": "he_normal",
        }
        self.layers_ = [
            layers.Conv2D(**params),
            layers.BatchNormalization(),
            layers.ReLU(),
        ]
    def call(self, inputs):
        for layer in self.layers_:
            inputs = layer(inputs)
        return inputs
class VGG16(Model):
    def __init__(self, output_size=1000):
        super().__init__()
        self.layers_ = [
            CBR(64, 3, 1),
            CBR(64, 3, 1),
            layers.MaxPool2D(2, padding="same"),
            CBR(128, 3, 1),
            CBR(128, 3, 1),
            layers.MaxPool2D(2, padding="same"),
            CBR(256, 3, 1),
            CBR(256, 3, 1),
            CBR(256, 3, 1),
            layers.MaxPool2D(2, padding="same"),
            CBR(512, 3, 1),
            CBR(512, 3, 1),
            CBR(512, 3, 1),
            layers.MaxPool2D(2, padding="same"),
            CBR(512, 3, 1),
            CBR(512, 3, 1),
            CBR(512, 3, 1),
            layers.MaxPool2D(2, padding="same"),
            layers.Flatten(),
            layers.Dense(4096, activation="relu"),
            layers.Dense(4096, activation="relu"),
            layers.Dense(output_size, activation="softmax"),
        ]
    def call(self, inputs):
        for layer in self.layers_:
            inputs = layer(inputs)
        return inputs
This is easier to understand intuitively than the Functional API, because __init__ defines the model and call runs it, but the code is longer.
Another difference is that the imperative Subclassing API needs no input shape when the model is created (input_shape is not required as an argument).
I tried to keep the three versions as easy to compare as possible. How do they look to you?
This implementation places Batch Normalization between the convolution layers and uses He initialization for the weights, but neither technique had been published when the original paper was submitted. The original model therefore had no Batch Normalization layers, and its weights were drawn from a zero-mean normal distribution with variance 10^-2. To avoid unstable gradients in the deeper networks, the paper used a transfer-learning-like procedure. It first trained the shallower 11-layer configuration, then used those weights to initialize the first four convolutional layers and the last three fully connected layers of the deeper networks. The authors noted after submission that Glorot initialization makes this pre-training unnecessary.
To understand the implementation above better, it would be interesting to see what happens if you remove the Batch Normalization layers or change the weight initialization method.
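As a starting point for such experiments, you can compute directly the standard deviations that the two initialization schemes target. Below is a pure-Python sketch for a single 3x3 convolution from 64 to 64 channels. Keras's he_normal and glorot_normal sample from a truncated normal, so the exact sampling differs slightly from a plain normal distribution.

```python
import math


def he_normal_std(fan_in):
    # He et al. (2015): variance 2 / fan_in
    return math.sqrt(2.0 / fan_in)


def glorot_normal_std(fan_in, fan_out):
    # Glorot & Bengio (2010): variance 2 / (fan_in + fan_out)
    return math.sqrt(2.0 / (fan_in + fan_out))


# 3x3 convolution from 64 to 64 channels: fan_in = fan_out = 3 * 3 * 64
fan_in = fan_out = 3 * 3 * 64
he_std = he_normal_std(fan_in)
glorot_std = glorot_normal_std(fan_in, fan_out)
paper_std = math.sqrt(1e-2)  # original VGG: zero-mean normal with variance 1e-2

print(f"He: {he_std:.4f}, Glorot: {glorot_std:.4f}, VGG paper: {paper_std:.4f}")
# He: 0.0589, Glorot: 0.0417, VGG paper: 0.1000
```

The fixed 0.1 used in the paper does not shrink as layers get wider, unlike the other two schemes. This is one reason deep plain networks are hard to train from that initialization.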
Next, let's implement ResNet50. It cannot be written with the Sequential API alone, so we use the Functional and Subclassing APIs.
ResNet50 Functional API
Implement the reused residual block as a function.
from tensorflow.keras import layers, Model
def functional_bottleneck_residual(x, in_ch, out_ch, strides=1):
    params = {
        "padding": "same",
        "kernel_initializer": "he_normal",
        "use_bias": True,
    }
    inter_ch = out_ch // 4
    # residual branch: 1x1 (reduce) -> 3x3 -> 1x1 (restore)
    h1 = layers.Conv2D(inter_ch, kernel_size=1, strides=strides, **params)(x)
    h1 = layers.BatchNormalization()(h1)
    h1 = layers.ReLU()(h1)
    h1 = layers.Conv2D(inter_ch, kernel_size=3, strides=1, **params)(h1)
    h1 = layers.BatchNormalization()(h1)
    h1 = layers.ReLU()(h1)
    h1 = layers.Conv2D(out_ch, kernel_size=1, strides=1, **params)(h1)
    h1 = layers.BatchNormalization()(h1)
    # shortcut branch: project only when the channel count changes
    if in_ch != out_ch:
        h2 = layers.Conv2D(out_ch, kernel_size=1, strides=strides, **params)(x)
        h2 = layers.BatchNormalization()(h2)
    else:
        h2 = x
    h = layers.Add()([h1, h2])
    h = layers.ReLU()(h)
    return h
def functional_resnet50(input_shape, output_size):
    inputs = layers.Input(batch_input_shape=input_shape)
    x = layers.Conv2D(64, 7, 2, padding="same", kernel_initializer="he_normal")(inputs)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)  # the original ResNet applies ReLU after the first BN
    x = layers.MaxPool2D(pool_size=3, strides=2, padding="same")(x)
    x = functional_bottleneck_residual(x, 64, 256)
    x = functional_bottleneck_residual(x, 256, 256)
    x = functional_bottleneck_residual(x, 256, 256)
    x = functional_bottleneck_residual(x, 256, 512, 2)
    x = functional_bottleneck_residual(x, 512, 512)
    x = functional_bottleneck_residual(x, 512, 512)
    x = functional_bottleneck_residual(x, 512, 512)
    x = functional_bottleneck_residual(x, 512, 1024, 2)
    x = functional_bottleneck_residual(x, 1024, 1024)
    x = functional_bottleneck_residual(x, 1024, 1024)
    x = functional_bottleneck_residual(x, 1024, 1024)
    x = functional_bottleneck_residual(x, 1024, 1024)
    x = functional_bottleneck_residual(x, 1024, 1024)
    x = functional_bottleneck_residual(x, 1024, 2048, 2)
    x = functional_bottleneck_residual(x, 2048, 2048)
    x = functional_bottleneck_residual(x, 2048, 2048)
    x = layers.GlobalAveragePooling2D()(x)
    outputs = layers.Dense(
        output_size, activation="softmax", kernel_initializer="he_normal"
    )(x)
    return Model(inputs=inputs, outputs=outputs)
The functional_bottleneck_residual function works with three tensors: h1, h2, and h.
A model like this, whose data flow branches partway through, cannot be described with the Sequential API.
In addition, h2 passes the input through unchanged when the input and output channel counts are equal. When they differ, it applies a 1x1 convolution (a projection) to match the channel count. This conditional shortcut branch cannot be described with the Sequential API either.
Once this function exists, all you have to do is call it in sequence.
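Following the strides through the stages shows how the spatial size shrinks. The pure-Python sketch below assumes a 224x224 ImageNet-sized input. The CIFAR-sized images used in the article's own training code would shrink to 1x1 instead.

```python
import math

# (output channels, stride of the first block, number of blocks) per stage,
# matching the calls in functional_resnet50 above
STAGES = [(256, 1, 3), (512, 2, 4), (1024, 2, 6), (2048, 2, 3)]


def same_padding_out(size, stride):
    # with padding="same", the output size is ceil(size / stride)
    return math.ceil(size / stride)


size = 224
size = same_padding_out(size, 2)  # 7x7 conv, stride 2
size = same_padding_out(size, 2)  # 3x3 max pool, stride 2
stage_shapes = []
total_blocks = 0
for out_ch, stride, num_blocks in STAGES:
    size = same_padding_out(size, stride)  # only the first block of a stage downsamples
    stage_shapes.append((size, size, out_ch))
    total_blocks += num_blocks

print(stage_shapes)  # [(56, 56, 256), (28, 28, 512), (14, 14, 1024), (7, 7, 2048)]
print(total_blocks)  # 16
```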
ResNet50 Subclassing API
As with the Functional API, implement the reused residual block, this time as a class.
from tensorflow.keras import layers, Model
class BottleneckResidual(Model):
    """ResNet's bottleneck residual block.
    The first 1x1 conv reduces the channel dimension,
    which cuts the computation of the 3x3 conv in the second layer,
    and the third 1x1 conv restores the channel dimension of the output.
    It is called a bottleneck because it narrows the dimension around the costly 3x3 conv.
    """
    def __init__(self, in_ch, out_ch, strides=1):
        super().__init__()
        self.projection = in_ch != out_ch
        inter_ch = out_ch // 4
        params = {
            "padding": "same",
            "kernel_initializer": "he_normal",
            "use_bias": True,
        }
        self.common_layers = [
            layers.Conv2D(inter_ch, kernel_size=1, strides=strides, **params),
            layers.BatchNormalization(),
            layers.ReLU(),
            layers.Conv2D(inter_ch, kernel_size=3, strides=1, **params),
            layers.BatchNormalization(),
            layers.ReLU(),
            layers.Conv2D(out_ch, kernel_size=1, strides=1, **params),
            layers.BatchNormalization(),
        ]
        if self.projection:
            # 1x1 conv on the shortcut to match the channel count (and stride)
            self.projection_layers = [
                layers.Conv2D(out_ch, kernel_size=1, strides=strides, **params),
                layers.BatchNormalization(),
            ]
        self.concat_layers = [layers.Add(), layers.ReLU()]
    def call(self, inputs):
        h1 = inputs  # residual branch
        h2 = inputs  # shortcut branch
        for layer in self.common_layers:
            h1 = layer(h1)
        if self.projection:
            for layer in self.projection_layers:
                h2 = layer(h2)
        outputs = [h1, h2]
        for layer in self.concat_layers:
            outputs = layer(outputs)
        return outputs
class ResNet50(Model):
    """ResNet50.
    Consists of:
        conv * 1
        resblock(conv * 3) * 3
        resblock(conv * 3) * 4
        resblock(conv * 3) * 6
        resblock(conv * 3) * 3
        dense * 1
    i.e. conv * 49 + dense * 1 = 50 layers.
    """
    def __init__(self, output_size=1000):
        super().__init__()
        self.layers_ = [
            layers.Conv2D(64, 7, 2, padding="same", kernel_initializer="he_normal"),
            layers.BatchNormalization(),
            layers.ReLU(),  # the original ResNet applies ReLU after the first BN
            layers.MaxPool2D(pool_size=3, strides=2, padding="same"),
            BottleneckResidual(64, 256),
            BottleneckResidual(256, 256),
            BottleneckResidual(256, 256),
            BottleneckResidual(256, 512, 2),
            BottleneckResidual(512, 512),
            BottleneckResidual(512, 512),
            BottleneckResidual(512, 512),
            BottleneckResidual(512, 1024, 2),
            BottleneckResidual(1024, 1024),
            BottleneckResidual(1024, 1024),
            BottleneckResidual(1024, 1024),
            BottleneckResidual(1024, 1024),
            BottleneckResidual(1024, 1024),
            BottleneckResidual(1024, 2048, 2),
            BottleneckResidual(2048, 2048),
            BottleneckResidual(2048, 2048),
            layers.GlobalAveragePooling2D(),
            layers.Dense(
                output_size, activation="softmax", kernel_initializer="he_normal"
            ),
        ]
    def call(self, inputs):
        for layer in self.layers_:
            inputs = layer(inputs)
        return inputs
This is not very different from the Functional API.
In __init__, the layers are collected in a list, but you can organize this part freely as long as the layers are assigned to attributes of the class.
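The docstring of BottleneckResidual says the 1x1 convolutions reduce the cost of the 3x3 convolution. The pure-Python sketch below checks this by counting weights. With stride 1 the spatial size is unchanged, so the number of multiply-adds per output position is proportional to these counts. The "plain" alternative is hypothetical and is included only for comparison.

```python
def conv_weights(kernel_size, in_ch, out_ch):
    # number of weights in a conv layer (biases ignored)
    return kernel_size * kernel_size * in_ch * out_ch


# Bottleneck on a 256-channel input, as in BottleneckResidual(256, 256):
# 1x1 (256 -> 64), 3x3 (64 -> 64), 1x1 (64 -> 256)
bottleneck = conv_weights(1, 256, 64) + conv_weights(3, 64, 64) + conv_weights(1, 64, 256)

# Hypothetical alternative: two plain 3x3 convolutions kept at 256 channels
plain_256 = 2 * conv_weights(3, 256, 256)

# Two 3x3 convolutions at 64 channels, for reference
basic_64 = 2 * conv_weights(3, 64, 64)

print(bottleneck, plain_256, basic_64)   # 69632 1179648 73728
print(round(plain_256 / bottleneck, 1))  # 16.9
```

The bottleneck costs about the same as two 3x3 convolutions at 64 channels. This is consistent with the ResNet paper's remark that the bottleneck design has a time complexity similar to that of the two-layer block.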
We chose ResNet50 as an example of a model that cannot be implemented with the Sequential API alone. To be honest, the two remaining styles do not differ much, so I think it is fine to choose between the Functional API and the Subclassing API according to your preference.
Finally, let's compare the training loop implementations.
Including all of the source code would make this quite long, so some functions have been moved out to src.utils.
They are not complicated, so please fill in the gaps as you read.
The full source code is in the following repository if you are interested. https://github.com/Anieca/deep-learning-models
For built-in training, let's also enable some options, such as computing accuracy on the test data and writing logs for TensorBoard.
import os
import tensorflow as tf
from src.utils import load_dataset, load_model, get_args, get_current_time
def builtin_train(args):
    # 1. load dataset and model
    (train_images, train_labels), (test_images, test_labels) = load_dataset(args.data)
    input_shape = train_images[: args.batch_size, :, :, :].shape
    output_size = int(train_labels.max()) + 1  # assumes integer class IDs in a NumPy array
    model = load_model(args.arch, input_shape=input_shape, output_size=output_size)
    model.summary()
    # 2. set tensorboard configs
    logdir = os.path.join(args.logdir, get_current_time())
    tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=logdir)
    # 3. loss, optimizer, metrics setting
    model.compile(
        optimizer=tf.keras.optimizers.Adam(),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    # 4. dataset config (and validation, callback config)
    fit_params = {}
    fit_params["batch_size"] = args.batch_size
    fit_params["epochs"] = args.max_epoch
    if args.steps_per_epoch:
        fit_params["steps_per_epoch"] = args.steps_per_epoch
    fit_params["verbose"] = 1
    fit_params["callbacks"] = [tensorboard_callback]
    fit_params["validation_data"] = (test_images, test_labels)
    # 5. start train and test
    model.fit(train_images, train_labels, **fit_params)
It's pretty simple to write.
There are many other callback functions, so if you are interested, please read the documentation. https://www.tensorflow.org/api_docs/python/tf/keras/callbacks
Let's implement the same process as the built-in training above by ourselves.
import os
import tensorflow as tf
from src.utils import load_dataset, load_model, get_args, get_current_time
def custom_train(args):
    # 1. load dataset and model
    (train_images, train_labels), (test_images, test_labels) = load_dataset(args.data)
    input_shape = train_images[: args.batch_size, :, :, :].shape
    output_size = int(train_labels.max()) + 1  # assumes integer class IDs in a NumPy array
    model = load_model(args.arch, input_shape=input_shape, output_size=output_size)
    model.summary()
    # 2. set tensorboard configs
    logdir = os.path.join(args.logdir, get_current_time())
    train_writer = tf.summary.create_file_writer(os.path.join(logdir, "train"))
    test_writer = tf.summary.create_file_writer(os.path.join(logdir, "test"))
    # 3. loss, optimizer, metrics setting
    criterion = tf.keras.losses.SparseCategoricalCrossentropy()
    optimizer = tf.keras.optimizers.Adam()
    train_loss_avg = tf.keras.metrics.Mean()
    train_accuracy = tf.keras.metrics.SparseCategoricalAccuracy()
    test_loss_avg = tf.keras.metrics.Mean()
    test_accuracy = tf.keras.metrics.SparseCategoricalAccuracy()
    # 4. dataset config
    buffer_size = len(train_images)
    train_ds = tf.data.Dataset.from_tensor_slices((train_images, train_labels))
    train_ds = train_ds.shuffle(buffer_size=buffer_size).batch(args.batch_size)
    test_ds = tf.data.Dataset.from_tensor_slices((test_images, test_labels))
    test_ds = test_ds.batch(args.batch_size)
    # 5. start train and test
    for epoch in range(args.max_epoch):
        # 5.1. initialize metrics
        train_loss_avg.reset_states()
        train_accuracy.reset_states()
        test_loss_avg.reset_states()
        test_accuracy.reset_states()
        # 5.2. initialize progress bar
        train_pbar = tf.keras.utils.Progbar(args.steps_per_epoch)
        test_pbar = tf.keras.utils.Progbar(args.steps_per_epoch)
        # 5.3. start train
        for i, (x, y_true) in enumerate(train_ds):
            if args.steps_per_epoch and i >= args.steps_per_epoch:
                break
            # 5.3.1. forward
            with tf.GradientTape() as tape:
                y_pred = model(x, training=True)
                loss = criterion(y_true=y_true, y_pred=y_pred)
            # 5.3.2. calculate gradients from `tape` and backward
            gradients = tape.gradient(loss, model.trainable_variables)
            optimizer.apply_gradients(zip(gradients, model.trainable_variables))
            # 5.3.3. update metrics and progress bar
            train_loss_avg(loss)
            train_accuracy(y_true, y_pred)
            train_pbar.update(
                i + 1,
                [
                    ("avg_loss", train_loss_avg.result()),
                    ("accuracy", train_accuracy.result()),
                ],
            )
        # 5.4. start test
        for i, (x, y_true) in enumerate(test_ds):
            if args.steps_per_epoch and i >= args.steps_per_epoch:
                break
            # 5.4.1. forward (inference mode)
            y_pred = model(x, training=False)
            loss = criterion(y_true, y_pred)
            # 5.4.2. update metrics and progress bar
            test_loss_avg(loss)
            test_accuracy(y_true, y_pred)
            test_pbar.update(
                i + 1,
                [
                    ("avg_test_loss", test_loss_avg.result()),
                    ("test_accuracy", test_accuracy.result()),
                ],
            )
        # 5.5. write metrics to tensorboard
        with train_writer.as_default():
            tf.summary.scalar("Loss", train_loss_avg.result(), step=epoch)
            tf.summary.scalar("Acc", train_accuracy.result(), step=epoch)
        with test_writer.as_default():
            tf.summary.scalar("Loss", test_loss_avg.result(), step=epoch)
            tf.summary.scalar("Acc", test_accuracy.result(), step=epoch)
Up to the start of training little changes, but the training loop (comment 5.) is considerably longer.
Managing things like TensorBoard output and progress bars yourself takes effort, whereas built-in training handles them with little work.
If you need processing that built-in training does not provide, you have to write custom training. Otherwise, built-in training seems the better choice.
That's all. Thank you for reading.
I introduced the various ways of writing TensorFlow 2.x code, along with implementations.
I have tried to present each style neutrally, without ranking them.
When writing code yourself, choose a style according to the situation and your taste. When reading other people's source code, though, you will encounter all of these styles, so it is good to understand each of them to some degree.
I hope you find this useful.