[PYTHON] Explain how to use TensorFlow 2.X with implementation of VGG16 / ResNet50

Introduction

TensorFlow, a deep learning library developed by Google, lets you build models and write training loops in a variety of ways. This flexibility is useful for experts, but it can get in the way of understanding for beginners.

In this article, we comprehensively introduce the coding styles recommended in TensorFlow 2.X, and explain how to use them while implementing VGG16 and ResNet50, two well-known models in the field of image recognition.

TensorFlow 2.X refers to TensorFlow with a major version of 2 or higher.

Target audience

- People who have tried the TensorFlow tutorials but cannot build a model on their own
- People who read source code written in TensorFlow and come across unfamiliar ways of writing
- People who can write models with Chainer or PyTorch but not with TensorFlow

Flow

First, we look at TensorFlow's four model-building APIs. Next, we explain the two training methods. Finally, we implement VGG16 and ResNet50 using these styles.

Verification environment

macOS Catalina 10.15.3
Python 3.7.7
tensorflow 2.2.0

>>> import sys
>>> sys.version
'3.7.7 (default, Mar 10 2020, 15:43:33) \n[Clang 11.0.0 (clang-1100.0.33.17)]'

pip list | grep tensorflow
tensorflow               2.2.0
tensorflow-estimator     2.2.0

4 model building APIs in TensorFlow

TensorFlow provides two major kinds of model-building API, which subdivide into four APIs.

- Symbolic (declarative) API
  - Sequential API
  - Functional API
  - Primitive API (**the 1.X-series style; not recommended**)
- Imperative (model subclassing) API
  - Subclassing API

First, I will briefly introduce the major classifications.

Symbolic (declarative) API

This style declares (≒ compiles) the shape of the model before training runs.

Models written with this API cannot change shape during training, so some dynamically changing models (such as Tree-RNN) cannot be implemented. In exchange, you can check the model's shape before feeding it any data.
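
As a minimal sketch of what "checking the shape before giving data" can look like, the snippet below declares a tiny model with the Functional API and inspects it without any training data. The layer sizes and the 32x32x3 input are arbitrary values chosen for illustration, not anything taken from the implementations later in this article.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Declare the whole graph up front; no training data is involved.
inputs = tf.keras.Input(shape=(32, 32, 3))
x = layers.Conv2D(8, 3, padding="same")(inputs)
x = layers.MaxPool2D(2)(x)
x = layers.Flatten()(x)
outputs = layers.Dense(10, activation="softmax")(x)
model = tf.keras.Model(inputs=inputs, outputs=outputs)

# The shapes are already known because the model was declared symbolically.
print(model.output_shape)
model.summary()
```

The leading None in the printed shape stands for the batch dimension, which is left open until data arrives.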

Imperative (model subclassing) API

Unlike the symbolic API, this is an imperative (≒ intuitive) style with no up-front declaration.

It was first adopted by Chainer, a deep learning library that originated in Japan (Preferred Networks), and PyTorch later adopted it as well. You can implement a model as if you were writing an ordinary Python class, which makes customization such as swapping or extending layers easy. In exchange, the program cannot know what the model looks like until data has been passed through it once.

Next, I will introduce a concrete writing method with a simple example.

Sequential API

As the name implies, this API implements a model by adding layers to Sequential. This style is often used in the Keras and TensorFlow tutorials, so you may have seen it before.

As shown below, there are two common approaches: instantiate an empty tensorflow.keras.Sequential and add layers with the add method, or pass the layers as a list to the tensorflow.keras.Sequential constructor.

import tensorflow as tf
from tensorflow.keras import layers

def sequential_vgg16_a(input_shape, output_size):
    model = tf.keras.Sequential()
    model.add(layers.Conv2D(64, 3, 1, padding="same", batch_input_shape=input_shape))
    model.add(layers.BatchNormalization())
    # ...(Omission)...
    model.add(layers.Dense(output_size, activation="softmax"))    
    return model

def sequential_vgg16_b(input_shape, output_size):
    model = tf.keras.Sequential([
        layers.Conv2D(64, 3, 1, padding="same", batch_input_shape=input_shape),
        layers.BatchNormalization(),
        # ...(Omission)...
        layers.Dense(output_size, activation="softmax")
    ])
    return model

It only supports adding layers one after another, so you can't write complex networks with multiple inputs, intermediate features, multiple outputs, or conditional branches. You can use this API to implement a simple network (like VGG) that just passes through the layers in sequence.

Functional API

An API for implementing complex models that cannot be described by the Sequential API.

First, instantiate tensorflow.keras.layers.Input and pass it to the first layer. After that, the data flow of the model is defined by passing the output of one layer to the next layer. Finally, you can build your model by giving the resulting output and the first input as arguments to tensorflow.keras.Model.

from tensorflow.keras import layers, Model

def functional_vgg16(input_shape, output_size, batch_norm=False):
    inputs = layers.Input(batch_input_shape=input_shape)

    x = layers.Conv2D(64, 3, 1, padding="same")(inputs)
    if batch_norm:
        x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    # ...(Omission)...
    outputs = layers.Dense(output_size, activation="softmax")(x)

    return Model(inputs=inputs, outputs=outputs)

In the above example, the presence or absence of the Batch Normalization layer is switched by the value of the variable batch_norm. If you need a flexible definition that changes the shape of the model depending on the conditions, you need the Functional API instead of the Sequential API.

Note that parentheses followed by more parentheses may look odd, but this isn't TensorFlow-specific. It is ordinary Python: the first pair creates the layer object and the second pair calls it. The following two styles mean the same thing:

# Style 1
x = layers.BatchNormalization()(x)

# Style 2
layer = layers.BatchNormalization()
x = layer(x)
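
The same pattern can be reproduced in plain Python without TensorFlow. The sketch below uses a hypothetical Scale class (not part of any library) whose __call__ method plays the role of applying a layer to its input.

```python
class Scale:
    """A toy stand-in for a layer: configured first, applied later."""

    def __init__(self, factor):
        self.factor = factor

    def __call__(self, x):
        # Invoked when the instance itself is called like a function.
        return self.factor * x


# Style 1: construct and call in one expression.
y1 = Scale(3)(2)

# Style 2: construct first, then call the resulting object.
scale = Scale(3)
y2 = scale(2)

print(y1, y2)  # 6 6
```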

Primitive API

This API was mainly used in the TensorFlow 1.X series. **It is deprecated in the 2.X series.**

The Sequential API and Functional API described above define a model by describing the flow of data through it, whereas the Primitive API declaratively describes the entire processing flow, including other computations.

There is little benefit in learning this style now, so we omit the explanation. If you see code that trains with tensorflow.Session, it is written in this style.

import tensorflow as tf
sess = tf.Session()  # TensorFlow 1.X only; under 2.X this lives at tf.compat.v1.Session

Subclassing API

An API that became available with the update to TensorFlow 2.X. It's written in much the same way as Chainer and PyTorch, and it's intuitive and easy to customize because you can implement your model as if you were writing an ordinary Python class.

First, create a class that inherits from tensorflow.keras.Model. Then build the model by implementing the __init__ and call methods.

The __init__ method calls the __init__ method of the parent class and registers the layers you want to train. **Weights of layers that are not registered here are not trained by default.**

The call method describes the forward propagation through the layers. (It is similar to Chainer's __call__ and PyTorch's forward.)

from tensorflow.keras import layers, Model


class VGG16(Model):
    def __init__(self, output_size=1000):
        super().__init__()
        self.layers_ = [
            layers.Conv2D(64, 3, 1, padding="same"),
            layers.BatchNormalization(),
            # ...(Omission)...
            layers.Dense(output_size, activation="softmax"),
        ]
    def call(self, inputs):
        for layer in self.layers_:
            inputs = layer(inputs)
        return inputs

It looks a bit verbose compared to the other styles, but you can see that the model is implemented just as you would normally write a class.

Note that you may also see super called with arguments when initializing the parent class. That form was written with Python 2 in mind; in Python 3, the argument-free form does the same thing.

from tensorflow.keras import Model


# Python 3 style
class VGG16_PY3(Model):
    def __init__(self, output_size=1000):
        super().__init__()

# Python 2 compatible style
class VGG16_PY2(Model):
    def __init__(self, output_size=1000):
        super(VGG16_PY2, self).__init__()
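
To confirm that the two forms of super behave the same way, here is a small sketch in plain Python. Base, ChildPy3, and ChildPy2 are hypothetical classes made up for this illustration; in the two-argument form, the class and instance go to super itself, not to __init__.

```python
class Base:
    def __init__(self):
        self.initialized = True


class ChildPy3(Base):
    def __init__(self):
        super().__init__()  # argument-free form, Python 3 only


class ChildPy2(Base):
    def __init__(self):
        super(ChildPy2, self).__init__()  # explicit form, valid in Python 2 and 3


# Both children end up running Base.__init__.
print(ChildPy3().initialized, ChildPy2().initialized)  # True True
```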

Model building API review

This concludes the explanation of how to build a model. In summary, you can choose among the APIs as follows.

- If you want to easily write a model that simply passes through layers in one direction: **Sequential API**
- If you want to write a complex model and check its shape properly before running training: **Functional API**
- If you want to write in the Chainer or PyTorch style, or you want to write a dynamic model: **Subclassing API**

Two training methods in TensorFlow

There are two ways to train:

- Built-in training
- Custom training

There seem to be no official names, so these names are used for convenience in this article.

Built-in training

This method trains the model using the built-in functions of tensorflow.keras.Model.

Many of you may know it because it is also used in the Keras and TensorFlow tutorials. scikit-learn, although a different library, follows a similar fit-based pattern.

First, instantiate the model implemented with one of the APIs above (tensorflow.keras.Model, or an object that inherits from it).

This instance has a compile method and a fit method built in.

Call the compile method to register the loss function, the optimizer, and the metrics. Then train by calling the fit method.

import tensorflow as tf

(train_images, train_labels), _ = tf.keras.datasets.cifar10.load_data()

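# Use the VGG16 architecture bundled with Keras for illustration
# (untrained weights, sized for the 32x32 CIFAR-10 images and 10 classes)
model = tf.keras.applications.VGG16(weights=None, input_shape=(32, 32, 3), classes=10)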

model.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

model.fit(train_images, train_labels)

You can now perform the training.

The batch size, the number of epochs, callback functions, evaluation on validation data, and so on can be passed as keyword arguments of the fit method, so some customization is possible.
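
The following is a minimal sketch of those keyword arguments in use. The random arrays, the two-layer model, and the choice of EarlyStopping as the callback are all assumptions made to keep the example small and fast; they are not part of the implementations in this article.

```python
import numpy as np
import tensorflow as tf

# Random stand-in data (an assumption): 4 features, 3 classes.
rng = np.random.default_rng(0)
x_train = rng.normal(size=(64, 4)).astype("float32")
y_train = rng.integers(0, 3, size=(64,))
x_val = rng.normal(size=(16, 4)).astype("float32")
y_val = rng.integers(0, 3, size=(16,))

model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(3, activation="softmax"),
])
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

history = model.fit(
    x_train,
    y_train,
    batch_size=16,                   # samples per gradient step
    epochs=2,                        # passes over the training data
    validation_data=(x_val, y_val),  # evaluated at the end of every epoch
    callbacks=[tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=1)],
    verbose=0,
)

# fit returns a History object holding the per-epoch values.
print(sorted(history.history.keys()))
```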

In many cases this is sufficient, but cases that do not fit into this framework (for example, training multiple models at the same time, as in a GAN) need to be written with the custom training described next.

Custom training

There is no special API for this; it is simply the ordinary way of training in a Python for loop.

First, instantiate the model implemented with one of the APIs above (tensorflow.keras.Model, or an object that inherits from it).

Next, define the loss function and the optimizer, and split the dataset into batches. Then iterate over epochs and batches in nested for loops.

Inside the loop, first write the forward propagation within a tf.GradientTape scope. Then call the gradient method to compute the gradients, and the apply_gradients method to update the weights according to the optimizer.

import tensorflow as tf

batch_size = 32
epochs = 10

(train_images, train_labels), _ = tf.keras.datasets.cifar10.load_data()
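# Use the VGG16 architecture bundled with Keras for illustration
# (untrained weights, sized for the 32x32 CIFAR-10 images and 10 classes)
model = tf.keras.applications.VGG16(weights=None, input_shape=(32, 32, 3), classes=10)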

buffer_size = len(train_images)
train_ds = tf.data.Dataset.from_tensor_slices((train_images, train_labels))
train_ds = train_ds.shuffle(buffer_size=buffer_size).batch(batch_size)

criterion = tf.keras.losses.SparseCategoricalCrossentropy()
optimizer = tf.keras.optimizers.Adam()

for epoch in range(epochs):
    for x, y_true in train_ds:
        with tf.GradientTape() as tape:
            y_pred = model(x, training=True)
            loss = criterion(y_true=y_true, y_pred=y_pred)
        gradients = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(gradients, model.trainable_variables))

You can now perform the training.

The example above neither evaluates on validation data nor writes anything to TensorBoard, but since it is just an ordinary for loop, you can add whatever processing you like.

On the other hand, the amount of code inevitably grows, which makes it a little harder to guarantee the quality of the source code. Chainer and PyTorch code can be written in almost the same way (although there are minor differences).

Training method review

This concludes the explanation of the training methods. In summary, you can choose between them as follows.

- If you use a common training procedure and do not need any special processing during training: **built-in training**
- If the built-in framework does not fit your needs, if you want to add various processing during training and proceed by trial and error, or if you want to write everything by hand: **custom training**

Overview of VGG16 / ResNet50

Some parts are hard to grasp from explanation alone, so let's deepen our understanding through implementation.

Let's start with a brief introduction to the two models.

What is VGG16

It is a high-performance model with a very simple structure: 13 layers of 3x3 Convolution followed by 3 fully connected layers. It is used to extract image features in a wide range of image recognition tasks. The original paper has over 37,000 citations and is very well known.
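
The layer count and the size of the features reaching the fully connected layers can be checked with a little arithmetic. The sketch below assumes the 224x224 input resolution of the original paper and one stride-2 pooling at the end of each of the five convolution blocks, as in the implementations later in this article.

```python
import math

# Number of 3x3 convolutions in each of the five VGG16 blocks.
convs_per_block = [2, 2, 3, 3, 3]
num_conv = sum(convs_per_block)
num_dense = 3
print(num_conv, num_conv + num_dense)  # 13 16

# Each block ends with a stride-2 max pooling, which halves the spatial size.
size = 224
for _ in convs_per_block:
    size = math.ceil(size / 2)  # padding="same" rounds up
print(size)  # 7

# The last block has 512 channels, so this many features are flattened.
flat_features = size * size * 512
print(flat_features)  # 25088
```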

It can be implemented with the Sequential API, Functional API, and Subclassing API.

The original paper is here. https://arxiv.org/abs/1409.1556

What is ResNet50

This is a deep model with a Residual mechanism (49 Convolution layers and 1 fully connected layer). As of 2020, variants of ResNet were still among the most accurate models for image classification, so it is also a high-performance model. Like VGG16, it is used to extract image features in a wide range of image recognition tasks. The original paper has more than 45,000 citations (about 10 times that of BERT) and is also very famous.
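
The figure of 49 Convolution layers follows from the block structure. The sketch below counts only the convolutions on the main path; the 1x1 projection convolutions on the shortcut path are, by the usual convention, not included in the "50".

```python
# Number of bottleneck blocks in each of the four ResNet50 stages.
blocks_per_stage = [3, 4, 6, 3]
convs_per_block = 3  # 1x1 -> 3x3 -> 1x1
stem_conv = 1        # the first 7x7 convolution

num_conv = stem_conv + convs_per_block * sum(blocks_per_stage)
num_layers = num_conv + 1  # plus the final fully connected layer
print(num_conv, num_layers)  # 49 50
```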

It cannot be implemented by the Sequential API alone. It can be implemented with Functional API and Subclassing API.

The original paper is here. https://arxiv.org/abs/1512.03385

Implementation of VGG16

Let's implement it in each writing style.

VGG16 Sequential API

There is nothing special to consider here, so we simply write it out layer by layer.

from tensorflow.keras import layers, Sequential


def sequential_vgg16(input_shape, output_size):
    params = {
        "padding": "same",
        "use_bias": True,
        "kernel_initializer": "he_normal",
    }
    model = Sequential()
    model.add(layers.Conv2D(64, 3, 1, **params, batch_input_shape=input_shape))
    model.add(layers.BatchNormalization())
    model.add(layers.ReLU())
    model.add(layers.Conv2D(64, 3, 1, **params))
    model.add(layers.BatchNormalization())
    model.add(layers.ReLU())
    model.add(layers.MaxPool2D(2, padding="same"))
    model.add(layers.Conv2D(128, 3, 1, **params))
    model.add(layers.BatchNormalization())
    model.add(layers.ReLU())
    model.add(layers.Conv2D(128, 3, 1, **params))
    model.add(layers.BatchNormalization())
    model.add(layers.ReLU())
    model.add(layers.MaxPool2D(2, padding="same"))
    model.add(layers.Conv2D(256, 3, 1, **params))
    model.add(layers.BatchNormalization())
    model.add(layers.ReLU())
    model.add(layers.Conv2D(256, 3, 1, **params))
    model.add(layers.BatchNormalization())
    model.add(layers.ReLU())
    model.add(layers.Conv2D(256, 3, 1, **params))
    model.add(layers.BatchNormalization())
    model.add(layers.ReLU())
    model.add(layers.MaxPool2D(2, padding="same"))
    model.add(layers.Conv2D(512, 3, 1, **params))
    model.add(layers.BatchNormalization())
    model.add(layers.ReLU())
    model.add(layers.Conv2D(512, 3, 1, **params))
    model.add(layers.BatchNormalization())
    model.add(layers.ReLU())
    model.add(layers.Conv2D(512, 3, 1, **params))
    model.add(layers.BatchNormalization())
    model.add(layers.ReLU())
    model.add(layers.MaxPool2D(2, padding="same"))
    model.add(layers.Conv2D(512, 3, 1, **params))
    model.add(layers.BatchNormalization())
    model.add(layers.ReLU())
    model.add(layers.Conv2D(512, 3, 1, **params))
    model.add(layers.BatchNormalization())
    model.add(layers.ReLU())
    model.add(layers.Conv2D(512, 3, 1, **params))
    model.add(layers.BatchNormalization())
    model.add(layers.ReLU())
    model.add(layers.MaxPool2D(2, padding="same"))
    model.add(layers.Flatten())
    model.add(layers.Dense(4096, activation="relu"))  # VGG16 applies ReLU after each hidden dense layer
    model.add(layers.Dense(4096, activation="relu"))
    model.add(layers.Dense(output_size, activation="softmax"))
    return model

It's pretty simple to write, but with this many layers it becomes hard to read. For example, you may not notice if a ReLU is missing somewhere. Also, if you want to remove Batch Normalization, you need to comment it out line by line, so the code is poorly reusable and hard to customize.

VGG16 Functional API

This style is more flexible than the Sequential API. This time, let's turn the repeated group of layers (Convolution - Batch Normalization - ReLU) into a function.

from tensorflow.keras import layers, Model


def functional_cbr(x, filters, kernel_size, strides):
    params = {
        "filters": filters,
        "kernel_size": kernel_size,
        "strides": strides,
        "padding": "same",
        "use_bias": True,
        "kernel_initializer": "he_normal",
    }

    x = layers.Conv2D(**params)(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    return x


def functional_vgg16(input_shape, output_size):
    inputs = layers.Input(batch_input_shape=input_shape)
    x = functional_cbr(inputs, 64, 3, 1)
    x = functional_cbr(x, 64, 3, 1)
    x = layers.MaxPool2D(2, padding="same")(x)
    x = functional_cbr(x, 128, 3, 1)
    x = functional_cbr(x, 128, 3, 1)
    x = layers.MaxPool2D(2, padding="same").__call__(x)  # Equivalent: the parentheses above invoke __call__
    x = functional_cbr(x, 256, 3, 1)
    x = functional_cbr(x, 256, 3, 1)
    x = functional_cbr(x, 256, 3, 1)
    x = layers.MaxPool2D(2, padding="same").call(x)  # Also runs here, but skips the bookkeeping done by __call__; prefer the forms above
    x = functional_cbr(x, 512, 3, 1)
    x = functional_cbr(x, 512, 3, 1)
    x = functional_cbr(x, 512, 3, 1)
    x = layers.MaxPool2D(2, padding="same")(x)
    x = functional_cbr(x, 512, 3, 1)
    x = functional_cbr(x, 512, 3, 1)
    x = functional_cbr(x, 512, 3, 1)
    x = layers.MaxPool2D(2, padding="same")(x)
    x = layers.Flatten()(x)
    x = layers.Dense(4096, activation="relu")(x)  # VGG16 applies ReLU after each hidden dense layer
    x = layers.Dense(4096, activation="relu")(x)
    outputs = layers.Dense(output_size, activation="softmax")(x)
    return Model(inputs=inputs, outputs=outputs)

This is much cleaner. If you want to remove BatchNormalization or change ReLU to LeakyReLU, you only need to modify a few lines.
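
As a rough sketch of that kind of modification, the variant below adds two switches to the Convolution - Batch Normalization - ReLU helper. The name functional_cbr_v2 and the batch_norm and leaky parameters are hypothetical additions for illustration, and the small model around it is only there to check the result.

```python
import tensorflow as tf
from tensorflow.keras import layers


def functional_cbr_v2(x, filters, kernel_size, strides, batch_norm=True, leaky=False):
    """Conv - (BatchNorm) - ReLU or LeakyReLU, selected by the two flags."""
    x = layers.Conv2D(
        filters, kernel_size, strides, padding="same", kernel_initializer="he_normal"
    )(x)
    if batch_norm:
        x = layers.BatchNormalization()(x)
    x = layers.LeakyReLU()(x) if leaky else layers.ReLU()(x)
    return x


inputs = tf.keras.Input(shape=(32, 32, 3))
x = functional_cbr_v2(inputs, 16, 3, 1, batch_norm=False, leaky=True)
x = layers.MaxPool2D(2, padding="same")(x)
x = layers.Flatten()(x)
outputs = layers.Dense(10, activation="softmax")(x)
model = tf.keras.Model(inputs=inputs, outputs=outputs)

# Only the helper changed; every call site picks up the new behavior.
layer_types = [type(layer).__name__ for layer in model.layers]
print(layer_types)
```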

VGG16 Subclassing API

As with the Functional API, let's write the repeated group of layers (Convolution - BatchNormalization - ReLU) as a class.

from tensorflow.keras import layers, Model


class CBR(Model):
    def __init__(self, filters, kernel_size, strides):
        super().__init__()

        params = {
            "filters": filters,
            "kernel_size": kernel_size,
            "strides": strides,
            "padding": "same",
            "use_bias": True,
            "kernel_initializer": "he_normal",
        }

        self.layers_ = [
            layers.Conv2D(**params),
            layers.BatchNormalization(),
            layers.ReLU()
        ]

    def call(self, inputs):
        for layer in self.layers_:
            inputs = layer(inputs)
        return inputs


class VGG16(Model):
    def __init__(self, output_size=1000):
        super().__init__()
        self.layers_ = [
            CBR(64, 3, 1),
            CBR(64, 3, 1),
            layers.MaxPool2D(2, padding="same"),
            CBR(128, 3, 1),
            CBR(128, 3, 1),
            layers.MaxPool2D(2, padding="same"),
            CBR(256, 3, 1),
            CBR(256, 3, 1),
            CBR(256, 3, 1),
            layers.MaxPool2D(2, padding="same"),
            CBR(512, 3, 1),
            CBR(512, 3, 1),
            CBR(512, 3, 1),
            layers.MaxPool2D(2, padding="same"),
            CBR(512, 3, 1),
            CBR(512, 3, 1),
            CBR(512, 3, 1),
            layers.MaxPool2D(2, padding="same"),
            layers.Flatten(),
            layers.Dense(4096, activation="relu"),  # VGG16 applies ReLU after each hidden dense layer
            layers.Dense(4096, activation="relu"),
            layers.Dense(output_size, activation="softmax"),
        ]

    def call(self, inputs):
        for layer in self.layers_:
            inputs = layer(inputs)
        return inputs

This is easier to understand intuitively than the Functional API, because __init__ is responsible for defining the model and call is responsible for running it, but the code is longer. Another point is that the imperative Subclassing API does not require an input shape when the model is created (no input_shape argument is needed).
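
The sketch below illustrates this point with a deliberately tiny model. TinyNet is a hypothetical class for this example only; the idea is that the weights do not exist when the object is created and only appear once the first batch fixes the input shape.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model


class TinyNet(Model):
    def __init__(self, output_size=10):
        super().__init__()
        self.flatten = layers.Flatten()
        self.dense = layers.Dense(output_size, activation="softmax")

    def call(self, inputs):
        return self.dense(self.flatten(inputs))


model = TinyNet()  # no input shape is given here
num_weights_before = len(model.weights)

# The first call fixes the input shape, and the weights are created at that point.
dummy = tf.zeros((2, 8, 8, 3))
outputs = model(dummy)
num_weights_after = len(model.weights)

print(num_weights_before, num_weights_after, outputs.shape)
```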

VGG16 implementation review

The three versions were written to be as easy to compare as possible. How did they look to you?

This implementation uses Batch Normalization between the Convolution layers and He initialization for the weights, but neither technique had been published when the original paper was submitted. The original network therefore had no Batch Normalization layers and did not use He initialization. To avoid vanishing gradients, the original paper adopts a transfer-learning-like procedure: a shallower configuration is trained first, and its weights are used to initialize part of the deeper models. (The paper also notes that Glorot initialization makes it possible to train without this pre-training.)

To better understand the implementation above, it would be interesting to try what happens if you remove the Batch Normalization layers, change the weight initialization method, and so on.

ResNet50 implementation

Next, let's implement ResNet50. Since it cannot be written with the Sequential API alone, we write it with the Functional API and the Subclassing API.

ResNet50 Functional API

We turn the repeated Residual mechanism into a function and implement the model with it.

from tensorflow.keras import layers, Model


def functional_bottleneck_residual(x, in_ch, out_ch, strides=1):
    params = {
        "padding": "same",
        "kernel_initializer": "he_normal",
        "use_bias": True,
    }
    inter_ch = out_ch // 4
    h1 = layers.Conv2D(inter_ch, kernel_size=1, strides=strides, **params)(x)
    h1 = layers.BatchNormalization()(h1)
    h1 = layers.ReLU()(h1)
    h1 = layers.Conv2D(inter_ch, kernel_size=3, strides=1, **params)(h1)
    h1 = layers.BatchNormalization()(h1)
    h1 = layers.ReLU()(h1)
    h1 = layers.Conv2D(out_ch, kernel_size=1, strides=1, **params)(h1)
    h1 = layers.BatchNormalization()(h1)

    if in_ch != out_ch:
        h2 = layers.Conv2D(out_ch, kernel_size=1, strides=strides, **params)(x)
        h2 = layers.BatchNormalization()(h2)
    else:
        h2 = x

    h = layers.Add()([h1, h2])
    h = layers.ReLU()(h)
    return h


def functional_resnet50(input_shape, output_size):
    inputs = layers.Input(batch_input_shape=input_shape)
    x = layers.Conv2D(64, 7, 2, padding="same", kernel_initializer="he_normal")(inputs)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)  # the ResNet stem is Conv - BatchNorm - ReLU before pooling
    x = layers.MaxPool2D(pool_size=3, strides=2, padding="same")(x)

    x = functional_bottleneck_residual(x, 64, 256)
    x = functional_bottleneck_residual(x, 256, 256)
    x = functional_bottleneck_residual(x, 256, 256)

    x = functional_bottleneck_residual(x, 256, 512, 2)
    x = functional_bottleneck_residual(x, 512, 512)
    x = functional_bottleneck_residual(x, 512, 512)
    x = functional_bottleneck_residual(x, 512, 512)

    x = functional_bottleneck_residual(x, 512, 1024, 2)
    x = functional_bottleneck_residual(x, 1024, 1024)
    x = functional_bottleneck_residual(x, 1024, 1024)
    x = functional_bottleneck_residual(x, 1024, 1024)
    x = functional_bottleneck_residual(x, 1024, 1024)
    x = functional_bottleneck_residual(x, 1024, 1024)

    x = functional_bottleneck_residual(x, 1024, 2048, 2)
    x = functional_bottleneck_residual(x, 2048, 2048)
    x = functional_bottleneck_residual(x, 2048, 2048)

    x = layers.GlobalAveragePooling2D()(x)
    outputs = layers.Dense(
        output_size, activation="softmax", kernel_initializer="he_normal"
    )(x)
    return Model(inputs=inputs, outputs=outputs)

Within the functional_bottleneck_residual function, h1, h2, and h appear. A model like this, in which the data flow branches partway through, cannot be described by the Sequential API.

In addition, h2 does nothing if the numbers of input and output channels are the same, and applies a projection to adjust the number of channels if they differ. Such conditional branching cannot be described by the Sequential API either.

Once you've created this function, all you have to do is call it in sequence.
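
To see where the projection branch is actually used, it helps to trace the feature map shape through the network by hand. The sketch below does this in plain Python, assuming a 224x224x3 input and padding="same" everywhere, as in the code above. The helper same_out is a hypothetical name for this example.

```python
import math


def same_out(size, stride):
    """Spatial output size of a padding="same" convolution or pooling."""
    return math.ceil(size / stride)


size, ch = 224, 3
size, ch = same_out(size, 2), 64  # 7x7 conv, stride 2
size = same_out(size, 2)          # 3x3 max pooling, stride 2
trace = [(size, ch)]

# (out_ch, number of blocks, stride of the first block) for each stage.
stages = [(256, 3, 1), (512, 4, 2), (1024, 6, 2), (2048, 3, 2)]
for out_ch, num_blocks, first_stride in stages:
    # Only the first block of a stage changes the shape, so only that block
    # needs the projection; the remaining blocks use the identity shortcut.
    size = same_out(size, first_stride)
    ch = out_ch
    trace.append((size, ch))

print(trace)  # [(56, 64), (56, 256), (28, 512), (14, 1024), (7, 2048)]
```

Each change in the channel count in this trace corresponds to one call where in_ch != out_ch, which is exactly where h2 goes through the projection.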

ResNet50 Subclassing API

As with the Functional API, we turn the repeated Residual mechanism into a class and implement the model with it.

from tensorflow.keras import layers, Model


class BottleneckResidual(Model):
    """ResNet's Bottleneck Residual Module.
    The 1x1 conv in the first layer reduces the channel dimension,
    which cuts the computation of the 3x3 conv in the second layer;
    the 1x1 conv in the third layer then restores the channel dimension.
    It is called a bottleneck because the costly 3x3 conv in the middle
    operates on the reduced dimension.
    """

    def __init__(self, in_ch, out_ch, strides=1):
        super().__init__()

        self.projection = in_ch != out_ch
        inter_ch = out_ch // 4
        params = {
            "padding": "same",
            "kernel_initializer": "he_normal",
            "use_bias": True,
        }

        self.common_layers = [
            layers.Conv2D(inter_ch, kernel_size=1, strides=strides, **params),
            layers.BatchNormalization(),
            layers.ReLU(),
            layers.Conv2D(inter_ch, kernel_size=3, strides=1, **params),
            layers.BatchNormalization(),
            layers.ReLU(),
            layers.Conv2D(out_ch, kernel_size=1, strides=1, **params),
            layers.BatchNormalization(),
        ]

        if self.projection:
            self.projection_layers = [
                layers.Conv2D(out_ch, kernel_size=1, strides=strides, **params),
                layers.BatchNormalization(),
            ]

        self.concat_layers = [layers.Add(), layers.ReLU()]

    def call(self, inputs):
        h1 = inputs
        h2 = inputs

        for layer in self.common_layers:
            h1 = layer(h1)

        if self.projection:
            for layer in self.projection_layers:
                h2 = layer(h2)

        outputs = [h1, h2]
        for layer in self.concat_layers:
            outputs = layer(outputs)
        return outputs


class ResNet50(Model):
    """ResNet50.
    The elements are
    conv * 1
    resblock(conv * 3) * 3
    resblock(conv * 3) * 4
    resblock(conv * 3) * 6
    resblock(conv * 3) * 3
    dense * 1
    for a total of conv * 49 + dense * 1 = 50 layers.
    """

    def __init__(self, output_size=1000):
        super().__init__()

        self.layers_ = [
            layers.Conv2D(64, 7, 2, padding="same", kernel_initializer="he_normal"),
            layers.BatchNormalization(),
            layers.ReLU(),  # the ResNet stem is Conv - BatchNorm - ReLU before pooling
            layers.MaxPool2D(pool_size=3, strides=2, padding="same"),
            BottleneckResidual(64, 256),
            BottleneckResidual(256, 256),
            BottleneckResidual(256, 256),
            BottleneckResidual(256, 512, 2),
            BottleneckResidual(512, 512),
            BottleneckResidual(512, 512),
            BottleneckResidual(512, 512),
            BottleneckResidual(512, 1024, 2),
            BottleneckResidual(1024, 1024),
            BottleneckResidual(1024, 1024),
            BottleneckResidual(1024, 1024),
            BottleneckResidual(1024, 1024),
            BottleneckResidual(1024, 1024),
            BottleneckResidual(1024, 2048, 2),
            BottleneckResidual(2048, 2048),
            BottleneckResidual(2048, 2048),
            layers.GlobalAveragePooling2D(),
            layers.Dense(
                output_size, activation="softmax", kernel_initializer="he_normal"
            ),
        ]

    def call(self, inputs):
        for layer in self.layers_:
            inputs = layer(inputs)
        return inputs

It's not that different from the Functional API. Here the __init__ method collects the layers in a list, but you can write this part however you like, as long as the layers are registered as attributes of the class.
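
As a small sketch of that freedom, the two hypothetical classes below register the same layers in two ways: once as a list and once as individual attributes. In both cases the weights show up in trainable_variables after the first call.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model


class ListStyle(Model):
    def __init__(self):
        super().__init__()
        # Layers kept in a list attribute, as in the ResNet50 class above.
        self.layers_ = [layers.Dense(4, activation="relu"), layers.Dense(2)]

    def call(self, inputs):
        for layer in self.layers_:
            inputs = layer(inputs)
        return inputs


class AttributeStyle(Model):
    def __init__(self):
        super().__init__()
        # The same layers kept as separate attributes.
        self.hidden = layers.Dense(4, activation="relu")
        self.out = layers.Dense(2)

    def call(self, inputs):
        return self.out(self.hidden(inputs))


x = tf.zeros((1, 3))
counts = {}
for model in (ListStyle(), AttributeStyle()):
    model(x)  # the first call creates the weights
    counts[type(model).__name__] = len(model.trainable_variables)

print(counts)
```

Each Dense layer contributes a kernel and a bias, so both styles should report the same number of trainable variables.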

ResNet50 Implementation Review

We introduced ResNet50 as a model that cannot be implemented with the Sequential API alone. To be honest, the two remaining styles do not differ much, so it's fine to choose between the Functional API and the Subclassing API according to your preference.

Training implementation

Finally, let's compare the training loop implementations.

Showing all the source code would be quite long, so some helper functions are factored out into src.utils. They are not complicated, so please fill in the gaps as you read.

All the source code is available in the following repository, so please have a look if you are interested. https://github.com/Anieca/deep-learning-models

built-in training implementation

Let's specify a few options, such as computing accuracy on the test data and writing logs for TensorBoard.


import os
import tensorflow as tf

from src.utils import load_dataset, load_model, get_args, get_current_time


def builtin_train(args):
    # 1. load dataset and model
    (train_images, train_labels), (test_images, test_labels) = load_dataset(args.data)
    input_shape = train_images[: args.batch_size, :, :, :].shape
    output_size = max(train_labels) + 1
    model = load_model(args.arch, input_shape=input_shape, output_size=output_size)
    model.summary()

    # 2. set tensorboard configs
    logdir = os.path.join(args.logdir, get_current_time())
    tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=logdir)

    # 3. loss, optimizer, metrics setting
    model.compile(
        optimizer=tf.keras.optimizers.Adam(),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )

    # 4. dataset config (and validation, callback config)
    fit_params = {}
    fit_params["batch_size"] = args.batch_size
    fit_params["epochs"] = args.max_epoch
    if args.steps_per_epoch:
        fit_params["steps_per_epoch"] = args.steps_per_epoch
    fit_params["verbose"] = 1
    fit_params["callbacks"] = [tensorboard_callback]
    fit_params["validation_data"] = (test_images, test_labels)

    # 5. start train and test
    model.fit(train_images, train_labels, **fit_params)

It's pretty simple to write.

There are many other callbacks, so if you are interested, please read the documentation. https://www.tensorflow.org/api_docs/python/tf/keras/callbacks

Custom training implementation

Let's implement the same process as the built-in training above ourselves.

import os
import tensorflow as tf

from src.utils import load_dataset, load_model, get_args, get_current_time


def custom_train(args):
    # 1. load dataset and model
    (train_images, train_labels), (test_images, test_labels) = load_dataset(args.data)
    input_shape = train_images[: args.batch_size, :, :, :].shape
    output_size = max(train_labels) + 1
    model = load_model(args.arch, input_shape=input_shape, output_size=output_size)
    model.summary()

    # 2. set tensorboard configs
    logdir = os.path.join(args.logdir, get_current_time())
    train_writer = tf.summary.create_file_writer(os.path.join(logdir, "train"))
    test_writer = tf.summary.create_file_writer(os.path.join(logdir, "test"))

    # 3. loss, optimizer, metrics setting
    criterion = tf.keras.losses.SparseCategoricalCrossentropy()
    optimizer = tf.keras.optimizers.Adam()
    train_loss_avg = tf.keras.metrics.Mean()
    train_accuracy = tf.keras.metrics.SparseCategoricalAccuracy()
    test_loss_avg = tf.keras.metrics.Mean()
    test_accuracy = tf.keras.metrics.SparseCategoricalAccuracy()

    # 4. dataset config
    buffer_size = len(train_images)
    train_ds = tf.data.Dataset.from_tensor_slices((train_images, train_labels))
    train_ds = train_ds.shuffle(buffer_size=buffer_size).batch(args.batch_size)
    test_ds = tf.data.Dataset.from_tensor_slices((test_images, test_labels))
    test_ds = test_ds.batch(args.batch_size)

    # 5. start train and test
    for epoch in range(args.max_epoch):
        # 5.1. initialize metrics
        train_loss_avg.reset_states()
        train_accuracy.reset_states()
        test_loss_avg.reset_states()
        test_accuracy.reset_states()

        # 5.2. initialize progress bar
        train_pbar = tf.keras.utils.Progbar(args.steps_per_epoch)
        test_pbar = tf.keras.utils.Progbar(args.steps_per_epoch)

        # 5.3. start train
        for i, (x, y_true) in enumerate(train_ds):
            if args.steps_per_epoch and i >= args.steps_per_epoch:
                break
            # 5.3.1. forward
            with tf.GradientTape() as tape:
                y_pred = model(x, training=True)
                loss = criterion(y_true=y_true, y_pred=y_pred)
            # 5.3.2. calculate gradients from `tape` and backward
            gradients = tape.gradient(loss, model.trainable_variables)
            optimizer.apply_gradients(zip(gradients, model.trainable_variables))

            # 5.3.3. update metrics and progress bar
            train_loss_avg(loss)
            train_accuracy(y_true, y_pred)
            train_pbar.update(
                i + 1,
                [
                    ("avg_loss", train_loss_avg.result()),
                    ("accuracy", train_accuracy.result()),
                ],
            )

        # 5.4. start test
        for i, (x, y_true) in enumerate(test_ds):
            if args.steps_per_epoch and i >= args.steps_per_epoch:
                break
            # 5.4.1. forward
            y_pred = model(x, training=False)  # inference mode for BatchNormalization
            loss = criterion(y_true, y_pred)

            # 5.4.2. update metrics and progress bar
            test_loss_avg(loss)
            test_accuracy(y_true, y_pred)
            test_pbar.update(
                i + 1,
                [
                    ("avg_test_loss", test_loss_avg.result()),
                    ("test_accuracy", test_accuracy.result()),
                ],
            )

        # 5.5. write metrics to tensorboard
        with train_writer.as_default():
            tf.summary.scalar("Loss", train_loss_avg.result(), step=epoch)
            tf.summary.scalar("Acc", train_accuracy.result(), step=epoch)
        with test_writer.as_default():
            tf.summary.scalar("Loss", test_loss_avg.result(), step=epoch)
            tf.summary.scalar("Acc", test_accuracy.result(), step=epoch)

Up to the start of training there is little difference, but the training loop itself (comment 5.) requires quite a lot of code.

Training implementation review

Managing utilities yourself, such as writing TensorBoard output and creating progress bars, carries a maintenance cost, whereas built-in training is quite easy to use.

If you need processing that built-in training does not provide, you have to write custom training; otherwise, built-in training seems the better choice.

In closing

That's all. Thank you for reading this far.

This article introduced the various coding styles of TensorFlow 2.X together with implementations.

I tried to describe each style evenhandedly, without ranking one above another.

When writing your own code, choose a style according to the situation and your taste. When reading other people's source code, however, you will come across all of these styles, so it is worth having at least a rough understanding of each.

We hope you find it useful.