[PYTHON] Deep learning learned by implementation (segmentation) ~ Implementation of SegNet ~

Environment

tensorflow == 2.2.0, keras == 2.3.1 (the default versions in Google Colab as of 2020/6/10)

Code

All of the code is available on GitHub: https://github.com/milky1210/Segnet The code in this article is an excerpt, so please download the repository if you want to actually run it.

Summary of the SegNet paper


Abstract

Semantic segmentation is the problem of inferring, with deep learning, what appears in each pixel of an image. For this problem, the paper proposes a model that restores feature maps whose resolution was reduced by pooling back to the original dimensions while mapping them accurately onto object boundaries.

Differences from other studies

Like a standard FCN, SegNet reduces the resolution through convolution and pooling layers and then upsamples again, but when raising the resolution it uses a technique called pooling indices to keep the boundaries from blurring. The encoder and decoder inherit their shape from the VGG16 model (a model famous for image classification).

Pooling indices

As the figure in the paper shows, the position where each maximum was taken is remembered during max pooling, and during upsampling each value of the feature map is written back to that position.
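To make the idea concrete, here is a small standalone sketch (not part of the repository) of tf.nn.max_pool_with_argmax, the TensorFlow op used later, which returns both the pooled values and the flattened positions of the maxima:

import tensorflow as tf

# A single 4x4 one-channel image in NHWC layout.
x = tf.constant([[1., 3., 2., 0.],
                 [4., 2., 1., 5.],
                 [0., 1., 3., 2.],
                 [2., 6., 1., 4.]])
x = tf.reshape(x, (1, 4, 4, 1))

# 2x2 max pooling that also returns the flat index of each maximum.
pooled, argmax = tf.nn.max_pool_with_argmax(
    x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

print(tf.reshape(pooled, (2, 2)))  # [[4. 5.] [6. 4.]]
print(tf.reshape(argmax, (2, 2)))  # [[ 4  7] [13 15]]: where each max came from

During unpooling, each pooled value is written back to its recorded index and every other position is filled with zero.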

Performance comparison using VOC12

What is VOC12

It is a dataset for tasks such as image classification, detection, and segmentation, and it is also used in the SegNet paper for performance verification. You can download it from here.

After downloading, VOCdevkit/VOC2012/ contains JPEGImages/ and SegmentationObject/; training and validation use the JPEGImages as input images and the SegmentationObject images as target images.

Each JPEGImages/~.jpg corresponds to the SegmentationObject/~.png with the same file name. The labels are divided into 22 classes, including the background and boundary classes.

Implementation

This article covers only the model definition, the loss function, and training. Training and validation are performed at a resolution of 64x64.
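The repository handles the data loading itself; as a rough illustration only (the helper below is hypothetical, not the repository's code), an input/label pair could be prepared like this, assuming the label PNGs are read as palette indices and the boundary value 255 is remapped to class 21:

import numpy as np
from PIL import Image
from keras.utils import to_categorical

def load_pair(stem, size=(64, 64)):
    # stem is a file name without extension, shared by both directories
    img = Image.open("VOCdevkit/VOC2012/JPEGImages/%s.jpg" % stem).resize(size)
    lbl = Image.open("VOCdevkit/VOC2012/SegmentationObject/%s.png" % stem).resize(size, Image.NEAREST)
    x = np.asarray(img, dtype=np.float32) / 255.0   # normalized RGB input
    y = np.asarray(lbl, dtype=np.int32)             # palette indices as labels
    y = np.where(y == 255, 21, y)                   # boundary (255) -> class 21
    return x, to_categorical(y, num_classes=22)     # one-hot over 22 classes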

Model definition

First, as a baseline for comparison, SegNet without pooling indices (a plain encoder-decoder) is modeled on VGG16 as follows.

from keras import layers, models

def build_FCN():
  ffc = 32
  inputs = layers.Input(shape=(64,64,3))
  # Encoder: VGG16-style convolution blocks, halving the resolution four times
  x = inputs
  for i in range(2):
    x = layers.Conv2D(ffc,kernel_size=3,padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
  x = layers.MaxPooling2D((2,2))(x)
  for i in range(2):
    x = layers.Conv2D(ffc*2,kernel_size=3,padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
  x = layers.MaxPooling2D((2,2))(x)
  for i in range(3):
    x = layers.Conv2D(ffc*4,kernel_size=3,padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
  x = layers.MaxPooling2D((2,2))(x)
  for i in range(3):
    x = layers.Conv2D(ffc*8,kernel_size=3,padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
  x = layers.MaxPooling2D((2,2))(x)
  # Bottleneck at 4x4 resolution
  for i in range(3):
    x = layers.Conv2D(ffc*8,kernel_size=3,padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
  # Decoder: UpSampling2D doubles the resolution back up to 64x64
  x = layers.UpSampling2D((2,2))(x)
  for i in range(3):
    x = layers.Conv2D(ffc*4,kernel_size=3,padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
  x = layers.UpSampling2D((2,2))(x)
  for i in range(3):
    x = layers.Conv2D(ffc*2,kernel_size=3,padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
  x = layers.UpSampling2D((2,2))(x)
  for i in range(2):
    x = layers.Conv2D(ffc*2,kernel_size=3,padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
  x = layers.UpSampling2D((2,2))(x)
  for i in range(2):
    x = layers.Conv2D(ffc,kernel_size=3,padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
  # Per-pixel softmax over the 22 classes
  x = layers.Conv2D(22,kernel_size=3,padding="same",activation="softmax")(x)
  return models.Model(inputs,x)

Modeled after VGG16 in this way, it becomes a network with 24 convolution layers. Note that MaxPooling2D is used to shrink the feature maps and UpSampling2D to enlarge them. Next, let's look at how SegNet differs from this model. Before each MaxPooling2D, SegNet records where the maximum of every pooling window was, i.e. the argmax of the pooling operation. Keras has no such layer, so we use TensorFlow's tf.nn.max_pool_with_argmax and wrap it in a custom Keras layer. Defining the class below gives a layer that runs inside Keras.

import tensorflow as tf
from keras import backend as K
from keras.layers import Layer

class MaxPoolingWithArgmax2D(Layer):
    def __init__(self):
        super(MaxPoolingWithArgmax2D,self).__init__()
    def call(self,inputs):
        # 2x2 max pooling that also returns the flat index of each maximum
        output,argmax = tf.nn.max_pool_with_argmax(inputs,ksize=[1,2,2,1],strides=[1,2,2,1],padding='SAME')
        argmax = K.cast(argmax,K.floatx())
        return [output,argmax]
    def compute_output_shape(self,input_shape):
        # Height and width are halved; the argmax tensor has the same shape as the output
        ratio = (1,2,2,1)
        output_shape = [dim//ratio[idx] if dim is not None else None for idx, dim in enumerate(input_shape)]
        output_shape = tuple(output_shape)
        return [output_shape,output_shape]
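As a quick sanity check (a sketch, not repository code), the layer can be wrapped in a tiny model to confirm that both outputs come back at half resolution:

import numpy as np

x = layers.Input(shape=(64, 64, 3))
pooled, argmax = MaxPoolingWithArgmax2D()(x)
m = models.Model(x, [pooled, argmax])
out, idx = m.predict(np.random.rand(1, 64, 64, 3))
print(out.shape, idx.shape)  # (1, 32, 32, 3) (1, 32, 32, 3)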

Next, define the layer that, during upsampling, writes each value back to the position where the argmax was taken (this one is quite long).

class MaxUnpooling2D(Layer):
    def __init__(self):
        super(MaxUnpooling2D,self).__init__()
    def call(self,inputs,output_shape = None):
        updates, mask = inputs[0],inputs[1]
        mask = K.cast(mask, 'int32')
        input_shape = tf.shape(updates, out_type='int32')
        # calculate the new shape (the resolution doubles)
        if output_shape is None:
            output_shape = (input_shape[0],input_shape[1]*2,input_shape[2]*2,input_shape[3])
        # calculate indices for batch, height, width and feature maps
        one_like_mask = K.ones_like(mask, dtype='int32')
        batch_shape = K.concatenate([[input_shape[0]], [1], [1], [1]],axis=0)
        batch_range = K.reshape(tf.range(output_shape[0], dtype='int32'),shape=batch_shape)
        b = one_like_mask * batch_range
        y = mask // (output_shape[2] * output_shape[3])
        x = (mask // output_shape[3]) % output_shape[2]
        feature_range = tf.range(output_shape[3], dtype='int32')
        f = one_like_mask * feature_range
        # flatten the update values and scatter them back to the argmax positions
        updates_size = tf.size(updates)
        indices = K.transpose(K.reshape(
            K.stack([b, y, x, f]),
            [4, updates_size]))
        values = K.reshape(updates, [updates_size])
        return tf.scatter_nd(indices, values, output_shape)
    def compute_output_shape(self,input_shape):
        shape = input_shape[1]
        return (shape[0],shape[1]*2,shape[2]*2,shape[3])
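A round trip through both layers (again a sketch, not repository code) shows what unpooling does: each 2x2 window keeps only its maximum, written back at the remembered position, and the other entries become zero:

import numpy as np

x = layers.Input(shape=(4, 4, 1))
pooled, argmax = MaxPoolingWithArgmax2D()(x)
restored = MaxUnpooling2D()([pooled, argmax])
m = models.Model(x, restored)

out = m.predict(np.arange(16, dtype=np.float32).reshape(1, 4, 4, 1))
print(out.reshape(4, 4))  # only 5, 7, 13, 15 survive, at their original spots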

SegNet defined with these custom layers looks like this:

def build_Segnet():
    ffc = 32
    inputs = layers.Input(shape=(64,64,3))
    # Encoder: keep the argmax indices (x1..x4) of every pooling step
    x = inputs
    for i in range(2):
      x = layers.Conv2D(ffc,kernel_size=3,padding="same")(x)
      x = layers.BatchNormalization()(x)
      x = layers.ReLU()(x)
    x,x1 = MaxPoolingWithArgmax2D()(x)
    for i in range(2):
      x = layers.Conv2D(ffc*2,kernel_size=3,padding="same")(x)
      x = layers.BatchNormalization()(x)
      x = layers.ReLU()(x)
    x,x2 = MaxPoolingWithArgmax2D()(x)
    for i in range(3):
      x = layers.Conv2D(ffc*4,kernel_size=3,padding="same")(x)
      x = layers.BatchNormalization()(x)
      x = layers.ReLU()(x)
    x,x3 = MaxPoolingWithArgmax2D()(x)
    for i in range(3):
      x = layers.Conv2D(ffc*8,kernel_size=3,padding="same")(x)
      x = layers.BatchNormalization()(x)
      x = layers.ReLU()(x)
    x,x4 = MaxPoolingWithArgmax2D()(x)
    for i in range(3):
      x = layers.Conv2D(ffc*8,kernel_size=3,padding="same")(x)
      x = layers.BatchNormalization()(x)
      x = layers.ReLU()(x)
    x = layers.Dropout(rate = 0.5)(x)
    # Decoder: restore the resolution with the stored pooling indices
    x = MaxUnpooling2D()([x,x4])
    for i in range(3):
      x = layers.Conv2D(ffc*4,kernel_size=3,padding="same")(x)
      x = layers.BatchNormalization()(x)
      x = layers.ReLU()(x)
    x = MaxUnpooling2D()([x,x3])
    for i in range(3):
      x = layers.Conv2D(ffc*2,kernel_size=3,padding="same")(x)
      x = layers.BatchNormalization()(x)
      x = layers.ReLU()(x)
    x = MaxUnpooling2D()([x,x2])
    for i in range(2):
      x = layers.Conv2D(ffc,kernel_size=3,padding="same")(x)
      x = layers.BatchNormalization()(x)
      x = layers.ReLU()(x)
    x = MaxUnpooling2D()([x,x1])
    for i in range(2):
      x = layers.Conv2D(ffc,kernel_size=3,padding="same")(x)
      x = layers.BatchNormalization()(x)
      x = layers.ReLU()(x)
    # Per-pixel softmax over the 22 classes
    x = layers.Conv2D(22,kernel_size=3,padding="same",activation="softmax")(x)
    return models.Model(inputs,x)
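Instantiating both models is an easy way to compare them; neither MaxUnpooling2D nor UpSampling2D adds trainable weights, so the two networks differ mainly in how they restore the resolution:

fcn = build_FCN()
segnet = build_Segnet()
fcn.summary()
segnet.summary()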

Loss function and optimization

The loss function is the cross entropy computed at each pixel. Adam (lr = 0.001, beta_1 = 0.9, beta_2 = 0.999) was used for optimization.
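Put together, compiling and training could look like the following sketch (x_train, y_train, x_val, and y_val are hypothetical arrays shaped (N, 64, 64, 3) and (N, 64, 64, 22); the batch size and epoch count are placeholders, not the repository's settings):

from keras.optimizers import Adam

model = build_Segnet()
model.compile(optimizer=Adam(lr=0.001, beta_1=0.9, beta_2=0.999),
              loss="categorical_crossentropy",  # per-pixel cross entropy
              metrics=["accuracy"])
model.fit(x_train, y_train, batch_size=32, epochs=50,
          validation_data=(x_val, y_val))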

Results

We checked how much the results change with and without pooling indices, plotting the training loss and the mean per-pixel accuracy. First, the results of the model without pooling indices:

[accuracy and loss curves of the model without pooling indices]

The validation data reached an accuracy of about 78%. Next are the results of SegNet.

[accuracy and loss curves of SegNet]

It was stable at an accuracy of about 82%, and we could see the behavior described in the paper.
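For reference, the mean per-pixel accuracy used above can be computed from the predictions like this (a sketch; x_val and y_val are the hypothetical validation arrays from the training step):

import numpy as np

pred = model.predict(x_val)  # (N, 64, 64, 22) softmax scores
pixel_acc = np.mean(np.argmax(pred, axis=-1) == np.argmax(y_val, axis=-1))
print("mean pixel accuracy:", pixel_acc)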

Output image example

From left to right: input, the model without pooling indices, SegNet, and ground truth (all images are test data).

We found that keeping the pooling indices gives a considerable improvement in accuracy.
