[PYTHON] Performance comparison by incorporating a skip structure in SegNet (CaDIS: a Cataract Dataset)

1. Introduction

This article is a sequel to a previous article in which semantic segmentation was performed on CaDIS: a Cataract Dataset [^1] using SegNet [^2]. This time, we compare performance against U-Net [^3], which builds a skip structure into the previous network.

All code

2. Environment

- PC specs

3. Dataset & data split

The dataset and data split are the same as last time, so please refer to the previous article.

4. Model building & learning

This time, we implement U-Net [^3] by adding a skip structure to the SegNet [^2] from the previous article, and compare their performance. As shown in the figure below, U-Net concatenates the encoder output just before each Max Pooling with the corresponding decoder output after Up Sampling, and feeds the result into the next Convolution layer. This makes it possible to recover spatial information that is lost when the image is compressed by Max Pooling.

(Figure: U-Net architecture with skip connections between encoder and decoder)

This time, only the skip structure is introduced into the previous network. First, the network is branched as follows.

x = Conv2D(32, (3, 3), padding='same', kernel_initializer='he_normal')(inputs)
x = BatchNormalization()(x)
x1 = Activation('relu')(x)
x = MaxPool2D(pool_size=(2, 2))(x1)

The output is branched by keeping x1: since x1 is never overwritten, it can be reused later as the skip input. Next, the branched outputs are concatenated as follows.

x = UpSampling2D(size=(2, 2))(x)
x = concatenate([x, x1], axis=-1)
x = Conv2D(64, (3, 3), strides=(1, 1), padding='same', kernel_initializer='he_normal')(x)

Input tensors can be concatenated with concatenate(). Here, x1 of shape (height, width, ch1) and x of shape (height, width, ch2) are combined into an output tensor of shape (height, width, ch1 + ch2); a quick check of this is sketched next, and the full network with this skip applied at every Max Pooling / Up Sampling pair follows after it.
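A quick standalone check of this behaviour (the shapes here are arbitrary, chosen only for illustration):

from tensorflow.keras.layers import Input, concatenate

# Two feature maps with the same height/width but different channel counts
a = Input(shape=(128, 128, 32))  # (height, width, ch1)
b = Input(shape=(128, 128, 64))  # (height, width, ch2)
c = concatenate([a, b], axis=-1)
print(c.shape)  # (None, 128, 128, 96): the channel axes add up to ch1 + ch2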

U-Net code
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, MaxPool2D, UpSampling2D, concatenate
from tensorflow.keras.layers import Conv2D, BatchNormalization, Activation

# Build U-Net (8-layer encoder, 8-layer decoder)
def cnn(input_shape, classes):
    # Input image size must be a multiple of 32 (the encoder halves the resolution five times: 2^5 = 32)
    assert input_shape[0]%32 == 0, 'Input size must be a multiple of 32.'
    assert input_shape[1]%32 == 0, 'Input size must be a multiple of 32.'

    #encoder
    ##Input layer
    inputs = Input(shape=(input_shape[0], input_shape[1], 3))

    ##1st layer
    x = Conv2D(32, (3, 3), padding='same', kernel_initializer='he_normal')(inputs)
    x = BatchNormalization()(x)
    x1 = Activation('relu')(x)
    x = MaxPool2D(pool_size=(2, 2))(x1)

    ##2nd layer
    x = Conv2D(64, (3, 3), strides=(1, 1), padding='same', kernel_initializer='he_normal')(x)
    x = BatchNormalization()(x)
    x2 = Activation('relu')(x)
    x = MaxPool2D(pool_size=(2, 2))(x2)

    ##3rd layer
    x = Conv2D(128, (3, 3), strides=(1, 1), padding='same', kernel_initializer='he_normal')(x)
    x = BatchNormalization()(x)
    x3 = Activation('relu')(x)
    x = MaxPool2D(pool_size=(2, 2))(x3)

    ##4th layer
    x = Conv2D(256, (3, 3), strides=(1, 1), padding='same', kernel_initializer='he_normal')(x)
    x = BatchNormalization()(x)
    x4 = Activation('relu')(x)
    x = MaxPool2D(pool_size=(2, 2))(x4)

    ##5th and 6th layers
    x = Conv2D(512, (3, 3), strides=(1, 1), padding='same', kernel_initializer='he_normal')(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = Conv2D(512, (3, 3), strides=(1, 1), padding='same', kernel_initializer='he_normal')(x)
    x = BatchNormalization()(x)
    x5 = Activation('relu')(x)
    x = MaxPool2D(pool_size=(2, 2))(x5)

    ##7th and 8th layers
    x = Conv2D(1024, (3, 3), strides=(1, 1), padding='same', kernel_initializer='he_normal')(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = Conv2D(1024, (3, 3), strides=(1, 1), padding='same', kernel_initializer='he_normal')(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)

    #Decoder
    ##1st layer
    x = Conv2D(1024, (3, 3), strides=(1, 1), padding='same', kernel_initializer='he_normal')(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)

    ##2nd and 3rd layers
    x = UpSampling2D(size=(2, 2))(x)
    x = concatenate([x, x5], axis=-1)
    x = Conv2D(512, (3, 3), strides=(1, 1), padding='same', kernel_initializer='he_normal')(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = Conv2D(512, (3, 3), strides=(1, 1), padding='same', kernel_initializer='he_normal')(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)

    ##4th layer
    x = UpSampling2D(size=(2, 2))(x)
    x = concatenate([x, x4], axis=-1)
    x = Conv2D(256, (3, 3), strides=(1, 1), padding='same', kernel_initializer='he_normal')(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)

    ##5th layer
    x = UpSampling2D(size=(2, 2))(x)
    x = concatenate([x, x3], axis=-1)
    x = Conv2D(128, (3, 3), strides=(1, 1), padding='same', kernel_initializer='he_normal')(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)

    ##6th layer
    x = UpSampling2D(size=(2, 2))(x)
    x = concatenate([x, x2], axis=-1)
    x = Conv2D(64, (3, 3), strides=(1, 1), padding='same', kernel_initializer='he_normal')(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)

    ##7th and 8th layers
    x = UpSampling2D(size=(2, 2))(x)
    x = concatenate([x, x1], axis=-1)
    x = Conv2D(64, (3, 3), strides=(1, 1), padding='same', kernel_initializer='he_normal')(x)
    x = Conv2D(classes, (1, 1), strides=(1, 1), padding='same', kernel_initializer='he_normal')(x)
    outputs = Activation('softmax')(x)


    return Model(inputs=inputs, outputs=outputs)


#Network construction
model = cnn(image_size, classes)
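To confirm that the skip connections are wired as intended, the layer graph can be inspected. The following is a minimal sketch, assuming image_size and classes are defined as in the previous article:

# Sketch: the Concatenate layers mark where encoder outputs re-enter the decoder
model.summary()
for layer in model.layers:
    if 'concatenate' in layer.name:
        print(layer.name, layer.output.shape)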

Furthermore, since the network definition will become redundant as the network gets deeper in future work, the encoder block (Conv + BN + ReLU + MaxPool) and the decoder block (UpSampling + concatenate + Conv + BN + ReLU) are factored out into methods. The following is a slightly tidier, refactored version.

import dataclasses
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, MaxPool2D, UpSampling2D, concatenate
from tensorflow.keras.layers import Conv2D, BatchNormalization, Activation

# Build U-Net (8-layer encoder, 8-layer decoder)
@dataclasses.dataclass
class CNN:
    input_shape: tuple #Input image size
    classes: int #Number of classification classes

    def __post_init__(self):
        #Input image size must be a multiple of 32
        assert self.input_shape[0]%32 == 0, 'Input size must be a multiple of 32.'
        assert self.input_shape[1]%32 == 0, 'Input size must be a multiple of 32.'


    #Encoder block
    @staticmethod
    def encoder(x, blocks, filters, pooling):
        for i in range(blocks):
            x = Conv2D(filters, (3, 3), padding='same', kernel_initializer='he_normal')(x)
            x = BatchNormalization()(x)
            x = Activation('relu')(x)

        if pooling:
            return MaxPool2D(pool_size=(2, 2))(x), x
        else:
            return x


    #Decoder block
    @staticmethod
    def decoder(x1, x2, blocks, filters):
        x = UpSampling2D(size=(2, 2))(x1)
        x = concatenate([x, x2], axis=-1)

        for i in range(blocks):
            x = Conv2D(filters, (3, 3), padding='same', kernel_initializer='he_normal')(x)
            x = BatchNormalization()(x)
            x = Activation('relu')(x)

        return x


    def create(self):
        #encoder
        inputs = Input(shape=(self.input_shape[0], self.input_shape[1], 3)) #Input layer
        x, x1 = self.encoder(inputs, blocks=1, filters=32, pooling=True) #1st layer
        x, x2 = self.encoder(x, blocks=1, filters=64, pooling=True) #2nd layer
        x, x3 = self.encoder(x, blocks=1, filters=128, pooling=True) #3rd layer
        x, x4 = self.encoder(x, blocks=1, filters=256, pooling=True) #4th layer
        x, x5 = self.encoder(x, blocks=2, filters=512, pooling=True) #5th and 6th layers
        x = self.encoder(x, blocks=2, filters=1024, pooling=False) #7th and 8th layers

        #Decoder
        x = self.encoder(x, blocks=1, filters=1024, pooling=False) #1st layer
        x = self.decoder(x, x5, blocks=2, filters=512) #2nd and 3rd layers
        x = self.decoder(x, x4, blocks=1, filters=256) #4th layer
        x = self.decoder(x, x3, blocks=1, filters=128) #5th layer
        x = self.decoder(x, x2, blocks=1, filters=64) #6th layer
        ##7th and 8th layers
        x = UpSampling2D(size=(2, 2))(x)
        x = concatenate([x, x1], axis=-1)
        x = Conv2D(64, (3, 3), strides=(1, 1), padding='same', kernel_initializer='he_normal')(x)
        x = Conv2D(self.classes, (1, 1), strides=(1, 1), padding='same', kernel_initializer='he_normal')(x)
        outputs = Activation('softmax')(x)


        return Model(inputs=inputs, outputs=outputs)


#Network construction
model = CNN(input_shape=image_size, classes=classes).create()
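With the model built, the refactored version can be sanity-checked against the functional one, and training can proceed. The snippet below is only a sketch: the loss assumes one-hot encoded label masks, and train_gen / val_gen are hypothetical generators, since the actual settings follow the previous article.

# Sanity check: the refactored class should build the same graph as the functional cnn(),
# so the parameter counts must match
assert cnn(image_size, classes).count_params() == model.count_params()

# Illustrative training setup (placeholders; actual hyperparameters follow the previous article)
model.compile(optimizer='adam',
              loss='categorical_crossentropy',  # assumes one-hot encoded label masks
              metrics=['accuracy'])
# history = model.fit(train_gen, validation_data=val_gen, epochs=...)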

The learning method and parameters are the same as last time. The learning results are as follows.

(Figure: training history)

5. Evaluation

As in the previous article (https://qiita.com/burokoron/items/c730e607607c925c6fd1), the evaluation uses the average IoU for each class and the mean IoU, which is the average of the per-class values; a minimal sketch of this computation follows below.
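The evaluation code itself is the same as before; as an illustrative sketch, per-class IoU and its mean can be computed from flattened label maps like this (function and variable names are illustrative):

import numpy as np

def iou_per_class(y_true, y_pred, num_classes):
    # IoU = TP / (TP + FP + FN), computed per class; NaN marks classes absent from both maps
    ious = []
    for c in range(num_classes):
        tp = np.sum((y_true == c) & (y_pred == c))
        fp = np.sum((y_true != c) & (y_pred == c))
        fn = np.sum((y_true == c) & (y_pred != c))
        union = tp + fp + fn
        ious.append(tp / union if union > 0 else float('nan'))
    return ious

# mean IoU: average over the classes that actually appear
# mean_iou = np.nanmean(iou_per_class(y_true, y_pred, num_classes))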

The evaluation results are shown below. From the table, you can see that many classes that scored 0% with the previous SegNet can now be inferred to some extent. The mean IoU was 18.5%; since SegNet [^2] scored 15.0% last time, this is a 3.5-point improvement in accuracy.

| Index | Class | average IoU (SegNet) [%] | average IoU (U-Net) [%] |
| --- | --- | --- | --- |
| 0 | Pupil | 85.3 | 86.5 |
| 1 | Surgical Tape | 53.3 | 57.1 |
| 2 | Hand | 6.57 | 6.96 |
| 3 | Eye Retractors | 21.9 | 53.6 |
| 4 | Iris | 74.4 | 76.0 |
| 5 | Eyelid | 0 | 0 |
| 6 | Skin | 49.7 | 48.4 |
| 7 | Cornea | 88.0 | 88.5 |
| 8 | Hydro. Cannula | 0 | 31.8 |
| 9 | Visco. Cannula | 0 | 4.36 |
| 10 | Cap. Cystotome | 0 | 3.71 |
| 11 | Rycroft Cannula | 0 | 4.37 |
| 12 | Bonn Forceps | 3.58 | 7.94 |
| 13 | Primary Knife | 5.35 | 10.3 |
| 14 | Phaco. Handpiece | 0.0781 | 12.3 |
| 15 | Lens Injector | 16.4 | 15.8 |
| 16 | A/I Handpiece | 16.4 | 20.5 |
| 17 | Secondary Knife | 6.08 | 11.8 |
| 18 | Micromanipulator | 0 | 8.99 |
| 19 | A/I Handpiece Handle | 6.49 | 8.16 |
| 20 | Cap. Forceps | 0 | 0.337 |
| 21 | Rycroft Cannula Handle | 0 | 0.00863 |
| 22 | Phaco. Handpiece Handle | 0 | 4.26 |
| 23 | Cap. Cystotome Handle | 0 | 0.407 |
| 24 | Secondary Knife Handle | 2.49 | 3.82 |
| 25 | Lens Injector Handle | 0 | 0 |
| 26 | Water Sprayer | - | - |
| 27 | Suture Needle | 0 | 0 |
| 28 | Needle Holder | - | - |
| 29 | Charleux Cannula | 0 | 0 |
| 30 | Vannas Scissors | - | - |
| 31 | Primary Knife Handle | 0 | 0 |
| 32 | Viter. Handpiece | 0 | 0 |
| 33 | Mendez Ring | - | - |
| 34 | Biomarker | - | - |
| 35 | Marker | - | - |

6. Summary

In this article, we implemented U-Net [^3] by incorporating a skip structure into the SegNet [^2] from the previous article. On CaDIS: a Cataract Dataset [^1], we compared performance using mean IoU and confirmed a 3.5-point improvement in accuracy. The accuracies reported in the paper [^1] are still higher, ranging from 34.16% up to 52.66% for PSPNet, so I will continue to incorporate newer techniques, such as improved network structures and data augmentation, aiming for comparable or better performance.
