[PYTHON] Performance comparison by incorporating a skip structure in SegNet (CaDIS: a Cataract Dataset)

1. Introduction

This article is a sequel to a previous article in which semantic segmentation was performed on CaDIS: a Cataract Dataset [^1] using SegNet [^2]. This time, we compare performance against U-Net [^3], which builds a skip structure into the previous network.

All code

2. Environment

- PC specs

3. Dataset & data split

The dataset and data split are the same as last time, so please refer to the previous article.

4. Model building & learning

This time, we implement U-Net [^3] by adding a skip structure to the SegNet [^2] from the previous article, and compare their performance. As shown in the figure below, U-Net concatenates the encoder output just before each Max Pooling with the corresponding decoder output after Up Sampling, and feeds the result into the next Convolution layer. This makes it possible to recover spatial information that is lost when the image is compressed by Max Pooling.

(Figure: U-Net architecture with skip connections between encoder and decoder)

This time, only the skip structure is introduced into the previous network. First, the network is branched as follows.

x = Conv2D(32, (3, 3), padding='same', kernel_initializer='he_normal')(inputs)
x = BatchNormalization()(x)
x1 = Activation('relu')(x)
x = MaxPool2D(pool_size=(2, 2))(x1)

The output is branched by keeping x1: since x1 is never overwritten, it can be reused later as the skip input. Next, the branched outputs are concatenated as follows.

x = UpSampling2D(size=(2, 2))(x)
x = concatenate([x, x1], axis=-1)
x = Conv2D(64, (3, 3), strides=(1, 1), padding='same', kernel_initializer='he_normal')(x)

Input tensors can be concatenated with concatenate(). Here, x1 of shape (height, width, ch1) and x of shape (height, width, ch2) are combined into an output tensor of shape (height, width, ch1 + ch2); a quick check of this is sketched next, and the full network with this skip applied at every Max Pooling / Up Sampling pair follows after it.
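A quick standalone check of this behaviour (the shapes here are arbitrary, chosen only for illustration):

from tensorflow.keras.layers import Input, concatenate

# Two feature maps with the same height/width but different channel counts
a = Input(shape=(128, 128, 32))  # (height, width, ch1)
b = Input(shape=(128, 128, 64))  # (height, width, ch2)
c = concatenate([a, b], axis=-1)
print(c.shape)  # (None, 128, 128, 96): the channel axes add up to ch1 + ch2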

U-Net code
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, MaxPool2D, UpSampling2D, concatenate
from tensorflow.keras.layers import Conv2D, BatchNormalization, Activation

# Build U-Net (8-layer encoder, 8-layer decoder)
def cnn(input_shape, classes):
    # Input image size must be a multiple of 32 (the encoder halves the resolution five times: 2^5 = 32)
    assert input_shape[0]%32 == 0, 'Input size must be a multiple of 32.'
    assert input_shape[1]%32 == 0, 'Input size must be a multiple of 32.'

    #encoder
    ##Input layer
    inputs = Input(shape=(input_shape[0], input_shape[1], 3))

    ##1st layer
    x = Conv2D(32, (3, 3), padding='same', kernel_initializer='he_normal')(inputs)
    x = BatchNormalization()(x)
    x1 = Activation('relu')(x)
    x = MaxPool2D(pool_size=(2, 2))(x1)

    ##2nd layer
    x = Conv2D(64, (3, 3), strides=(1, 1), padding='same', kernel_initializer='he_normal')(x)
    x = BatchNormalization()(x)
    x2 = Activation('relu')(x)
    x = MaxPool2D(pool_size=(2, 2))(x2)

    ##3rd layer
    x = Conv2D(128, (3, 3), strides=(1, 1), padding='same', kernel_initializer='he_normal')(x)
    x = BatchNormalization()(x)
    x3 = Activation('relu')(x)
    x = MaxPool2D(pool_size=(2, 2))(x3)

    ##4th layer
    x = Conv2D(256, (3, 3), strides=(1, 1), padding='same', kernel_initializer='he_normal')(x)
    x = BatchNormalization()(x)
    x4 = Activation('relu')(x)
    x = MaxPool2D(pool_size=(2, 2))(x4)

    ##5th and 6th layers
    x = Conv2D(512, (3, 3), strides=(1, 1), padding='same', kernel_initializer='he_normal')(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = Conv2D(512, (3, 3), strides=(1, 1), padding='same', kernel_initializer='he_normal')(x)
    x = BatchNormalization()(x)
    x5 = Activation('relu')(x)
    x = MaxPool2D(pool_size=(2, 2))(x5)

    ##7th and 8th layers
    x = Conv2D(1024, (3, 3), strides=(1, 1), padding='same', kernel_initializer='he_normal')(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = Conv2D(1024, (3, 3), strides=(1, 1), padding='same', kernel_initializer='he_normal')(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)

    #Decoder
    ##1st layer
    x = Conv2D(1024, (3, 3), strides=(1, 1), padding='same', kernel_initializer='he_normal')(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)

    ##2nd and 3rd layers
    x = UpSampling2D(size=(2, 2))(x)
    x = concatenate([x, x5], axis=-1)
    x = Conv2D(512, (3, 3), strides=(1, 1), padding='same', kernel_initializer='he_normal')(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = Conv2D(512, (3, 3), strides=(1, 1), padding='same', kernel_initializer='he_normal')(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)

    ##4th layer
    x = UpSampling2D(size=(2, 2))(x)
    x = concatenate([x, x4], axis=-1)
    x = Conv2D(256, (3, 3), strides=(1, 1), padding='same', kernel_initializer='he_normal')(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)

    ##5th layer
    x = UpSampling2D(size=(2, 2))(x)
    x = concatenate([x, x3], axis=-1)
    x = Conv2D(128, (3, 3), strides=(1, 1), padding='same', kernel_initializer='he_normal')(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)

    ##6th layer
    x = UpSampling2D(size=(2, 2))(x)
    x = concatenate([x, x2], axis=-1)
    x = Conv2D(64, (3, 3), strides=(1, 1), padding='same', kernel_initializer='he_normal')(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)

    ##7th and 8th layers
    x = UpSampling2D(size=(2, 2))(x)
    x = concatenate([x, x1], axis=-1)
    x = Conv2D(64, (3, 3), strides=(1, 1), padding='same', kernel_initializer='he_normal')(x)
    x = Conv2D(classes, (1, 1), strides=(1, 1), padding='same', kernel_initializer='he_normal')(x)
    outputs = Activation('softmax')(x)


    return Model(inputs=inputs, outputs=outputs)


#Network construction
model = cnn(image_size, classes)
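To confirm that the skip connections are wired as intended, the layer graph can be inspected. The following is a minimal sketch, assuming image_size and classes are defined as in the previous article:

# Sketch: the Concatenate layers mark where encoder outputs re-enter the decoder
model.summary()
for layer in model.layers:
    if 'concatenate' in layer.name:
        print(layer.name, layer.output.shape)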

Furthermore, since the network definition will become redundant as the network gets deeper in future work, the encoder block (Conv + BN + ReLU + MaxPool) and the decoder block (UpSampling + concatenate + Conv + BN + ReLU) are factored out into methods. The following is a slightly tidier, refactored version.

import dataclasses
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, MaxPool2D, UpSampling2D, concatenate
from tensorflow.keras.layers import Conv2D, BatchNormalization, Activation

# Build U-Net (8-layer encoder, 8-layer decoder)
@dataclasses.dataclass
class CNN:
    input_shape: tuple #Input image size
    classes: int #Number of classification classes

    def __post_init__(self):
        #Input image size must be a multiple of 32
        assert self.input_shape[0]%32 == 0, 'Input size must be a multiple of 32.'
        assert self.input_shape[1]%32 == 0, 'Input size must be a multiple of 32.'


    #Encoder block
    @staticmethod
    def encoder(x, blocks, filters, pooling):
        for i in range(blocks):
            x = Conv2D(filters, (3, 3), padding='same', kernel_initializer='he_normal')(x)
            x = BatchNormalization()(x)
            x = Activation('relu')(x)

        if pooling:
            return MaxPool2D(pool_size=(2, 2))(x), x
        else:
            return x


    #Decoder block
    @staticmethod
    def decoder(x1, x2, blocks, filters):
        x = UpSampling2D(size=(2, 2))(x1)
        x = concatenate([x, x2], axis=-1)

        for i in range(blocks):
            x = Conv2D(filters, (3, 3), padding='same', kernel_initializer='he_normal')(x)
            x = BatchNormalization()(x)
            x = Activation('relu')(x)

        return x


    def create(self):
        #encoder
        inputs = Input(shape=(self.input_shape[0], self.input_shape[1], 3)) #Input layer
        x, x1 = self.encoder(inputs, blocks=1, filters=32, pooling=True) #1st layer
        x, x2 = self.encoder(x, blocks=1, filters=64, pooling=True) #2nd layer
        x, x3 = self.encoder(x, blocks=1, filters=128, pooling=True) #3rd layer
        x, x4 = self.encoder(x, blocks=1, filters=256, pooling=True) #4th layer
        x, x5 = self.encoder(x, blocks=2, filters=512, pooling=True) #5th and 6th layers
        x = self.encoder(x, blocks=2, filters=1024, pooling=False) #7th and 8th layers

        #Decoder
        x = self.encoder(x, blocks=1, filters=1024, pooling=False) #1st layer
        x = self.decoder(x, x5, blocks=2, filters=512) #2nd and 3rd layers
        x = self.decoder(x, x4, blocks=1, filters=256) #4th layer
        x = self.decoder(x, x3, blocks=1, filters=128) #5th layer
        x = self.decoder(x, x2, blocks=1, filters=64) #6th layer
        ##7th and 8th layers
        x = UpSampling2D(size=(2, 2))(x)
        x = concatenate([x, x1], axis=-1)
        x = Conv2D(64, (3, 3), strides=(1, 1), padding='same', kernel_initializer='he_normal')(x)
        x = Conv2D(self.classes, (1, 1), strides=(1, 1), padding='same', kernel_initializer='he_normal')(x)
        outputs = Activation('softmax')(x)


        return Model(inputs=inputs, outputs=outputs)


#Network construction
model = CNN(input_shape=image_size, classes=classes).create()
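With the model built, the refactored version can be sanity-checked against the functional one, and training can proceed. The snippet below is only a sketch: the loss assumes one-hot encoded label masks, and train_gen / val_gen are hypothetical generators, since the actual settings follow the previous article.

# Sanity check: the refactored class should build the same graph as the functional cnn(),
# so the parameter counts must match
assert cnn(image_size, classes).count_params() == model.count_params()

# Illustrative training setup (placeholders; actual hyperparameters follow the previous article)
model.compile(optimizer='adam',
              loss='categorical_crossentropy',  # assumes one-hot encoded label masks
              metrics=['accuracy'])
# history = model.fit(train_gen, validation_data=val_gen, epochs=...)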

The learning method and parameters are the same as last time. The learning results are as follows.

(Figure: training history)

5. Evaluation

As in the previous article (https://qiita.com/burokoron/items/c730e607607c925c6fd1), the evaluation uses the average IoU for each class and the mean IoU, which is the average of the per-class values; a minimal sketch of this computation follows below.
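The evaluation code itself is the same as before; as an illustrative sketch, per-class IoU and its mean can be computed from flattened label maps like this (function and variable names are illustrative):

import numpy as np

def iou_per_class(y_true, y_pred, num_classes):
    # IoU = TP / (TP + FP + FN), computed per class; NaN marks classes absent from both maps
    ious = []
    for c in range(num_classes):
        tp = np.sum((y_true == c) & (y_pred == c))
        fp = np.sum((y_true != c) & (y_pred == c))
        fn = np.sum((y_true == c) & (y_pred != c))
        union = tp + fp + fn
        ious.append(tp / union if union > 0 else float('nan'))
    return ious

# mean IoU: average over the classes that actually appear
# mean_iou = np.nanmean(iou_per_class(y_true, y_pred, num_classes))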

The evaluation results are shown below. From the table, you can see that many classes that scored 0% with the previous SegNet can now be inferred to some extent. The mean IoU was 18.5%; since SegNet [^2] scored 15.0% last time, this is a 3.5-point improvement in accuracy.

| Index | Class | average IoU (SegNet) [%] | average IoU (U-Net) [%] |
| --- | --- | --- | --- |
| 0 | Pupil | 85.3 | 86.5 |
| 1 | Surgical Tape | 53.3 | 57.1 |
| 2 | Hand | 6.57 | 6.96 |
| 3 | Eye Retractors | 21.9 | 53.6 |
| 4 | Iris | 74.4 | 76.0 |
| 5 | Eyelid | 0 | 0 |
| 6 | Skin | 49.7 | 48.4 |
| 7 | Cornea | 88.0 | 88.5 |
| 8 | Hydro. Cannula | 0 | 31.8 |
| 9 | Visco. Cannula | 0 | 4.36 |
| 10 | Cap. Cystotome | 0 | 3.71 |
| 11 | Rycroft Cannula | 0 | 4.37 |
| 12 | Bonn Forceps | 3.58 | 7.94 |
| 13 | Primary Knife | 5.35 | 10.3 |
| 14 | Phaco. Handpiece | 0.0781 | 12.3 |
| 15 | Lens Injector | 16.4 | 15.8 |
| 16 | A/I Handpiece | 16.4 | 20.5 |
| 17 | Secondary Knife | 6.08 | 11.8 |
| 18 | Micromanipulator | 0 | 8.99 |
| 19 | A/I Handpiece Handle | 6.49 | 8.16 |
| 20 | Cap. Forceps | 0 | 0.337 |
| 21 | Rycroft Cannula Handle | 0 | 0.00863 |
| 22 | Phaco. Handpiece Handle | 0 | 4.26 |
| 23 | Cap. Cystotome Handle | 0 | 0.407 |
| 24 | Secondary Knife Handle | 2.49 | 3.82 |
| 25 | Lens Injector Handle | 0 | 0 |
| 26 | Water Sprayer | - | - |
| 27 | Suture Needle | 0 | 0 |
| 28 | Needle Holder | - | - |
| 29 | Charleux Cannula | 0 | 0 |
| 30 | Vannas Scissors | - | - |
| 31 | Primary Knife Handle | 0 | 0 |
| 32 | Viter. Handpiece | 0 | 0 |
| 33 | Mendez Ring | - | - |
| 34 | Biomarker | - | - |
| 35 | Marker | - | - |

6. Summary

In this article, we implemented U-Net [^3] by incorporating a skip structure into the SegNet [^2] from the previous article. On CaDIS: a Cataract Dataset [^1], we compared performance using mean IoU and confirmed a 3.5-point improvement in accuracy. The accuracies reported in the paper [^1] are still higher, ranging from 34.16% up to 52.66% for PSPNet, so I will continue to incorporate newer techniques, such as improved network structures and data augmentation, aiming for comparable or better performance.
