This article is a sequel to a previous article in which semantic segmentation was performed on CaDIS: a Cataract Dataset [^1] using SegNet [^2]. This time, we compare that network's performance against U-Net [^3], which adds a skip structure to it.
- PC specs: the same as last time, so please refer to the previous article.
This time, we implement U-Net [^3] by building a skip structure into the SegNet [^2] from the previous article and compare their performance. As shown in the figure below, U-Net concatenates the encoder output from just before each Max Pooling with the corresponding decoder output after Up Sampling, and feeds the result into the next Convolution layer. This makes it possible to recover spatial information that is lost when the image is compressed by Max Pooling.
Here we introduce only the skip structure into the previous network. First, branch the network as follows.
```python
x = Conv2D(32, (3, 3), padding='same', kernel_initializer='he_normal')(inputs)
x = BatchNormalization()(x)
x1 = Activation('relu')(x)  # keep x1 around for the skip connection
x = MaxPool2D(pool_size=(2, 2))(x1)
```
The output is branched simply by keeping `x1` instead of overwriting it. Next, concatenate the branched outputs as follows.
```python
x = UpSampling2D(size=(2, 2))(x)
x = concatenate([x, x1], axis=-1)  # merge the upsampled decoder output with the branched x1
x = Conv2D(64, (3, 3), strides=(1, 1), padding='same', kernel_initializer='he_normal')(x)
```
Input tensors can be concatenated with `concatenate()`. Here, `x1 = (height, width, ch1)` and `x = (height, width, ch2)` are combined into an output tensor of `(height, width, ch1 + ch2)`; a minimal shape check follows, and the full network with this applied at every Max Pooling / Up Sampling pair comes after that.
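The following sketch (my own, not part of the original article) confirms this concatenation behavior on concrete tensors:

```python
import tensorflow as tf
from tensorflow.keras.layers import concatenate

# Two feature maps with the same spatial size but different channel counts
x1 = tf.zeros((1, 64, 64, 32))  # (batch, height, width, ch1)
x = tf.zeros((1, 64, 64, 64))   # (batch, height, width, ch2)

# Channels are stacked along the last axis
y = concatenate([x, x1], axis=-1)
print(y.shape)  # (1, 64, 64, 96), i.e. ch2 + ch1 = 64 + 32
```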
```python
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, MaxPool2D, UpSampling2D, concatenate
from tensorflow.keras.layers import Conv2D, BatchNormalization, Activation


# Build U-Net (8-layer encoder, 8-layer decoder)
def cnn(input_shape, classes):
    # The input image size must be a multiple of 32
    assert input_shape[0] % 32 == 0, 'Input size must be a multiple of 32.'
    assert input_shape[1] % 32 == 0, 'Input size must be a multiple of 32.'

    # Encoder
    ## Input layer
    inputs = Input(shape=(input_shape[0], input_shape[1], 3))

    ## 1st layer
    x = Conv2D(32, (3, 3), padding='same', kernel_initializer='he_normal')(inputs)
    x = BatchNormalization()(x)
    x1 = Activation('relu')(x)
    x = MaxPool2D(pool_size=(2, 2))(x1)

    ## 2nd layer
    x = Conv2D(64, (3, 3), strides=(1, 1), padding='same', kernel_initializer='he_normal')(x)
    x = BatchNormalization()(x)
    x2 = Activation('relu')(x)
    x = MaxPool2D(pool_size=(2, 2))(x2)

    ## 3rd layer
    x = Conv2D(128, (3, 3), strides=(1, 1), padding='same', kernel_initializer='he_normal')(x)
    x = BatchNormalization()(x)
    x3 = Activation('relu')(x)
    x = MaxPool2D(pool_size=(2, 2))(x3)

    ## 4th layer
    x = Conv2D(256, (3, 3), strides=(1, 1), padding='same', kernel_initializer='he_normal')(x)
    x = BatchNormalization()(x)
    x4 = Activation('relu')(x)
    x = MaxPool2D(pool_size=(2, 2))(x4)

    ## 5th and 6th layers
    x = Conv2D(512, (3, 3), strides=(1, 1), padding='same', kernel_initializer='he_normal')(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = Conv2D(512, (3, 3), strides=(1, 1), padding='same', kernel_initializer='he_normal')(x)
    x = BatchNormalization()(x)
    x5 = Activation('relu')(x)
    x = MaxPool2D(pool_size=(2, 2))(x5)

    ## 7th and 8th layers
    x = Conv2D(1024, (3, 3), strides=(1, 1), padding='same', kernel_initializer='he_normal')(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = Conv2D(1024, (3, 3), strides=(1, 1), padding='same', kernel_initializer='he_normal')(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)

    # Decoder
    ## 1st layer
    x = Conv2D(1024, (3, 3), strides=(1, 1), padding='same', kernel_initializer='he_normal')(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)

    ## 2nd and 3rd layers
    x = UpSampling2D(size=(2, 2))(x)
    x = concatenate([x, x5], axis=-1)
    x = Conv2D(512, (3, 3), strides=(1, 1), padding='same', kernel_initializer='he_normal')(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = Conv2D(512, (3, 3), strides=(1, 1), padding='same', kernel_initializer='he_normal')(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)

    ## 4th layer
    x = UpSampling2D(size=(2, 2))(x)
    x = concatenate([x, x4], axis=-1)
    x = Conv2D(256, (3, 3), strides=(1, 1), padding='same', kernel_initializer='he_normal')(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)

    ## 5th layer
    x = UpSampling2D(size=(2, 2))(x)
    x = concatenate([x, x3], axis=-1)
    x = Conv2D(128, (3, 3), strides=(1, 1), padding='same', kernel_initializer='he_normal')(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)

    ## 6th layer
    x = UpSampling2D(size=(2, 2))(x)
    x = concatenate([x, x2], axis=-1)
    x = Conv2D(64, (3, 3), strides=(1, 1), padding='same', kernel_initializer='he_normal')(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)

    ## 7th and 8th layers
    x = UpSampling2D(size=(2, 2))(x)
    x = concatenate([x, x1], axis=-1)
    x = Conv2D(64, (3, 3), strides=(1, 1), padding='same', kernel_initializer='he_normal')(x)
    x = Conv2D(classes, (1, 1), strides=(1, 1), padding='same', kernel_initializer='he_normal')(x)
    outputs = Activation('softmax')(x)

    return Model(inputs=inputs, outputs=outputs)


# Network construction
model = cnn(image_size, classes)
```
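As a quick sanity check (my own sketch; the concrete input size 224×224 is an assumption, any multiple of 32 works), you can build the model and inspect its output shape:

```python
# Sketch: build with a concrete size and confirm the per-pixel class output.
# 224 is a multiple of 32, as required by the assertions in cnn().
model = cnn((224, 224), 36)  # 36 classes in CaDIS (indices 0-35 in the table below)
print(model.output_shape)    # (None, 224, 224, 36)
model.summary()              # the concatenate layers mark each skip connection
```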
Furthermore, since the network definition will only grow more redundant as the network deepens, let's factor the encoder (Conv + BN + ReLU + MaxPool) and the decoder (UpSampling + concatenate + Conv + BN + ReLU) into functions. The slightly tidier, refactored version follows.
```python
import dataclasses

from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, MaxPool2D, UpSampling2D, concatenate
from tensorflow.keras.layers import Conv2D, BatchNormalization, Activation


# Build U-Net (8-layer encoder, 8-layer decoder)
@dataclasses.dataclass
class CNN:
    input_shape: tuple  # Input image size
    classes: int        # Number of classification classes

    def __post_init__(self):
        # The input image size must be a multiple of 32
        assert self.input_shape[0] % 32 == 0, 'Input size must be a multiple of 32.'
        assert self.input_shape[1] % 32 == 0, 'Input size must be a multiple of 32.'

    # Encoder block
    @staticmethod
    def encoder(x, blocks, filters, pooling):
        for i in range(blocks):
            x = Conv2D(filters, (3, 3), padding='same', kernel_initializer='he_normal')(x)
            x = BatchNormalization()(x)
            x = Activation('relu')(x)
        if pooling:
            # Return both the pooled output and the pre-pooling output (for the skip)
            return MaxPool2D(pool_size=(2, 2))(x), x
        else:
            return x

    # Decoder block
    @staticmethod
    def decoder(x1, x2, blocks, filters):
        x = UpSampling2D(size=(2, 2))(x1)
        x = concatenate([x, x2], axis=-1)  # skip connection
        for i in range(blocks):
            x = Conv2D(filters, (3, 3), padding='same', kernel_initializer='he_normal')(x)
            x = BatchNormalization()(x)
            x = Activation('relu')(x)
        return x

    def create(self):
        # Encoder
        inputs = Input(shape=(self.input_shape[0], self.input_shape[1], 3))  # Input layer
        x, x1 = self.encoder(inputs, blocks=1, filters=32, pooling=True)     # 1st layer
        x, x2 = self.encoder(x, blocks=1, filters=64, pooling=True)          # 2nd layer
        x, x3 = self.encoder(x, blocks=1, filters=128, pooling=True)         # 3rd layer
        x, x4 = self.encoder(x, blocks=1, filters=256, pooling=True)         # 4th layer
        x, x5 = self.encoder(x, blocks=2, filters=512, pooling=True)         # 5th and 6th layers
        x = self.encoder(x, blocks=2, filters=1024, pooling=False)           # 7th and 8th layers

        # Decoder
        x = self.encoder(x, blocks=1, filters=1024, pooling=False)           # 1st layer
        x = self.decoder(x, x5, blocks=2, filters=512)                       # 2nd and 3rd layers
        x = self.decoder(x, x4, blocks=1, filters=256)                       # 4th layer
        x = self.decoder(x, x3, blocks=1, filters=128)                       # 5th layer
        x = self.decoder(x, x2, blocks=1, filters=64)                        # 6th layer

        ## 7th and 8th layers
        x = UpSampling2D(size=(2, 2))(x)
        x = concatenate([x, x1], axis=-1)
        x = Conv2D(64, (3, 3), strides=(1, 1), padding='same', kernel_initializer='he_normal')(x)
        x = Conv2D(self.classes, (1, 1), strides=(1, 1), padding='same', kernel_initializer='he_normal')(x)
        outputs = Activation('softmax')(x)

        return Model(inputs=inputs, outputs=outputs)


# Network construction
model = CNN(input_shape=image_size, classes=classes).create()
```
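Since the refactoring should not change the architecture, a quick check (a sketch, assuming both the `cnn()` function and the `CNN` class are in scope) is to compare parameter counts:

```python
# Sketch: the function-based and class-based builders define the same
# architecture, so their parameter counts must match.
m1 = cnn((224, 224), 36)
m2 = CNN(input_shape=(224, 224), classes=36).create()
print(m1.count_params() == m2.count_params())  # True
```

Matching parameter counts is not a full structural proof, but it catches most refactoring mistakes cheaply.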
The training method and parameters are the same as last time. The training results are as follows.
As in the [previous article](https://qiita.com/burokoron/items/c730e607607c925c6fd1), evaluation uses the average IoU for each class and the mean IoU, which is their average (a minimal sketch of the metric follows the table).
The evaluation results are below. The table shows that many classes that scored 0% last time can now be recognized, if only slightly. The mean IoU was 18.5%; SegNet [^2] scored 15.0% last time, so accuracy improved by 3.5 points.
Index | Class | Average IoU (SegNet) [%] | Average IoU (U-Net) [%]
---|---|---|---
0 | Pupil | 85.3 | 86.5 |
1 | Surgical Tape | 53.3 | 57.1 |
2 | Hand | 6.57 | 6.96 |
3 | Eye Retractors | 21.9 | 53.6 |
4 | Iris | 74.4 | 76.0 |
5 | Eyelid | 0 | 0 |
6 | Skin | 49.7 | 48.4 |
7 | Cornea | 88.0 | 88.5 |
8 | Hydro. Cannula | 0 | 31.8 |
9 | Visco. Cannula | 0 | 4.36 |
10 | Cap. Cystotome | 0 | 3.71 |
11 | Rycroft Cannula | 0 | 4.37 |
12 | Bonn Forceps | 3.58 | 7.94 |
13 | Primary Knife | 5.35 | 10.3 |
14 | Phaco. Handpiece | 0.0781 | 12.3 |
15 | Lens Injector | 16.4 | 15.8 |
16 | A/I Handpiece | 16.4 | 20.5 |
17 | Secondary Knife | 6.08 | 11.8 |
18 | Micromanipulator | 0 | 8.99 |
19 | A/I Handpiece Handle | 6.49 | 8.16 |
20 | Cap. Forceps | 0 | 0.337 |
21 | Rycroft Cannula Handle | 0 | 0.00863 |
22 | Phaco. Handpiece Handle | 0 | 4.26 |
23 | Cap. Cystotome Handle | 0 | 0.407 |
24 | Secondary Knife Handle | 2.49 | 3.82 |
25 | Lens Injector Handle | 0 | 0 |
26 | Water Sprayer | ─ | ─ |
27 | Suture Needle | 0 | 0 |
28 | Needle Holder | ─ | ─ |
29 | Charleux Cannula | 0 | 0 |
30 | Vannas Scissors | ─ | ─ |
31 | Primary Knife Handle | 0 | 0 |
32 | Viter. Handpiece | 0 | 0 |
33 | Mendez Ring | ─ | ─ |
34 | Biomarker | ─ | ─ |
35 | Marker | ─ | ─ |
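For reference, here is a minimal NumPy sketch of the metric used above (the actual evaluation code follows the previous article; `class_iou` and `mean_iou` are hypothetical helper names):

```python
import numpy as np

def class_iou(y_true, y_pred, class_id):
    """IoU for a single class, given integer label maps."""
    t = (y_true == class_id)
    p = (y_pred == class_id)
    union = np.logical_or(t, p).sum()
    if union == 0:
        return np.nan  # class absent from both label and prediction
    return np.logical_and(t, p).sum() / union

def mean_iou(y_true, y_pred, num_classes):
    """mean IoU = average of the per-class IoU values."""
    ious = [class_iou(y_true, y_pred, c) for c in range(num_classes)]
    return np.nanmean(ious)
```

Classes that never appear in the test data (the ─ rows in the table) are excluded from the average, which `np.nanmean` handles in this sketch.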
In this article, we implemented U-Net [^3] by incorporating the skip structure into the SegNet [^2] from the previous article. On CaDIS: a Cataract Dataset [^1], we compared performance using mean IoU and confirmed a 3.5-point improvement in accuracy. However, the best accuracy in the paper [^1] is 52.66% with PSPNet, still 34.16 points above this result. Going forward, I will continue to incorporate recent methods such as new network structures and data augmentation, aiming for equal or better performance.