Networks such as GoogLeNet and VGG16 achieve good object recognition accuracy, but they are hard to run on mobile phones, which have limited memory and compute. As one answer to this problem, Google proposed MobileNet [^1], a network that lets you trade off computation time, memory use, and accuracy. I looked into how it works.
A conventional convolution prepares as many filters as there are output channels, each of size kernel size x kernel size x number of input channels, and convolves the input with them.
MobileNet first prepares one filter of size kernel size x kernel size x 1 per input channel and convolves each channel separately (the DepthwiseConv2D layers, conv_dw_*, in the summary further down).
Next, it prepares as many filters of size 1 x 1 x number of input channels as there are output channels and convolves again (the conv_pw_* layers). Together, the two steps achieve processing similar to a conventional convolution.
(Figure quoted from MobileNets [^1])
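The two-step convolution described above can be sketched in NumPy. This is a minimal illustration, not the Keras implementation: the function names, the "valid" padding, the stride of 1, and the absence of bias are all assumptions made for the example. It also checks that a depthwise step followed by a pointwise step equals a standard convolution whose kernel is the product of the two, which is why the result is only "similar" to a general convolution: the separable form can represent just this factorized family of kernels.

```python
import numpy as np


def depthwise_conv(x, dw):
    """x: (H, W, C_in), dw: (k, k, C_in). One k x k filter per input channel."""
    k = dw.shape[0]
    h_out, w_out = x.shape[0] - k + 1, x.shape[1] - k + 1
    out = np.zeros((h_out, w_out, x.shape[2]))
    for i in range(h_out):
        for j in range(w_out):
            patch = x[i:i + k, j:j + k, :]              # (k, k, C_in)
            out[i, j, :] = np.sum(patch * dw, axis=(0, 1))
    return out


def pointwise_conv(x, pw):
    """x: (H, W, C_in), pw: (C_in, C_out). A 1 x 1 convolution mixes channels."""
    return x @ pw


def standard_conv(x, kernels):
    """x: (H, W, C_in), kernels: (k, k, C_in, C_out)."""
    k = kernels.shape[0]
    h_out, w_out = x.shape[0] - k + 1, x.shape[1] - k + 1
    out = np.zeros((h_out, w_out, kernels.shape[3]))
    for i in range(h_out):
        for j in range(w_out):
            patch = x[i:i + k, j:j + k, :]
            out[i, j, :] = np.tensordot(patch, kernels, axes=([0, 1, 2], [0, 1, 2]))
    return out


rng = np.random.default_rng(0)
k, c_in, c_out = 3, 16, 32                 # same sizes as conv_dw_1 / conv_pw_1
x = rng.normal(size=(8, 8, c_in))
dw = rng.normal(size=(k, k, c_in))
pw = rng.normal(size=(c_in, c_out))

y_sep = pointwise_conv(depthwise_conv(x, dw), pw)
# Equivalent full kernel: each (k, k, C_in, C_out) entry is dw * pw.
y_std = standard_conv(x, dw[:, :, :, None] * pw[None, None, :, :])

separable_params = k * k * c_in + c_in * c_out   # 144 + 512
standard_params = k * k * c_in * c_out
print(np.allclose(y_sep, y_std), separable_params, standard_params)
```

With a 3 x 3 kernel, 16 input channels, and 32 output channels, the separable version needs 656 weights where a standard convolution needs 4608.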
Model structure for $\alpha = 0.5$ on CIFAR-10:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 32, 32, 3) 0
_________________________________________________________________
conv1 (Conv2D) (None, 16, 16, 16) 432
_________________________________________________________________
conv1_bn (BatchNormalization (None, 16, 16, 16) 64
_________________________________________________________________
conv1_relu (Activation) (None, 16, 16, 16) 0
_________________________________________________________________
conv_dw_1 (DepthwiseConv2D) (None, 16, 16, 16) 144
_________________________________________________________________
conv_dw_1_bn (BatchNormaliza (None, 16, 16, 16) 64
_________________________________________________________________
conv_dw_1_relu (Activation) (None, 16, 16, 16) 0
_________________________________________________________________
conv_pw_1 (Conv2D) (None, 16, 16, 32) 512
_________________________________________________________________
conv_pw_1_bn (BatchNormaliza (None, 16, 16, 32) 128
_________________________________________________________________
conv_pw_1_relu (Activation) (None, 16, 16, 32) 0
_________________________________________________________________
conv_dw_2 (DepthwiseConv2D) (None, 8, 8, 32) 288
_________________________________________________________________
conv_dw_2_bn (BatchNormaliza (None, 8, 8, 32) 128
_________________________________________________________________
conv_dw_2_relu (Activation) (None, 8, 8, 32) 0
_________________________________________________________________
conv_pw_2 (Conv2D) (None, 8, 8, 64) 2048
_________________________________________________________________
conv_pw_2_bn (BatchNormaliza (None, 8, 8, 64) 256
_________________________________________________________________
conv_pw_2_relu (Activation) (None, 8, 8, 64) 0
_________________________________________________________________
...
_________________________________________________________________
conv_dw_13 (DepthwiseConv2D) (None, 1, 1, 512) 4608
_________________________________________________________________
conv_dw_13_bn (BatchNormaliz (None, 1, 1, 512) 2048
_________________________________________________________________
conv_dw_13_relu (Activation) (None, 1, 1, 512) 0
_________________________________________________________________
conv_pw_13 (Conv2D) (None, 1, 1, 512) 262144
_________________________________________________________________
conv_pw_13_bn (BatchNormaliz (None, 1, 1, 512) 2048
_________________________________________________________________
conv_pw_13_relu (Activation) (None, 1, 1, 512) 0
_________________________________________________________________
global_average_pooling2d_1 ( (None, 512) 0
_________________________________________________________________
reshape_1 (Reshape) (None, 1, 1, 512) 0
_________________________________________________________________
dropout (Dropout) (None, 1, 1, 512) 0
_________________________________________________________________
conv_preds (Conv2D) (None, 1, 1, 10) 5130
_________________________________________________________________
act_softmax (Activation) (None, 1, 1, 10) 0
_________________________________________________________________
reshape_2 (Reshape) (None, 10) 0
=================================================================
Total params: 834,666
Trainable params: 823,722
Non-trainable params: 10,944
_________________________________________________________________
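The parameter counts in the summary can be reproduced by hand. The sketch below is an assumption-laden recount, not Keras code: the list of per-block output channels is the standard MobileNet v1 layout at $\alpha = 1.0$, and the helper name `mobilenet_param_counts` is made up for this example. It assumes bias-free convolutions except for the final classifier, and four values per batch-normalization channel, two of which are non-trainable.

```python
# Output channels of the 13 depthwise-separable blocks at alpha = 1.0
# (assumed standard MobileNet v1 layout).
BASE_FILTERS = [64, 128, 128, 256, 256, 512, 512, 512, 512, 512, 512, 1024, 1024]


def mobilenet_param_counts(alpha=0.5, in_channels=3, classes=10):
    """Return (total, non_trainable) parameter counts."""
    total = 0
    bn_channels = 0  # channels that pass through a BatchNormalization layer

    # conv1: standard 3 x 3 convolution without bias, followed by batch norm.
    c = int(32 * alpha)
    total += 3 * 3 * in_channels * c
    bn_channels += c

    for base in BASE_FILTERS:
        c_out = int(base * alpha)
        total += 3 * 3 * c        # depthwise: one 3 x 3 filter per input channel
        bn_channels += c
        total += c * c_out        # pointwise: 1 x 1 x c filters, c_out of them
        bn_channels += c_out
        c = c_out

    # Per channel: gamma and beta (trainable), moving mean and variance (not).
    total += 4 * bn_channels
    non_trainable = 2 * bn_channels

    # conv_preds: 1 x 1 convolution with bias, used as the classifier.
    total += c * classes + classes
    return total, non_trainable


total, non_trainable = mobilenet_param_counts(alpha=0.5, classes=10)
print(total, total - non_trainable, non_trainable)  # 834666 823722 10944
```

Almost all weights sit in the pointwise layers; the depthwise filters contribute only 9 weights per channel.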
import keras
from keras.datasets import cifar10
from keras.preprocessing.image import ImageDataGenerator
from keras.applications import MobileNet

batch_size = 32
classes = 10
epochs = 200

# Load CIFAR-10 and one-hot encode the labels.
(X_train, y_train), (X_test, y_test) = cifar10.load_data()
Y_train = keras.utils.to_categorical(y_train, classes)
Y_test = keras.utils.to_categorical(y_test, classes)

# Build MobileNet with width multiplier alpha=0.5 and train it from scratch.
img_input = keras.layers.Input(shape=(32, 32, 3))
model = MobileNet(input_tensor=img_input, alpha=0.5, weights=None, classes=classes)
model.compile(loss='categorical_crossentropy', optimizer="nadam", metrics=['accuracy'])

# Scale pixel values to [0, 1].
X_train = X_train.astype('float32') / 255
X_test = X_test.astype('float32') / 255

datagen = ImageDataGenerator(
    featurewise_center=False,  # set input mean to 0 over the dataset
    samplewise_center=False,  # set each sample mean to 0
    featurewise_std_normalization=False,  # divide inputs by std of the dataset
    samplewise_std_normalization=False,  # divide each input by its std
    zca_whitening=False,  # apply ZCA whitening
    rotation_range=0,  # randomly rotate images in the range (degrees, 0 to 180)
    width_shift_range=0.1,  # randomly shift images horizontally (fraction of total width)
    height_shift_range=0.1,  # randomly shift images vertically (fraction of total height)
    horizontal_flip=True,  # randomly flip images horizontally
    vertical_flip=False)  # do not flip images vertically

# Computes dataset statistics; only needed when featurewise_center,
# featurewise_std_normalization, or zca_whitening is enabled.
datagen.fit(X_train)

# fit_generator is deprecated in newer Keras releases, where model.fit
# accepts generators directly.
model.fit_generator(datagen.flow(X_train, Y_train, batch_size=batch_size),
                    steps_per_epoch=X_train.shape[0] // batch_size,
                    epochs=epochs,
                    validation_data=(X_test, Y_test))
Effect of the width multiplier at 224 x 224 input (figures as reported in MobileNets [^1]):
----------------------------------------------------------------------------
Width Multiplier (alpha) | ImageNet Acc | Multiply-Adds (M) | Params (M)
----------------------------------------------------------------------------
| 1.0 MobileNet-224      | 70.6 %       | 569               | 4.2       |
| 0.75 MobileNet-224     | 68.4 %       | 325               | 2.6       |
| 0.50 MobileNet-224     | 63.7 %       | 149               | 1.3       |
| 0.25 MobileNet-224     | 50.6 %       | 41                | 0.5       |
----------------------------------------------------------------------------
Effect of the input resolution at alpha = 1.0 (figures as reported in MobileNets [^1]). The parameter count stays constant, while the multiply-adds shrink with the image area:
------------------------------------------------------------------------
Resolution               | ImageNet Acc | Multiply-Adds (M) | Params (M)
------------------------------------------------------------------------
| 1.0 MobileNet-224      | 70.6 %       | 569               | 4.2       |
| 1.0 MobileNet-192      | 69.1 %       | 418               | 4.2       |
| 1.0 MobileNet-160      | 67.2 %       | 290               | 4.2       |
| 1.0 MobileNet-128      | 64.4 %       | 186               | 4.2       |
------------------------------------------------------------------------
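To see how the width multiplier and the input resolution drive the computational cost, the multiply-adds can be estimated layer by layer. This is a rough sketch under stated assumptions: the block list (output channels and strides) is the standard MobileNet v1 layout, the helper `mobilenet_mult_adds` is a name invented here, and batch normalization, activations, and pooling are ignored, so treat the results as estimates.

```python
# (output channels at alpha = 1.0, stride) for the 13 depthwise-separable
# blocks; assumed standard MobileNet v1 layout.
BLOCKS = [(64, 1), (128, 2), (128, 1), (256, 2), (256, 1), (512, 2),
          (512, 1), (512, 1), (512, 1), (512, 1), (512, 1),
          (1024, 2), (1024, 1)]


def mobilenet_mult_adds(alpha=1.0, resolution=224, classes=1000):
    """Estimate multiply-adds of the convolutions and the classifier."""
    size = resolution // 2                      # conv1 uses stride 2
    c = int(32 * alpha)
    ops = 3 * 3 * 3 * c * size * size           # conv1 on an RGB input
    for base, stride in BLOCKS:
        size //= stride
        c_out = int(base * alpha)
        ops += 3 * 3 * c * size * size          # depthwise 3 x 3
        ops += c * c_out * size * size          # pointwise 1 x 1
        c = c_out
    ops += c * classes                          # final classifier
    return ops


for alpha in (1.0, 0.75, 0.5, 0.25):
    print("alpha", alpha, round(mobilenet_mult_adds(alpha, 224) / 1e6), "M")
for resolution in (224, 192, 160, 128):
    print("resolution", resolution, round(mobilenet_mult_adds(1.0, resolution) / 1e6), "M")
```

Under these assumptions the estimate for $\alpha = 1.0$ at 224 x 224 comes to about 569 million multiply-adds. The pointwise layers dominate, so the cost scales roughly with $\alpha^2$ and with the square of the input resolution, while the parameter count does not depend on the resolution at all.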
MobileNet is a network that lets you trade accuracy against computational cost by adjusting the width multiplier alpha and the input resolution.
References