[Survey] MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications

GoogLeNet and VGG16 achieve good object recognition accuracy, but they are hard to run on mobile phones, which have limited memory and compute. As one solution to this problem, Google developed MobileNet [^1], a network that lets you trade off computation time, memory, and accuracy. This post summarizes what I found when looking into it.

What is MobileNet?

Features

How it works

(Figures quoted from MobileNets [^1])
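MobileNet is built from depthwise separable convolutions: each standard convolution is factored into a depthwise convolution (one $D_K \times D_K$ filter per input channel) followed by a $1 \times 1$ pointwise convolution that mixes channels. These are the `DepthwiseConv2D` + `Conv2D` pairs in the network structure below. Using the cost formulas from the MobileNets paper [^1], for a $D_F \times D_F$ feature map with $M$ input and $N$ output channels, the separable version costs a fraction $\frac{1}{N} + \frac{1}{D_K^2}$ of the standard one. The sketch below just evaluates those formulas; the function names and the example layer size are illustrative assumptions.

```python
def standard_conv_cost(d_k, m, n, d_f):
    """Mult-Adds of a standard D_K x D_K convolution (M -> N channels, D_F x D_F output)."""
    return d_k * d_k * m * n * d_f * d_f


def depthwise_separable_cost(d_k, m, n, d_f):
    """Mult-Adds of a depthwise D_K x D_K convolution followed by a 1x1 pointwise convolution."""
    depthwise = d_k * d_k * m * d_f * d_f  # one D_K x D_K filter per input channel
    pointwise = m * n * d_f * d_f          # 1x1 convolution mixing M -> N channels
    return depthwise + pointwise


# Illustrative layer: 3x3 kernel, 512 -> 512 channels on a 14x14 feature map
d_k, m, n, d_f = 3, 512, 512, 14
std = standard_conv_cost(d_k, m, n, d_f)
sep = depthwise_separable_cost(d_k, m, n, d_f)
ratio = sep / std

print(f"standard: {std:,}  separable: {sep:,}  ratio: {ratio:.4f}")
# The ratio equals 1/N + 1/D_K^2, so 3x3 kernels give roughly 8-9x fewer Mult-Adds
print(f"1/N + 1/D_K^2 = {1 / n + 1 / d_k ** 2:.4f}")
```

Because $1/N$ is tiny for typical channel counts, the saving is dominated by the $1/D_K^2$ term.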

Network structure

Structure with $\alpha = 0.5$ on CIFAR-10:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, 32, 32, 3)         0         
_________________________________________________________________
conv1 (Conv2D)               (None, 16, 16, 16)        432       
_________________________________________________________________
conv1_bn (BatchNormalization (None, 16, 16, 16)        64        
_________________________________________________________________
conv1_relu (Activation)      (None, 16, 16, 16)        0         
_________________________________________________________________
conv_dw_1 (DepthwiseConv2D)  (None, 16, 16, 16)        144       
_________________________________________________________________
conv_dw_1_bn (BatchNormaliza (None, 16, 16, 16)        64        
_________________________________________________________________
conv_dw_1_relu (Activation)  (None, 16, 16, 16)        0         
_________________________________________________________________
conv_pw_1 (Conv2D)           (None, 16, 16, 32)        512       
_________________________________________________________________
conv_pw_1_bn (BatchNormaliza (None, 16, 16, 32)        128       
_________________________________________________________________
conv_pw_1_relu (Activation)  (None, 16, 16, 32)        0         
_________________________________________________________________
conv_dw_2 (DepthwiseConv2D)  (None, 8, 8, 32)          288       
_________________________________________________________________
conv_dw_2_bn (BatchNormaliza (None, 8, 8, 32)          128       
_________________________________________________________________
conv_dw_2_relu (Activation)  (None, 8, 8, 32)          0         
_________________________________________________________________
conv_pw_2 (Conv2D)           (None, 8, 8, 64)          2048      
_________________________________________________________________
conv_pw_2_bn (BatchNormaliza (None, 8, 8, 64)          256       
_________________________________________________________________
conv_pw_2_relu (Activation)  (None, 8, 8, 64)          0         
_________________________________________________________________
...
_________________________________________________________________
conv_dw_13 (DepthwiseConv2D) (None, 1, 1, 512)         4608      
_________________________________________________________________
conv_dw_13_bn (BatchNormaliz (None, 1, 1, 512)         2048      
_________________________________________________________________
conv_dw_13_relu (Activation) (None, 1, 1, 512)         0         
_________________________________________________________________
conv_pw_13 (Conv2D)          (None, 1, 1, 512)         262144    
_________________________________________________________________
conv_pw_13_bn (BatchNormaliz (None, 1, 1, 512)         2048      
_________________________________________________________________
conv_pw_13_relu (Activation) (None, 1, 1, 512)         0         
_________________________________________________________________
global_average_pooling2d_1 ( (None, 512)               0         
_________________________________________________________________
reshape_1 (Reshape)          (None, 1, 1, 512)         0         
_________________________________________________________________
dropout (Dropout)            (None, 1, 1, 512)         0         
_________________________________________________________________
conv_preds (Conv2D)          (None, 1, 1, 10)          5130      
_________________________________________________________________
act_softmax (Activation)     (None, 1, 1, 10)          0         
_________________________________________________________________
reshape_2 (Reshape)          (None, 10)                0         
=================================================================
Total params: 834,666
Trainable params: 823,722
Non-trainable params: 10,944
_________________________________________________________________
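The parameter counts in this summary can be recomputed by hand, which makes the structure easier to see. This is a minimal sketch, assuming the Keras MobileNet v1 layout shown above: bias-free convolutions each followed by BatchNormalization (4 parameters per channel, of which the moving mean and variance are non-trainable), channel counts of `int(filters * alpha)`, and a $1 \times 1$ `conv_preds` classifier with bias. `BLOCK_FILTERS` and `mobilenet_params` are hypothetical names, not part of Keras.

```python
# Pointwise output channels of the 13 depthwise separable blocks at alpha = 1.0
BLOCK_FILTERS = [64, 128, 128, 256, 256, 512, 512, 512, 512, 512, 512, 1024, 1024]


def bn_params(channels):
    """BatchNormalization: gamma, beta (trainable) + moving mean, variance (non-trainable)."""
    return 4 * channels


def mobilenet_params(alpha=1.0, classes=1000, in_channels=3, kernel=3):
    """Return (total, non_trainable) parameter counts for the MobileNet v1 layout."""
    total = 0
    non_trainable = 0

    # conv1: standard 3x3 convolution without bias, followed by BN
    c_in = int(32 * alpha)
    total += kernel * kernel * in_channels * c_in + bn_params(c_in)
    non_trainable += 2 * c_in

    for filters in BLOCK_FILTERS:
        c_out = int(filters * alpha)
        # conv_dw_i: one 3x3 filter per input channel, no bias
        total += kernel * kernel * c_in + bn_params(c_in)
        # conv_pw_i: 1x1 convolution mixing c_in -> c_out channels, no bias
        total += c_in * c_out + bn_params(c_out)
        non_trainable += 2 * c_in + 2 * c_out
        c_in = c_out

    # conv_preds: 1x1 convolution with bias acting as the classifier
    total += c_in * classes + classes
    return total, non_trainable


print(mobilenet_params(alpha=0.5, classes=10))  # (834666, 10944), matching the summary
```

The same function with `alpha=1.0, classes=1000` gives 4,253,864 parameters. It also shows why the pointwise layers dominate: `conv_dw_13` has only 9 × 512 = 4,608 weights, while `conv_pw_13` has 512 × 512 = 262,144.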

Test environment

Sample code

```python
import keras
from keras.datasets import cifar10
from keras.preprocessing.image import ImageDataGenerator
from keras.applications import MobileNet

batch_size = 32
classes = 10
epochs = 200

# Load CIFAR-10 and one-hot encode the labels
(X_train, y_train), (X_test, y_test) = cifar10.load_data()
Y_train = keras.utils.to_categorical(y_train, classes)
Y_test = keras.utils.to_categorical(y_test, classes)

# MobileNet with width multiplier alpha=0.5, trained from scratch (weights=None)
img_input = keras.layers.Input(shape=(32, 32, 3))
model = MobileNet(input_tensor=img_input, alpha=0.5, weights=None, classes=classes)
model.compile(loss='categorical_crossentropy', optimizer="nadam", metrics=['accuracy'])

# Scale pixel values to [0, 1]
X_train = X_train.astype('float32') / 255
X_test = X_test.astype('float32') / 255

# Data augmentation: only random shifts and horizontal flips are enabled
datagen = ImageDataGenerator(
    featurewise_center=False,  # set input mean to 0 over the dataset
    samplewise_center=False,  # set each sample mean to 0
    featurewise_std_normalization=False,  # divide inputs by std of the dataset
    samplewise_std_normalization=False,  # divide each input by its std
    zca_whitening=False,  # apply ZCA whitening
    rotation_range=0,  # randomly rotate images in the range (degrees, 0 to 180)
    width_shift_range=0.1,  # randomly shift images horizontally (fraction of total width)
    height_shift_range=0.1,  # randomly shift images vertically (fraction of total height)
    horizontal_flip=True,  # randomly flip images horizontally
    vertical_flip=False)  # do not flip images vertically
# Only required for featurewise statistics or ZCA whitening; harmless here
datagen.fit(X_train)
# fit_generator is the Keras 2.x API; in TensorFlow 2.x, model.fit accepts the generator directly
model.fit_generator(datagen.flow(X_train, Y_train, batch_size=batch_size),
                    steps_per_epoch=X_train.shape[0] // batch_size,
                    epochs=epochs,
                    validation_data=(X_test, Y_test))
```

Caveats

Experimental results on ImageNet

| Width Multiplier (alpha) | ImageNet Acc | Multiply-Adds (M) | Params (M) |
|--------------------------|--------------|-------------------|------------|
| 1.0 MobileNet-224        | 70.6 %       | 569               | 4.2        |
| 0.75 MobileNet-224       | 68.4 %       | 325               | 2.6        |
| 0.50 MobileNet-224       | 63.7 %       | 149               | 1.3        |
| 0.25 MobileNet-224       | 50.6 %       | 41                | 0.5        |

| Resolution               | ImageNet Acc | Multiply-Adds (M) | Params (M) |
|--------------------------|--------------|-------------------|------------|
| 1.0 MobileNet-224        | 70.6 %       | 569               | 4.2        |
| 1.0 MobileNet-192        | 69.1 %       | 418               | 4.2        |
| 1.0 MobileNet-160        | 67.2 %       | 290               | 4.2        |
| 1.0 MobileNet-128        | 64.4 %       | 186               | 4.2        |
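The resolution multiplier $\rho$ only shrinks the feature maps, so the parameter count stays fixed while the MobileNets paper notes that the computational cost drops by roughly $\rho^2$. A minimal sketch, assuming the paper's reported baseline of 569 M Mult-Adds for 1.0 MobileNet-224 and $\rho = \text{resolution} / 224$; `scaled_mult_adds` is a hypothetical helper name.

```python
BASE_MULT_ADDS = 569  # millions, 1.0 MobileNet-224 as reported in the MobileNets paper


def scaled_mult_adds(resolution, base=BASE_MULT_ADDS, base_resolution=224):
    """Approximate Mult-Adds (millions) after applying the resolution multiplier rho."""
    rho = resolution / base_resolution
    return round(base * rho ** 2)  # cost scales with the number of feature-map pixels


for r in (224, 192, 160, 128):
    print(f"1.0 MobileNet-{r}: ~{scaled_mult_adds(r)} M Mult-Adds")
```

The width multiplier $\alpha$ behaves less cleanly: it reduces cost by roughly $\alpha^2$, but not exactly (569 × 0.75² ≈ 320 versus the reported 325), because the depthwise layers, the first convolution, and the classifier scale only linearly in $\alpha$.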

Summary

- MobileNet is a network that lets you trade off accuracy against the amount of computation.

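References

[^1]: A. G. Howard et al., "MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications," arXiv:1704.04861, 2017.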
