Networks such as GoogleNet or VGG16 give good object recognition performance, but they are hard to run on mobile phones, which have limited memory and compute. As one answer to this problem, Google appears to have created MobileNet [^1], a network that lets you trade off computation time, memory, and accuracy. So I looked into it.

What is MobileNet?


How it works


Structure for $\alpha = 0.5$ on CIFAR10.

Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         (None, 32, 32, 3)         0         
conv1 (Conv2D)               (None, 16, 16, 16)        432       
conv1_bn (BatchNormalization (None, 16, 16, 16)        64        
conv1_relu (Activation)      (None, 16, 16, 16)        0         
conv_dw_1 (DepthwiseConv2D)  (None, 16, 16, 16)        144       
conv_dw_1_bn (BatchNormaliza (None, 16, 16, 16)        64        
conv_dw_1_relu (Activation)  (None, 16, 16, 16)        0         
conv_pw_1 (Conv2D)           (None, 16, 16, 32)        512       
conv_pw_1_bn (BatchNormaliza (None, 16, 16, 32)        128       
conv_pw_1_relu (Activation)  (None, 16, 16, 32)        0         
conv_dw_2 (DepthwiseConv2D)  (None, 8, 8, 32)          288       
conv_dw_2_bn (BatchNormaliza (None, 8, 8, 32)          128       
conv_dw_2_relu (Activation)  (None, 8, 8, 32)          0         
conv_pw_2 (Conv2D)           (None, 8, 8, 64)          2048      
conv_pw_2_bn (BatchNormaliza (None, 8, 8, 64)          256       
conv_pw_2_relu (Activation)  (None, 8, 8, 64)          0         
conv_dw_13 (DepthwiseConv2D) (None, 1, 1, 512)         4608      
conv_dw_13_bn (BatchNormaliz (None, 1, 1, 512)         2048      
conv_dw_13_relu (Activation) (None, 1, 1, 512)         0         
conv_pw_13 (Conv2D)          (None, 1, 1, 512)         262144    
conv_pw_13_bn (BatchNormaliz (None, 1, 1, 512)         2048      
conv_pw_13_relu (Activation) (None, 1, 1, 512)         0         
global_average_pooling2d_1 ( (None, 512)               0         
reshape_1 (Reshape)          (None, 1, 1, 512)         0         
dropout (Dropout)            (None, 1, 1, 512)         0         
conv_preds (Conv2D)          (None, 1, 1, 10)          5130      
act_softmax (Activation)     (None, 1, 1, 10)          0         
reshape_2 (Reshape)          (None, 10)                0         
Total params: 834,666
Trainable params: 823,722
Non-trainable params: 10,944
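The parameter counts in the summary can be checked by hand, which also shows why the depthwise/pointwise split is cheap. The following is a minimal sketch in plain Python. The helper names are my own and not part of Keras, and I assume that the convolutions in the body have no bias while `conv_preds` does, which is what the listed numbers suggest.

```python
def conv_params(kernel, in_ch, out_ch, bias=False):
    """Weight count of a standard kernel x kernel convolution."""
    return kernel * kernel * in_ch * out_ch + (out_ch if bias else 0)


def depthwise_params(kernel, channels):
    """One kernel x kernel filter per input channel, no bias."""
    return kernel * kernel * channels


def batchnorm_params(channels):
    """gamma, beta, moving mean and moving variance per channel."""
    return 4 * channels


# (computed value, value listed in the summary)
checks = {
    "conv1": (conv_params(3, 3, 16), 432),
    "conv1_bn": (batchnorm_params(16), 64),
    "conv_dw_1": (depthwise_params(3, 16), 144),
    "conv_pw_1": (conv_params(1, 16, 32), 512),
    "conv_dw_2": (depthwise_params(3, 32), 288),
    "conv_pw_2": (conv_params(1, 32, 64), 2048),
    "conv_dw_13": (depthwise_params(3, 512), 4608),
    "conv_pw_13": (conv_params(1, 512, 512), 262144),
    "conv_preds": (conv_params(1, 512, 10, bias=True), 5130),
}
for name, (computed, listed) in checks.items():
    print(name, computed, listed, computed == listed)

# A plain 3x3 convolution from 16 to 32 channels, compared with the
# depthwise + pointwise pair (conv_dw_1 + conv_pw_1) that replaces it.
standard = conv_params(3, 16, 32)
separable = depthwise_params(3, 16) + conv_params(1, 16, 32)
print(standard, separable)
```

For this first block the separable pair needs 656 weights where a standard 3x3 convolution would need 4608.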



import keras
from keras.datasets import cifar10
from keras.preprocessing.image import ImageDataGenerator
from keras.applications import MobileNet

batch_size = 32
classes = 10
epochs = 200

(X_train, y_train), (X_test, y_test) = cifar10.load_data()
Y_train = keras.utils.to_categorical(y_train, classes)
Y_test = keras.utils.to_categorical(y_test, classes)

# Build MobileNet from scratch (no pretrained weights) for 32x32 RGB input.
img_input = keras.layers.Input(shape=(32, 32, 3))
model = MobileNet(input_tensor=img_input, alpha=0.5, weights=None, classes=classes)
model.compile(loss='categorical_crossentropy', optimizer="nadam", metrics=['accuracy'])
# Scale pixel values from [0, 255] to [0, 1].
X_train = X_train.astype('float32') / 255
X_test = X_test.astype('float32') / 255

datagen = ImageDataGenerator(
    featurewise_center=False,  # set input mean to 0 over the dataset
    samplewise_center=False,  # set each sample mean to 0
    featurewise_std_normalization=False,  # divide inputs by std of the dataset
    samplewise_std_normalization=False,  # divide each input by its std
    zca_whitening=False,  # apply ZCA whitening
    rotation_range=0,  # randomly rotate images in the range (degrees, 0 to 180)
    width_shift_range=0.1,  # randomly shift images horizontally (fraction of total width)
    height_shift_range=0.1,  # randomly shift images vertically (fraction of total height)
    horizontal_flip=True,  # randomly flip images horizontally
    vertical_flip=False)  # do not flip images vertically
model.fit_generator(datagen.flow(X_train, Y_train, batch_size=batch_size),
                    steps_per_epoch=X_train.shape[0] // batch_size,
                    epochs=epochs,
                    validation_data=(X_test, Y_test))


Experimental results on ImageNet

| Width Multiplier (alpha) | ImageNet Acc | Multiply-Adds (M) | Params (M) |
|   1.0 MobileNet-224      |    70.6 %    |        569        |    4.2     |
|   0.75 MobileNet-224     |    68.4 %    |        325        |    2.6     |
|   0.50 MobileNet-224     |    63.7 %    |        149        |    1.3     |
|   0.25 MobileNet-224     |    50.6 %    |        41         |    0.5     |

|      Resolution      | ImageNet Acc | Multiply-Adds (M) | Params (M) |
|  1.0 MobileNet-224   |    70.6 %    |        569        |    4.2     |
|  1.0 MobileNet-192   |    69.1 %    |        418        |    4.2     |
|  1.0 MobileNet-160   |    67.2 %    |        290        |    4.2     |
|  1.0 MobileNet-128   |    64.4 %    |        186        |    4.2     |
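
To get a feel for how the width multiplier and the input resolution act as cost knobs, here is a rough sketch that evaluates the multiply-add count of a single depthwise separable layer, following the cost expression given in the MobileNet paper [^1]: $D_K \cdot D_K \cdot \alpha M \cdot \rho D_F \cdot \rho D_F + \alpha M \cdot \alpha N \cdot \rho D_F \cdot \rho D_F$. The function name and the layer sizes are my own illustrative assumptions, and this covers one layer only, not the whole network.

```python
def separable_mult_adds(kernel, in_ch, out_ch, feature_size, alpha=1.0, rho=1.0):
    """Multiply-adds of one depthwise separable layer (depthwise + pointwise).

    alpha thins the channel counts, rho shrinks the feature map side length.
    """
    m = int(alpha * in_ch)
    n = int(alpha * out_ch)
    f = int(rho * feature_size)
    depthwise = kernel * kernel * m * f * f
    pointwise = m * n * f * f
    return depthwise + pointwise


# Hypothetical layer: 3x3 kernel, 512 -> 512 channels, 14x14 feature map.
base = separable_mult_adds(3, 512, 512, 14)
half_width = separable_mult_adds(3, 512, 512, 14, alpha=0.5)
half_res = separable_mult_adds(3, 512, 512, 14, rho=0.5)

print("alpha=0.5:", half_width / base)
print("rho=0.5:  ", half_res / base)
```

Halving the resolution cuts the cost of this layer to exactly a quarter, since every term carries the squared feature map size. Halving the width lands slightly above a quarter, because the depthwise term only shrinks linearly with $\alpha$. Resolution does not enter the parameter count at all, which is why Params stays at 4.2 M in the second table.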


MobileNet is a network that lets you trade off accuracy against the amount of computation.


