The MobileNetV2 architecture is based on an inverted residual structure, where the input and output of the residual block are thin bottleneck layers, in contrast to traditional residual models, which use expanded representations at the input and output. MobileNetV2 uses lightweight depthwise convolutions to filter features in the intermediate expansion layer.
The architectural definition of each network refers to the following paper:
[1] Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, Liang-Chieh Chen. MobileNetV2: Inverted Residuals and Linear Bottlenecks. arXiv preprint arXiv:1801.04381.
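To make the structure concrete, below is a minimal sketch of an inverted residual block in MindSpore, reconstructed from the description above. The class name, the default expansion ratio of 6, and the exact layer arrangement are assumptions for illustration, not the precise block used in this repository.

```python
import mindspore.nn as nn

class InvertedResidual(nn.Cell):
    """Sketch of a MobileNetV2 inverted residual block (assumed layout):
    1x1 expansion -> 3x3 depthwise -> linear 1x1 bottleneck projection."""

    def __init__(self, in_channels, out_channels, stride=1, expand_ratio=6):
        super().__init__()
        hidden = in_channels * expand_ratio
        # Residual connection only when the block preserves shape.
        self.use_residual = stride == 1 and in_channels == out_channels
        layers = []
        if expand_ratio != 1:
            # 1x1 pointwise conv expands the thin input to a wide representation.
            layers += [nn.Conv2d(in_channels, hidden, 1, has_bias=False),
                       nn.BatchNorm2d(hidden),
                       nn.ReLU6()]
        layers += [
            # Depthwise 3x3 conv (group == channels) filters features cheaply
            # in the expanded space.
            nn.Conv2d(hidden, hidden, 3, stride=stride, group=hidden, has_bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(),
            # Linear 1x1 projection back to a thin bottleneck; no activation,
            # which is the "linear bottleneck" of the paper.
            nn.Conv2d(hidden, out_channels, 1, has_bias=False),
            nn.BatchNorm2d(out_channels),
        ]
        self.block = nn.SequentialCell(layers)

    def construct(self, x):
        out = self.block(x)
        return x + out if self.use_residual else out
```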
The following line chart compares the Top-1 and Top-5 accuracy of the MobileNetV2 network between MindSpore and TensorFlow.
The following table lists all MobileNetV2 ImageNet checkpoints. For each model, the Top-1 and Top-5 accuracies are reported and compared with those of the TensorFlow implementation. The checkpoint name encodes the width multiplier (alpha) and the input resolution; for example, mobilenet_v2_1.4_224 uses alpha 1.4 with 224x224 inputs.
Model | MindSpore Top-1 (%) | MindSpore Top-5 (%) | TensorFlow Top-1 (%) | TensorFlow Top-5 (%) | Download | Config
---|---|---|---|---|---|---
mobilenet_v2_1.4_224 | 75.43 | 92.5 | 75.0 | 92.5 | model | config |
mobilenet_v2_1.3_224 | 74.7 | 92.3 | 74.4 | 92.1 | model | config |
mobilenet_v2_1.0_224 | 72.4 | 90.8 | 71.8 | 91.0 | model | config |
mobilenet_v2_1.0_192 | 71.3 | 90.2 | 70.7 | 90.1 | model | config |
mobilenet_v2_1.0_160 | 69.7 | 89.1 | 68.8 | 89.0 | model | config |
mobilenet_v2_1.0_128 | 67.1 | 86.9 | 65.3 | 86.9 | model | config |
mobilenet_v2_1.0_96 | 61.9 | 83.6 | 60.3 | 83.2 | model | config |
mobilenet_v2_0.75_224 | 70.3 | 89.4 | 69.8 | 89.6 | model | config |
mobilenet_v2_0.75_192 | 69.2 | 88.7 | 68.7 | 88.9 | model | config |
mobilenet_v2_0.75_160 | 67.3 | 87.4 | 66.4 | 87.3 | model | config |
mobilenet_v2_0.75_128 | 64.5 | 85.4 | 63.2 | 85.3 | model | config |
mobilenet_v2_0.75_96 | 59.4 | 81.6 | 58.8 | 81.6 | model | config |
mobilenet_v2_0.5_224 | 66.1 | 86.3 | 65.4 | 86.4 | model | config |
mobilenet_v2_0.5_192 | 64.7 | 85.2 | 63.9 | 85.4 | model | config |
mobilenet_v2_0.5_160 | 61.9 | 83.4 | 61.0 | 83.2 | model | config |
mobilenet_v2_0.5_128 | 58.7 | 80.9 | 57.7 | 80.8 | model | config |
mobilenet_v2_0.5_96 | 52.9 | 76.3 | 51.2 | 75.8 | model | config |
mobilenet_v2_0.35_224 | 61.9 | 83.2 | 60.3 | 82.9 | model | config |
mobilenet_v2_0.35_192 | 59.5 | 81.6 | 58.2 | 81.2 | model | config |
mobilenet_v2_0.35_160 | 57.6 | 79.8 | 55.7 | 79.1 | model | config |
mobilenet_v2_0.35_128 | 52.8 | 76.3 | 50.8 | 75.0 | model | config |
mobilenet_v2_0.35_96 | 47.5 | 71.3 | 45.5 | 70.4 | model | config |
Parameter | Default | Description |
---|---|---|
device_target | GPU | Hardware device |
data_url | | Path to the training dataset |
pretrained | False | Whether to load a pretrained model |
run_distribute | True | Whether to run distributed parallel training |
num_parallel_workers | 8 | Number of parallel workers |
dataset_sink_mode | True | Data sinking mode |
num_classes | 1001 | Number of dataset classifications |
batch_size | 64 | Batch size |
repeat_num | 1 | Number of data repetitions |
momentum | 0.9 | Momentum parameter |
epoch_size | 200 | Number of epochs |
keep_checkpoint_max | 10 | Maximum number of checkpoints saved |
ckpt_save_dir | './mobilenet_v2' | Save path of checkpoint |
lr_decay_mode | cosine_decay_lr | Learning rate decay mode |
decay_epoch | 200 | Number of decay epochs |
smooth_factor | 0.1 | Label smoothing factor |
max_lr | 0.1 | Maximum learning rate |
min_lr | 0.0 | Minimum learning rate |
milestone | | A list of milestones |
learning_rates | | A list of learning rates |
alpha | 1.0 | Magnification factor (width multiplier) |
resize | 224 | Resize the height and width of the input image |
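Several of these parameters feed the learning-rate schedule. As a minimal sketch, assuming the table defaults (max_lr=0.1, min_lr=0.0, epoch_size=200, decay_epoch=200) and the 2502 steps per epoch visible in the training logs below, the per-step schedule could be built with MindSpore's built-in cosine_decay_lr:

```python
import mindspore.nn as nn

# Per-step cosine schedule from the table defaults; 2502 steps/epoch is
# taken from the training logs below and is otherwise an assumption.
step_per_epoch = 2502
lr = nn.cosine_decay_lr(min_lr=0.0, max_lr=0.1,
                        total_step=200 * step_per_epoch,
                        step_per_epoch=step_per_epoch,
                        decay_epoch=200)
print(lr[0], lr[-1])  # starts at max_lr (0.1), decays toward min_lr (0.0)
```

The lr values printed at the start of the training logs (0.10000, 0.09998, 0.09990, ...) are consistent with this per-epoch cosine decay.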
- The following configuration uses 8 GPUs for training. The magnification factor is set to 1.0 and the input image size is set to 224. (A sketch of the distributed setup this implies follows the output below.)

  ```shell
  mpirun -n 8 python mobilenet_v2_imagenet_train.py --alpha 1.0 --resize 224 --data_url ./dataset/imagenet
  ```
  output:

  ```text
  Epoch:[0/200], step:[2502/2502], loss:[4.676/4.676], time:872084.093, lr:0.10000
  Epoch time:883614.453, per step time:353.163, avg loss:4.676
  Epoch:[1/200], step:[2502/2502], loss:[4.452/4.452], time:693370.244, lr:0.09998
  Epoch time:693374.709, per step time:277.128, avg loss:4.452
  Epoch:[2/200], step:[2502/2502], loss:[3.885/3.885], time:685880.388, lr:0.09990
  Epoch time:685884.401, per step time:274.134, avg loss:3.885
  Epoch:[3/200], step:[2502/2502], loss:[3.550/3.550], time:689409.851, lr:0.09978
  Epoch time:689413.237, per step time:275.545, avg loss:3.550
  Epoch:[4/200], step:[2502/2502], loss:[3.371/3.371], time:692162.583, lr:0.09961
  Epoch time:692166.163, per step time:276.645, avg loss:3.371
  ...
  ```
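Under mpirun, each of the 8 processes drives one GPU. Below is a minimal sketch of the data-parallel initialization such a launch implies; it is an assumption about what mobilenet_v2_imagenet_train.py does internally, not its actual code.

```python
from mindspore import context
from mindspore.communication.management import init, get_group_size, get_rank

# One process per GPU, launched by `mpirun -n 8`.
context.set_context(mode=context.GRAPH_MODE, device_target="GPU")
init("nccl")  # initialize NCCL collective communication

# Average gradients across the 8 devices (data parallelism).
context.set_auto_parallel_context(
    parallel_mode=context.ParallelMode.DATA_PARALLEL,
    gradients_mean=True,
    device_num=get_group_size())

rank_id = get_rank()  # would be used to shard the ImageNet dataset per worker
```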
- The following configuration uses a YAML file for training. (A hypothetical illustration of such a config follows the output below.)

  ```shell
  mpirun -n 8 python examples/classification/tools/train_config.py -c mindvision/classification/config/mobilenet_v2/mobilenet_v2_1.0_224_imagenet2012.yaml
  ```
  output:

  ```text
  Epoch:[0/200], step:[2502/2502], loss:[5.350/5.350], time:860471.865, lr:0.10000
  Epoch time:867121.671, per step time:346.571, avg loss:5.350
  Epoch:[1/200], step:[2502/2502], loss:[4.737/4.737], time:838193.601, lr:0.09998
  Epoch time:838236.939, per step time:335.027, avg loss:4.737
  Epoch:[2/200], step:[2502/2502], loss:[4.430/4.430], time:841103.726, lr:0.09990
  Epoch time:841167.664, per step time:336.198, avg loss:4.430
  Epoch:[3/200], step:[2502/2502], loss:[3.845/3.845], time:844526.365, lr:0.09978
  Epoch time:844609.499, per step time:337.574, avg loss:3.845
  Epoch:[4/200], step:[2502/2502], loss:[3.604/3.604], time:840851.149, lr:0.09961
  Epoch time:840912.141, per step time:336.096, avg loss:3.604
  ...
  ```
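The keys in such a config presumably mirror the parameter table above. The fragment below is a hypothetical illustration assembled from those parameters, not the contents of the actual mobilenet_v2_1.0_224_imagenet2012.yaml:

```python
import yaml  # requires PyYAML

# Hypothetical config fragment; keys are taken from the parameter table
# above, and the real file may be organized differently.
config_text = """
device_target: GPU
num_classes: 1001
batch_size: 64
epoch_size: 200
lr_decay_mode: cosine_decay_lr
max_lr: 0.1
min_lr: 0.0
alpha: 1.0
resize: 224
"""
cfg = yaml.safe_load(config_text)
print(cfg["alpha"], cfg["resize"])  # 1.0 224
```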
- The following configuration is used for evaluation. The magnification factor is set to 0.75 and the input image size is set to 192. (A toy illustration of the Top-1/Top-5 metrics follows the output below.)

  ```shell
  python mobilenet_v2_imagenet_eval.py --alpha 0.75 --resize 192 --pretrained True --data_url ./dataset/imagenet
  ```
  output:

  ```text
  {'Top_1_Accuracy': 0.6922876602564103, 'Top_5_Accuracy': 0.8871594551282052}
  ```
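The Top_1_Accuracy and Top_5_Accuracy values above are standard categorical accuracy metrics. Here is a self-contained toy illustration with MindSpore's built-in metrics; the logits and labels are made up, and k=2 stands in for the k=5 used on ImageNet:

```python
import numpy as np
import mindspore as ms
import mindspore.nn as nn

logits = ms.Tensor(np.array([[0.1, 0.6, 0.3],
                             [0.7, 0.2, 0.1]], np.float32))
labels = ms.Tensor(np.array([1, 1], np.int32))

top1 = nn.Top1CategoricalAccuracy()
top1.update(logits, labels)
topk = nn.TopKCategoricalAccuracy(k=2)  # the evaluation above uses k=5
topk.update(logits, labels)
# Sample 0 is right at top-1; sample 1 only within the top-2.
print(top1.eval(), topk.eval())  # 0.5 1.0
```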
- The following configuration uses a YAML file for evaluation.

  ```shell
  python examples/classification/tools/eval_config.py -c mindvision/classification/config/mobilenet_v2/mobilenet_v2_0.75_192_imagenet2012.yaml
  ```
  output:

  ```text
  {'Top_1_Accuracy': 0.692441581306018, 'Top_5_Accuracy': 0.8872439180537772}
  ```
- The following configuration is used for inference. The magnification factor is set to 1.0 and the input image size is set to 224. (A hedged sketch of single-image inference follows the output below.)

  ```shell
  python mobilenet_v2_imagenet_infer.py --alpha 1.0 --resize 224 --pretrained True --data_url ./infer
  ```
  output:

  ```text
  {283: 'Persian cat'}
  ```
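For reference, here is a hedged sketch of what single-image inference could look like with the mindvision model factory. The mobilenet_v2 import path and constructor arguments are assumptions based on the package layout, and the random array stands in for a real preprocessed image:

```python
import numpy as np
import mindspore as ms
from mindvision.classification.models import mobilenet_v2  # assumed import path

net = mobilenet_v2(num_classes=1001, pretrained=True)  # assumed signature
net.set_train(False)
image = np.random.rand(1, 3, 224, 224).astype(np.float32)  # stand-in input
logits = net(ms.Tensor(image))
print(int(logits.asnumpy().argmax()))  # class index, e.g. 283 -> 'Persian cat'
```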