
[ICCV 2019] Harmonious Bottleneck on Two Orthogonal Dimensions, surpassing MobileNetV2

Primary language: Python. License: Apache-2.0.

HBONet

Official PyTorch implementation of the HBONet architecture described in "HBONet: Harmonious Bottleneck on Two Orthogonal Dimensions" (ICCV'19) by Duo Li, Aojun Zhou and Anbang Yao, evaluated on the ILSVRC2012 benchmark.

We integrate our HBO modules into the state-of-the-art MobileNetV2 backbone as a reference case. The corresponding MobileNetV2 baseline models are available in my repository mobilenetv2.pytorch.

Requirements

Dependencies

  • PyTorch 1.0+
  • NVIDIA-DALI (in development, not recommended)

Dataset

Download the ImageNet dataset and move validation images to labeled subfolders. To do this, you can use the following script: https://raw.githubusercontent.com/soumith/imagenetloader.torch/master/valprep.sh
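For readers who prefer not to run the shell script, the same idea can be sketched in Python. This is only an illustration of the folder layout, not the script itself: the function name `sort_val_images` and the `labels` mapping (image file name to class folder name) are hypothetical, and building that mapping from the ILSVRC2012 devkit is left out.

```python
import shutil
from pathlib import Path


def sort_val_images(val_dir, labels):
    """Move each validation image into a subfolder named after its class.

    `labels` is an assumed mapping {image file name: class folder name};
    it is not provided by this repository.
    """
    val_dir = Path(val_dir)
    moved = 0
    for filename, class_name in labels.items():
        src = val_dir / filename
        if not src.is_file():
            continue  # already moved or missing, so skip it
        class_dir = val_dir / class_name
        class_dir.mkdir(exist_ok=True)
        shutil.move(str(src), str(class_dir / filename))
        moved += 1
    return moved
```

The function returns the number of files it moved, so running it a second time on the same directory should return 0.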

Pretrained models

The following statistics are reported on the ILSVRC2012 validation set with single center crop testing.

HBONet with a spectrum of width multipliers (Table 2)

| Architecture | MFLOPs | Top-1 / Top-5 Acc. (%) |
| --- | --- | --- |
| HBONet 1.0 | 305 | 73.1 / 91.0 |
| HBONet 0.8 | 205 | 71.3 / 89.7 |
| HBONet 0.5 | 96 | 67.0 / 86.9 |
| HBONet 0.35 | 61 | 62.4 / 83.7 |
| HBONet 0.25 | 37 | 57.3 / 79.8 |
| HBONet 0.1 | 14 | 41.5 / 65.7 |

HBONet 0.8 with a spectrum of input resolutions (Table 3)

| Architecture | MFLOPs | Top-1 / Top-5 Acc. (%) |
| --- | --- | --- |
| HBONet 0.8 224x224 | 205 | 71.3 / 89.7 |
| HBONet 0.8 192x192 | 150 | 70.0 / 89.2 |
| HBONet 0.8 160x160 | 105 | 68.3 / 87.8 |
| HBONet 0.8 128x128 | 68 | 65.5 / 85.9 |
| HBONet 0.8 96x96 | 39 | 61.4 / 83.0 |
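As a rough sanity check on the MFLOPs column, the cost of a convolutional network scales approximately with the number of input pixels, i.e. with the square of the input side length. The helper below is a back-of-the-envelope sketch under that assumption only; `scale_mflops` is a hypothetical name and is not part of this repository, and the reported numbers come from the paper, not from this formula.

```python
def scale_mflops(mflops_at_ref, resolution, ref_resolution=224):
    """Estimate MFLOPs at `resolution` from the cost at `ref_resolution`.

    Assumes cost grows with the square of the input side length,
    which is only approximately true for a real network.
    """
    return mflops_at_ref * (resolution / ref_resolution) ** 2


if __name__ == "__main__":
    # HBONet 0.8 costs 205 MFLOPs at 224x224 according to the table.
    for side in (192, 160, 128, 96):
        print(side, round(scale_mflops(205, side), 1))
```

For HBONet 0.8 the estimate lands within about 2 MFLOPs of every reported value, for example roughly 150.6 against the reported 150 at 192x192.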

HBONet 0.35 with a spectrum of input resolutions (Table 4)

| Architecture | MFLOPs | Top-1 / Top-5 Acc. (%) |
| --- | --- | --- |
| HBONet 0.35 224x224 | 61 | 62.4 / 83.7 |
| HBONet 0.35 192x192 | 45 | 60.9 / 82.6 |
| HBONet 0.35 160x160 | 31 | 58.6 / 80.7 |
| HBONet 0.35 128x128 | 21 | 55.2 / 78.0 |
| HBONet 0.35 96x96 | 12 | 50.3 / 73.8 |

HBONet with different width multipliers and different input resolutions (Table 5)

| Architecture | MFLOPs | Top-1 / Top-5 Acc. (%) |
| --- | --- | --- |
| HBONet 0.5 224x224 | 98 | 67.7 / 87.4 |
| HBONet 0.6 192x192 | 108 | 67.3 / 87.3 |

HBONet 0.25 variants with different down-sampling and up-sampling rates (Table 6)

| Architecture | MFLOPs | Top-1 / Top-5 Acc. (%) |
| --- | --- | --- |
| HBONet(2x) 0.25 | 44 | 58.3 / 80.6 |
| HBONet(4x) 0.25 | 45 | 59.3 / 81.4 |
| HBONet(8x) 0.25 | 45 | 58.2 / 80.4 |

Taking HBONet 1.0 as an example, a pretrained model can be imported with the following lines and then fine-tuned for other vision tasks or deployed on resource-constrained platforms. (To create the variant models in Tables 5 and 6, first make the slight modifications described in the docstrings of the model file.)

import torch
from models.imagenet import hbonet

# Build HBONet 1.0 and load its pretrained ImageNet weights
net = hbonet()
net.load_state_dict(torch.load('pretrained/hbonet_1_0.pth'))
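If `load_state_dict` complains about unexpected keys that start with `module.`, the checkpoint was probably saved from a model wrapped in `torch.nn.DataParallel`, which adds that prefix to every parameter name. Whether the checkpoints in this repository carry the prefix is an assumption here, not something stated above; the helper name `strip_module_prefix` is hypothetical. A minimal sketch of removing the prefix before loading:

```python
def strip_module_prefix(state_dict, prefix="module."):
    """Return a copy of `state_dict` with a leading `prefix` removed from keys.

    Keys that do not start with the prefix are kept unchanged.
    """
    return {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in state_dict.items()
    }
```

It would then be used as `net.load_state_dict(strip_module_prefix(torch.load('pretrained/hbonet_1_0.pth', map_location='cpu')))`, where `map_location='cpu'` lets the weights load on a machine without a GPU.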

Usage

Training

The configuration below reproduces our reported results. It is identical to the one used in mobilenetv2.pytorch, for a fair comparison.

  • batch size: 256
  • epochs: 150
  • learning rate: 0.05
  • LR decay strategy: cosine
  • weight decay: 0.00004 (4e-5)

python imagenet.py \
    -a hbonet \
    -d <path-to-ILSVRC2012-data> \
    --epochs 150 \
    --lr-decay cos \
    --lr 0.05 \
    --wd 4e-5 \
    -c <path-to-save-checkpoints> \
    --width-mult <width-multiplier> \
    --input-size <input-resolution> \
    -j <num-workers>
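To make the "cosine" decay strategy concrete, the sketch below computes the learning rate at a given epoch using the standard cosine annealing form, starting at 0.05 and decaying to zero over 150 epochs. This is an assumption about the shape of the schedule: the training script may update per iteration rather than per epoch, or differ in other details, and `cosine_lr` is a hypothetical name.

```python
import math


def cosine_lr(epoch, base_lr=0.05, total_epochs=150):
    """Cosine-annealed learning rate at a (possibly fractional) epoch.

    Follows lr = 0.5 * base_lr * (1 + cos(pi * epoch / total_epochs)),
    with the progress clamped to the range [0, 1].
    """
    progress = min(max(epoch / total_epochs, 0.0), 1.0)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))
```

Under this form the rate is 0.05 at the start, 0.025 halfway through (epoch 75), and 0 at epoch 150.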

Test

python imagenet.py \
    -a hbonet \
    -d <path-to-ILSVRC2012-data> \
    --weight <pretrained-pth-file> \
    --width-mult <width-multiplier> \
    --input-size <input-resolution> \
    -e
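The evaluation reports Top-1 and Top-5 accuracy. As a reminder of what those metrics mean, the sketch below computes them from per-image class scores in plain Python. It illustrates the metric only and is not the evaluation code of this repository; `topk_accuracy` is a hypothetical name.

```python
def topk_accuracy(scores, labels, ks=(1, 5)):
    """Return {k: accuracy in percent} for per-sample class scores.

    `scores` is a list of score lists (one per sample) and `labels`
    holds the true class index of each sample.
    """
    hits = {k: 0 for k in ks}
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        for k in ks:
            if label in ranked[:k]:
                hits[k] += 1
    n = len(labels)
    return {k: 100.0 * hits[k] / n for k in ks}
```

A sample counts as a Top-k hit when its true class is among the k highest-scoring classes, so Top-5 accuracy is never lower than Top-1.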

Citations

If you find our work useful in your research, please consider citing:

@InProceedings{Li_2019_ICCV,
author = {Li, Duo and Zhou, Aojun and Yao, Anbang},
title = {HBONet: Harmonious Bottleneck on Two Orthogonal Dimensions},
booktitle = {The IEEE International Conference on Computer Vision (ICCV)},
month = {Oct},
year = {2019}
}