deeplabv3p_gluon

DeepLab v3+ in MXNet Gluon

Primary language: Jupyter Notebook. License: Apache License 2.0.

Note

  • train_multi_gpu.py: multi-gpu training on Pascal VOC dataset, with validation.
  • train.py: single-gpu training on Pascal VOC dataset, with validation.
  • evaluate.py: single-gpu evaluation on Pascal VOC validation.
  • extract_weights.py: convert the weights from the official model release.
  • mylib: clean, library-style code.
  • workspace: the notebooks where I ran experiments; they contain messy stuff and can be ignored.
  • GPU only, but it should be easy to adapt to CPU.
  • My environment (not tested with others):
    • Python==3.6
    • MXNet>=1.2.0 (MXNet==1.3.0 for multi-gpu SyncBatchNorm)
    • gluoncv==0.3.0
    • TensorFlow==1.4.0, Keras==2.1.5 (for converting the weights)
  • Download the dataset:
git clone https://github.com/dmlc/gluon-cv
cd gluon-cv/scripts/datasets
python pascal_voc.py
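
The note above says the scripts are GPU-only but should be easy to adapt to CPU. A minimal sketch of one way to do that is a device fallback like the one below. The helper `pick_devices` is hypothetical and not part of this repository; the mapping onto MXNet contexts is shown only in comments and assumes the standard `mx.gpu(i)` and `mx.cpu()` constructors.

```python
def pick_devices(num_gpus_available, num_gpus_requested=1):
    """Return (device_type, device_id) pairs, falling back to CPU.

    Hypothetical helper for illustration; not part of this repository.
    """
    usable = min(num_gpus_available, num_gpus_requested)
    if usable <= 0:
        # No GPU available (or none requested): run on a single CPU context.
        return [("cpu", 0)]
    return [("gpu", i) for i in range(usable)]


# With MXNet, each pair could be mapped onto a context object, e.g.:
#   import mxnet as mx
#   ctx = [mx.gpu(i) if kind == "gpu" else mx.cpu(i)
#          for kind, i in pick_devices(n_available, n_requested)]
# where n_available is the number of GPUs on the machine (assumption:
# obtained from mx.context.num_gpus() on MXNet versions that provide it).
print(pick_devices(0))      # CPU fallback
print(pick_devices(8, 4))   # first four GPUs
```

The training and evaluation scripts would then need to place the network and each batch on the returned context(s) instead of a hard-coded GPU.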

Models

Results of my port on the Pascal VOC validation set:

| Model | EvalOS (w/ or w/o inference tricks) | mIoU (%) |
|---|---|---|
| xception_coco_voc_trainaug (TF release) | 16 (w/o) | 82.20 |
| | 8 (w/) | 83.58 |
| xception_coco_voc_trainaug (MXNet porting) | 16 (w/o) | 79.19 |
| | 8 (w/o) | 81.82 |
| xception_coco_voc_trainaug (MXNet finetune TrainOS=16) | 16 (w/o) | 82.75 |
| | 8 (w/o) | 82.56 |
| xception_coco_voc_trainaug (MXNet finetune TrainOS=8) | 16 (w/o) | 82.02 |
| | 8 (w/o) | 83.14 |
| xception_voc_trainaug (ImageNet pretrained only, without MSCOCO pretrained) | 16 (w/o) | 77.06 |
| | 8 (w/o) | 76.44 |
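
For readers unfamiliar with the metric, mIoU is the per-class intersection-over-union averaged over classes. The following is a minimal pure-Python sketch of that standard definition, computed from a confusion matrix. It is not the code used by evaluate.py, and the 3-class matrix is made-up toy data, not a result from this repository.

```python
def mean_iou(confusion):
    """Mean intersection-over-union from a square confusion matrix.

    confusion[i][j] = number of pixels whose ground-truth class is i
    and whose predicted class is j.
    """
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                        # class c pixels predicted as another class
        fp = sum(confusion[r][c] for r in range(n)) - tp   # other classes predicted as c
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both ground truth and prediction
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy 3-class example (illustrative numbers only).
confusion = [
    [50, 0, 0],
    [0, 30, 10],
    [0, 10, 20],
]
print(round(100 * mean_iou(confusion), 2))  # mIoU in percent
```

In this toy case the per-class IoUs are 1.0, 0.6 and 0.5, so the mean is 70%.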

AWS Runtime & Cost

Measured with batch-norm statistics fixed (use_global_stats=True); for reference only.

| Instance | GPUs | Pricing | Train OS | Speed | Train on train_aug | Eval on val | Time per epoch | Cost per epoch |
|---|---|---|---|---|---|---|---|---|
| p2.8xlarge | K80x8 | $7.20/h | 16 | 1.5s/b16 | 17.0min | 3.5min* | 20.5min | $2.5 |
| | | | 8 | 3.4s/b16 | 37.5min | 10min* | 47.5min | $5.7 |
| p3.8xlarge | V100x4 | $12.24/h | 16 | 0.5s/b16 | 5.5min | 0.7min | 6.2min | $1.3 |
| | | | 8 | 3.0s/b12 | 44.5min | 1.3min | 45.8min | $9.3 |

* BUGS: the GPUs are not sufficiently utilized during eval.
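
The last two columns follow from the others: time per epoch is training time plus evaluation time, and cost per epoch is that time multiplied by the hourly instance price. A small sketch of the arithmetic, using the figures from the table above (the function name `epoch_cost` is only illustrative):

```python
def epoch_cost(train_min, eval_min, price_per_hour):
    """Return (total minutes per epoch, cost per epoch in dollars)."""
    total_min = train_min + eval_min
    return total_min, total_min / 60.0 * price_per_hour


# (instance, train OS, train minutes, eval minutes, price in $/h), from the table.
rows = [
    ("p2.8xlarge", 16, 17.0, 3.5, 7.20),
    ("p2.8xlarge", 8, 37.5, 10.0, 7.20),
    ("p3.8xlarge", 16, 5.5, 0.7, 12.24),
    ("p3.8xlarge", 8, 44.5, 1.3, 12.24),
]

for instance, train_os, train_min, eval_min, price in rows:
    total_min, cost = epoch_cost(train_min, eval_min, price)
    print(f"{instance} OS={train_os}: {total_min:.1f} min, ${cost:.1f}")
```

Rounded to one decimal place, the results agree with the "Time per epoch" and "Cost per epoch" columns.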

Memo

  • transfer all the weights
  • add OS=8
  • test iou on PASCAL val
  • add training scripts
  • add multi-gpu training scripts
  • train more and open source the best models
  • VOCAug dataset pull request
  • Model pull request
  • Finish pull request to gluoncv

Acknowledgements

This repository is part of MXNet summer code, hosted by AWS, TuSimple and Jiangmen. Specifically, I would like to thank Hang Zhang (@AWS) and Hengchen Dai (@TuSimple) for their kind suggestions on tuning and implementation. I would also like to thank AWS for providing generous credits for tuning the computationally intensive models.