This repository contains the code and trained models of:
Yunpeng Chen, Jianan Li, Huaxin Xiao, Xiaojie Jin, Shuicheng Yan, Jiashi Feng. "Dual Path Networks" (NIPS17).
- DPNs helped us win 1st place in the Object Localization task of ILSVRC 2017, with all competition tasks within the top 3. (Team: NUS-Qihoo_DPNs)
DPNs are implemented with MXNet @92053bd.
Method | Settings |
---|---|
Random Mirror | True |
Random Crop | 8% - 100% |
Aspect Ratio | 3/4 - 4/3 |
Random HSL | [20,40,50] |
Note: We did not use PCA Lighting or any other advanced augmentation methods. Input images are resized by bicubic interpolation.
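As a rough illustration of the Random Crop and Aspect Ratio settings in the table above, the sketch below samples a crop box covering 8% to 100% of the image area with a width/height ratio between 3/4 and 4/3. The function name `sample_crop`, the uniform sampling of both quantities, the retry limit, and the full-image fallback are assumptions made for this example; they are not taken from the training code in this repository.

```python
import math
import random


def sample_crop(width, height, area_range=(0.08, 1.0),
                ratio_range=(3 / 4, 4 / 3), max_tries=10, rng=random):
    """Return (x, y, w, h) of a random crop inside a width x height image."""
    for _ in range(max_tries):
        # Fraction of the image area to keep (8% - 100% in the table).
        target_area = rng.uniform(*area_range) * width * height
        # Width / height ratio of the crop; uniform sampling is an assumption.
        ratio = rng.uniform(*ratio_range)
        w = int(round(math.sqrt(target_area * ratio)))
        h = int(round(math.sqrt(target_area / ratio)))
        if 0 < w <= width and 0 < h <= height:
            x = rng.randint(0, width - w)
            y = rng.randint(0, height - h)
            return x, y, w, h
    # Fallback (assumption): keep the whole image if no valid box was found.
    return 0, 0, width, height
```

The returned box would then be cut out and resized to the training resolution (bicubic interpolation, per the note above), with a random horizontal flip applied separately.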
The mean RGB = [ 124, 117, 104 ] is subtracted from the augmented input images, and the result is then multiplied by 0.0167.
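A minimal NumPy sketch of this normalization step is shown below. The mean and scale values come from the text; the function name `normalize`, the H x W x 3 layout, and the RGB channel order of the input array are assumptions for illustration.

```python
import numpy as np

# Values quoted in the text above.
MEAN_RGB = np.array([124.0, 117.0, 104.0], dtype=np.float32)
SCALE = 0.0167


def normalize(image_rgb):
    """Normalize an H x W x 3 uint8 image (RGB order assumed) to float32."""
    # Broadcasting subtracts the per-channel mean from every pixel.
    return (image_rgb.astype(np.float32) - MEAN_RGB) * SCALE
```

If the network expects channel-first input, the normalized array would still need to be transposed to 3 x H x W before being fed to the model.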
Here, we introduce a new testing technique, Mean-Max Pooling, which can further improve the performance of a well-trained CNN in the testing phase without any training or fine-tuning. This technique is designed for the case where the testing images are larger than the training crops. The idea is to first convert a trained CNN model into a convolutional network and then insert the following Mean-Max Pooling layer (a.k.a. Max-Avg Pooling), i.e. 0.5 * (global average pooling + global max pooling), just before the final softmax layer.
Based on our observations, Mean-Max Pooling consistently boosts the testing accuracy. We adopted this testing strategy in both LSVRC16 and LSVRC17.
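The pooling formula above can be sketched in a few lines of NumPy. This is only an illustration of 0.5 * (global average pooling + global max pooling), not the MXNet layer used in this repository; the function names and the assumed (N, C, H, W) layout of the class score maps produced by the convolutionalized network are hypothetical.

```python
import numpy as np


def mean_max_pool(score_maps):
    """Reduce (N, C, H, W) class score maps to (N, C) scores."""
    avg = score_maps.mean(axis=(2, 3))   # global average pooling
    mx = score_maps.max(axis=(2, 3))     # global max pooling
    return 0.5 * (avg + mx)


def softmax(logits):
    """Numerically stable softmax over the class axis of an (N, C) array."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def predict(score_maps):
    """Apply Mean-Max Pooling just before the final softmax."""
    return softmax(mean_max_pool(score_maps))
```

When the score map is 1 x 1, the average and the maximum coincide, so the layer leaves the scores unchanged; it only has an effect when larger test images produce a larger spatial map.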
Single Model, Single Crop Validation Error (%) on ImageNet-1k:
Model | Size | GFLOPs | 224x224 Top 1 | 224x224 Top 5 | 320x320 Top 1 | 320x320 Top 5 | 320x320 (with mean-max pooling) Top 1 | 320x320 (with mean-max pooling) Top 5 |
---|---|---|---|---|---|---|---|---|
DPN-68 | 49 MB | 2.5 | 23.57 | 6.93 | 22.15 | 5.90 | 21.51 | 5.52 |
DPN-92 | 145 MB | 6.5 | 20.73 | 5.37 | 19.34 | 4.66 | 19.04 | 4.53 |
DPN-98 | 236 MB | 11.7 | 20.15 | 5.15 | 18.94 | 4.44 | 18.72 | 4.40 |
DPN-131 | 304 MB | 16.0 | 19.93 | 5.12 | 18.62 | 4.23 | 18.55 | 4.16 |
Single Model, Single Crop Validation Error (%) on ImageNet-1k, for models pretrained on ImageNet-5k:
Model | Size | GFLOPs | 224x224 Top 1 | 224x224 Top 5 | 320x320 Top 1 | 320x320 Top 5 | 320x320 (with mean-max pooling) Top 1 | 320x320 (with mean-max pooling) Top 5 |
---|---|---|---|---|---|---|---|---|
DPN-68 | 49 MB | 2.5 | 22.45 | 6.09 | 20.92 | 5.26 | 20.62 | 5.07 |
DPN-92 | 145 MB | 6.5 | 19.98 | 5.06 | 19.00 | 4.37 | 18.79 | 4.19 |
DPN-107 | 333 MB | 18.3 | 19.75 | 4.94 | 18.34 | 4.19 | 18.15 | 4.03 |
Note: DPN-107 is not well trained.
Single Model, Single Crop Validation Accuracy (%) on ImageNet-5k:
Model | Size | GFLOPs | 224x224 Top 1 | 224x224 Top 5 | 320x320 Top 1 | 320x320 Top 5 | 320x320 (with mean-max pooling) Top 1 | 320x320 (with mean-max pooling) Top 5 |
---|---|---|---|---|---|---|---|---|
DPN-68 | 61 MB | 2.5 | 61.27 | 85.46 | 61.54 | 85.99 | 62.35 | 86.20 |
DPN-92 | 184 MB | 6.5 | 67.31 | 89.49 | 66.84 | 89.38 | 67.42 | 89.76 |
Note: The higher model complexity comes from the final classifier. Models trained on ImageNet-5k learn much richer feature representations than models trained on ImageNet-1k.
The training speed is tested based on MXNet @92053bd.
Multiple Nodes (Without specific code optimization):
Model | CUDA / cuDNN | #Node | GPU Card (per node) | Batch Size (per GPU) | kvstore | GPU Mem (per GPU) | Training Speed* (per node) |
---|---|---|---|---|---|---|---|
DPN-68 | 8.0 / 5.1 | 10 | 4 x K80 (Tesla) | 64 | dist_sync | 9337 MiB | 284 img/sec |
DPN-92 | 8.0 / 5.1 | 10 | 4 x K80 (Tesla) | 32 | dist_sync | 8017 MiB | 133 img/sec |
DPN-98 | 8.0 / 5.1 | 10 | 4 x K80 (Tesla) | 32 | dist_sync | 11128 MiB | 85 img/sec |
DPN-131 | 8.0 / 5.1 | 10 | 4 x K80 (Tesla) | 24 | dist_sync | 11448 MiB | 60 img/sec |
DPN-107 | 8.0 / 5.1 | 10 | 4 x K80 (Tesla) | 24 | dist_sync | 12086 MiB | 55 img/sec |
*This is the actual training speed, which includes data augmentation, forward, backward, parameter update, network communication, etc. MXNet is awesome: we observed a linear speedup, as has been shown in link
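Because the speeds above are reported per node, the cluster-wide throughput is the per-node figure multiplied by the number of nodes. The sketch below does that arithmetic and derives a rough time per epoch. The per-node speeds and the node count are copied from the table; the ImageNet-1k training set size (about 1.28 million images) and the assumption that every node sustains the listed speed are not stated in this document, and the function names are hypothetical.

```python
# Per-node training speed in img/sec, copied from the table above.
PER_NODE_SPEED = {
    "DPN-68": 284,
    "DPN-92": 133,
    "DPN-98": 85,
    "DPN-131": 60,
    "DPN-107": 55,
}
NUM_NODES = 10

# Assumption: size of the ImageNet-1k training set, not given in the table.
IMAGENET_1K_TRAIN_IMAGES = 1_281_167


def cluster_speed(model, nodes=NUM_NODES):
    """Aggregate img/sec, assuming every node sustains the listed speed."""
    return PER_NODE_SPEED[model] * nodes


def hours_per_epoch(model, num_images=IMAGENET_1K_TRAIN_IMAGES, nodes=NUM_NODES):
    """Rough wall-clock hours for one pass over num_images."""
    return num_images / cluster_speed(model, nodes) / 3600.0


if __name__ == "__main__":
    for name in PER_NODE_SPEED:
        print(f"{name}: {cluster_speed(name)} img/sec, "
              f"{hours_per_epoch(name):.2f} h/epoch")
```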
Model | Size | Dataset | MXNet Model |
---|---|---|---|
DPN-68 | 49 MB | ImageNet-1k | GoogleDrive |
DPN-68* | 49 MB | ImageNet-1k | GoogleDrive |
DPN-68 | 61 MB | ImageNet-5k | GoogleDrive |
DPN-92 | 145 MB | ImageNet-1k | GoogleDrive |
DPN-92 | 138 MB | Places365-Standard | GoogleDrive |
DPN-92* | 145 MB | ImageNet-1k | GoogleDrive |
DPN-92 | 184 MB | ImageNet-5k | GoogleDrive |
DPN-98 | 236 MB | ImageNet-1k | GoogleDrive |
DPN-131 | 304 MB | ImageNet-1k | GoogleDrive |
DPN-107* | 333 MB | ImageNet-1k | GoogleDrive |
*Pretrained on ImageNet-5k and then fine-tuned on ImageNet-1k.
- Caffe Implementation with trained models by soeaver
- Chainer Implementation by oyam
- Keras Implementation by titu1994
- MXNet Implementation by miraclewkf
- PyTorch Implementation by oyam
- PyTorch Implementation with trained models by rwightman
ImageNet-1k Training/Validation List:
- Download link: GoogleDrive
ImageNet-1k category name mapping table:
- Download link: GoogleDrive
ImageNet-5k Raw Images:
- The ImageNet-5k is a subset of ImageNet10K provided by this paper.
- Please download ImageNet10K and then extract ImageNet-5k using the list below.
ImageNet-5k Training/Validation List:
- It contains about 5k leaf categories from ImageNet10K. There is no category overlap between our provided ImageNet-5k and the official ImageNet-1k.
- Download link: GoogleDrive
Places365-Standard Validation List & Matlab code for 10-crop testing:
- Download link: GoogleDrive
If you use DPN in your research, please cite the paper:
@article{Chen2017,
title={Dual Path Networks},
author={Yunpeng Chen and Jianan Li and Huaxin Xiao and Xiaojie Jin and Shuicheng Yan and Jiashi Feng},
journal={arXiv preprint arXiv:1707.01629},
year={2017}
}