This repository contains a PyTorch implementation of the paper Deep Pyramidal Residual Networks (CVPR 2017) by Dongyoon Han*, Jiwhan Kim*, and Junmo Kim (* equal contribution). The code is based on the example provided in the PyTorch examples repository and on the nice implementation of Densely Connected Convolutional Networks.
Two other implementations, in LuaTorch and Caffe, are also provided:
- A LuaTorch implementation for PyramidNets,
- A Caffe implementation for PyramidNets.
To train additive PyramidNet-110 (alpha=64 without bottleneck) on the CIFAR-10 dataset with a single GPU:
CUDA_VISIBLE_DEVICES=0 python train.py --alpha 64 --depth 110 --no-bottleneck --batchsize 32 --lr 0.025 --print-freq 1 --expname PyramidNet-110 --dataset cifar10
To train additive PyramidNet-164 (alpha=48 with bottleneck) on the CIFAR-100 dataset with 4 GPUs:
CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --alpha 48 --depth 164 --batchsize 128 --lr 0.5 --print-freq 1 --expname PyramidNet-164 --dataset cifar100
- This implementation covers the CIFAR-10 and CIFAR-100 datasets with the additive PyramidNet architecture; the code will be updated for the ImageNet-1k dataset soon.
- The standard data augmentation for the CIFAR datasets is used, following fb.resnet.torch (a minimal sketch is given after this list).
- To use multiple GPUs, apply data parallelism in PyTorch [i.e., model = torch.nn.DataParallel(model).cuda()].
- An example code for ResNet is also included (an example code for the pre-activation ResNet will be uploaded soon).
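For reference, the following is a minimal sketch of the fb.resnet.torch-style CIFAR augmentation and of the multi-GPU data parallelism mentioned above; it is not the repository's exact train.py code, and the normalization constants are the commonly used CIFAR-10 statistics rather than values taken from this repository.

import torchvision.transforms as transforms

# fb.resnet.torch-style CIFAR augmentation: pad-and-crop plus random horizontal
# flip, followed by per-channel normalization (illustrative CIFAR-10 statistics).
normalize = transforms.Normalize(mean=[0.4914, 0.4822, 0.4465],
                                 std=[0.2470, 0.2435, 0.2616])
train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    normalize,
])

# Multi-GPU training: wrap the model with data parallelism before use.
# model = torch.nn.DataParallel(model).cuda()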
Training progress can be tracked efficiently with TensorBoard: all experiments are logged with tensorboard_logger.
tensorboard_logger can be installed with
pip install tensorboard_logger
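A minimal usage sketch of tensorboard_logger is shown below; the run directory and metric name are purely illustrative and not taken from this repository.

from tensorboard_logger import configure, log_value

configure("runs/PyramidNet-110-example")   # directory where TensorBoard event files are written
for step in range(1, 101):
    loss = 1.0 / step                      # placeholder value; log the real training loss here
    log_value("train_loss", loss, step)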
Deep convolutional neural networks (DCNNs) have shown remarkable performance in image classification tasks in recent years. Generally, deep neural network architectures are stacks consisting of a large number of convolutional layers, and they perform downsampling along the spatial dimension via pooling to reduce memory usage. At the same time, the feature map dimension (i.e., the number of channels) is sharply increased at downsampling locations, which is essential to ensure effective performance because it increases the capability of high-level attributes. This also applies to residual networks and is very closely related to their performance. In this research, instead of sharply increasing the feature map dimension at the units that perform downsampling, we gradually increase the feature map dimension at all units so as to involve as many locations as possible. This design, which is discussed in depth together with our new insights, has proven to be effective for improving generalization ability. Furthermore, we propose a novel residual unit that further improves the classification accuracy with our new network architecture. Experiments on benchmark CIFAR datasets show that our network architecture has superior generalization ability compared to the original residual networks.
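As a concrete illustration of the additive (linear) widening described above, the sketch below computes per-unit channel widths under the assumption that the network starts at 16 channels and that each of the N residual units adds alpha/N channels (the rounding shown is illustrative; the repository's code may handle it slightly differently).

def additive_pyramid_widths(depth=110, alpha=48, bottleneck=False):
    # 3 groups of (depth - 2) / 6 basic units, or (depth - 2) / 9 bottleneck units.
    units_per_group = (depth - 2) // (9 if bottleneck else 6)
    n = 3 * units_per_group                # total number of residual units N
    widths, width = [], 16.0
    for _ in range(n):
        width += alpha / n                 # gradual (additive) widening step
        widths.append(int(round(width)))
    return widths

widths = additive_pyramid_widths(depth=110, alpha=48)
print(widths[:3], widths[-1])              # [17, 18, 19] 64, i.e., the last unit ends with 16 + alpha channels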
- Schematic illustration comparing several residual units: (a) basic residual units, (b) bottleneck residual units, (c) wide residual units, (d) our pyramidal residual units, and (e) our pyramidal bottleneck residual units (a code sketch of the basic pyramidal unit is given below the figure descriptions):
- Visual illustration of (a) additive PyramidNet (the feature map dimension of each unit increases linearly), (b) multiplicative PyramidNet (the feature map dimension of each unit increases geometrically), and (c) comparison of (a) and (b):
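For readers who prefer code to figures, here is a minimal PyTorch sketch of a basic pyramidal residual unit, assuming the BN-conv-BN-ReLU-conv-BN ordering with no ReLU after the addition and a zero-padded identity shortcut for the channel increase; the repository's actual modules may differ in detail.

import torch.nn as nn
import torch.nn.functional as F

class PyramidalBasicBlock(nn.Module):
    def __init__(self, in_planes, out_planes, stride=1):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(in_planes)
        self.conv1 = nn.Conv2d(in_planes, out_planes, 3, stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_planes)
        self.conv2 = nn.Conv2d(out_planes, out_planes, 3, stride=1, padding=1, bias=False)
        self.bn3 = nn.BatchNorm2d(out_planes)
        self.stride = stride

    def forward(self, x):
        out = self.conv1(self.bn1(x))             # BN before the first conv, no leading ReLU
        out = self.conv2(F.relu(self.bn2(out)))
        out = self.bn3(out)                       # extra BN before the addition

        shortcut = x
        if self.stride != 1:
            shortcut = F.avg_pool2d(shortcut, 2)  # downsample the identity path
        extra = out.size(1) - shortcut.size(1)
        if extra > 0:                             # zero-pad the additional channels
            shortcut = F.pad(shortcut, (0, 0, 0, 0, 0, extra))
        return out + shortcut                     # no ReLU after the addition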
- The results are readily reproducible and match the performance obtained with the LuaTorch implementation of PyramidNets.
- Comparison of state-of-the-art networks by top-1 test error rate vs. number of parameters:
- Top-1 test error rates (%) on the CIFAR datasets are shown in the following table. All PyramidNet results are produced with additive PyramidNets, and α denotes the widening factor. “Output Feat. Dim.” denotes the feature dimension just before the last softmax classifier.
- Some minor bugs were fixed (2018/2/22).
Please cite our paper if PyramidNets are used:
@article{DPRN,
  title={Deep pyramidal residual networks},
  author={Han, Dongyoon and Kim, Jiwhan and Kim, Junmo},
  journal={arXiv preprint arXiv:1610.02915},
  year={2016}
}
If this implementation is useful, please also cite or acknowledge this repository in your work.
Dongyoon Han (dyhan@kaist.ac.kr), Jiwhan Kim (jhkim89@kaist.ac.kr), Junmo Kim (junmo.kim@kaist.ac.kr)