PyTorch implementation of DiracDeltaNet from the paper Synetgy: Algorithm-hardware Co-design for ConvNet Accelerators on Embedded FPGAs by Yifan Yang. This uses the ShiftResNet codebase written by Alvin Wan and label-refinery by Hessam Bagherinezhad.
DiracDeltaNet is an efficient convolutional neural network tailored for embedded FPGAs and targeted at the ImageNet classification task. Its macro-architecture originates from ShuffleNet V2. DiracDeltaNet is co-designed with its embedded FPGA accelerator. It has the following features:
- The operator set in DiracDeltaNet is reduced to 1×1 convolution, 2×2 max pooling, shift, channel shuffle and concatenation for hardware efficiency
- All of the 3×3 convolutions in ShuffleNet V2 are replaced with shift operations and 1×1 convolutions
- Several 2×2 max-pooling layers are added, and the kernel size of the existing 3×3 max-pooling is reduced to 2×2
- Transpose-based channel shuffle is replaced with shift-based channel shuffle
- It can be aggressively quantized into 1-bit weights and 4-bit activations with less than 1% top-5 accuracy loss
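To make the shift operation named in the list above concrete, here is a minimal, dependency-free sketch. It assumes that a shift moves each channel's feature map by one pixel in a per-channel direction and fills the vacated positions with zeros. The names `shift_map` and `shift_channels` and the choice of offsets are illustrative only and are not taken from this repository, which works on PyTorch tensors.

```python
def shift_map(fmap, dy, dx):
    """Shift a 2D feature map by (dy, dx), filling vacated cells with 0."""
    h, w = len(fmap), len(fmap[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx  # source pixel that lands on (y, x)
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = fmap[sy][sx]
    return out


def shift_channels(channels, offsets):
    """Apply one (dy, dx) offset per channel, cycling through offsets."""
    return [
        shift_map(fmap, *offsets[i % len(offsets)])
        for i, fmap in enumerate(channels)
    ]


if __name__ == "__main__":
    fmap = [[1, 2], [3, 4]]
    # Hypothetical offsets: identity, down, right, down-right.
    offsets = [(0, 0), (1, 0), (0, 1), (1, 1)]
    for shifted in shift_channels([fmap] * 4, offsets):
        print(shifted)
```

A shift has no weights and needs no multiplications, so following it with a 1×1 convolution lets the network combine neighbouring pixels without a spatial kernel.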
In this repository, we offer:
- Our ShuffleNet V2 implementation
- Source code of DiracDeltaNet
- Pre-trained ShuffleNet V2 and DiracDeltaNet models
- Training and testing code
By Yifan Yang, Qijing Huang, Bichen Wu, Tianjun Zhang, Liang Ma, Giulio Gambardella, Michaela Blott, Luciano Lavagno, Kees Vissers, John Wawrzynek and Kurt Keutzer
The ideas behind the design of DiracDeltaNet, details about its embedded FPGA accelerator and more experimental results can be found in the paper (link).
If you find this work useful for your research, please consider citing:
@article{synetgy,
author = {Yifan Yang and
Qijing Huang and
Bichen Wu and
Tianjun Zhang and
Liang Ma and
Giulio Gambardella and
Michaela Blott and
Luciano Lavagno and
Kees A. Vissers and
John Wawrzynek and
Kurt Keutzer},
title = {Synetgy: Algorithm-hardware Co-design for ConvNet Accelerators on
Embedded FPGAs},
journal = {CoRR},
volume = {abs/1811.08634},
year = {2018},
url = {http://arxiv.org/abs/1811.08634},
archivePrefix = {arXiv},
eprint = {1811.08634},
}
Training DiracDeltaNet uses a pre-trained ResNet50 (download) as the label refinery.
We offer the following pre-trained models:
- Our implementation of ShuffleNet V2 1x with 90 epochs of training
- Full precision DiracDeltaNet
- Quantized DiracDeltaNet (1-bit weights, 4-bit activations, 8-bit fc weights)
The pre-trained models can be found on Google Drive.
Please put the ResNet50 model and the pre-trained models in the following file structure:
DiracDeltaNet/
|
|-- test.py
|-- resnet50.t7
|-- checkpoint/
|   |-- ShuffleNetv2.t7
|   |-- DiracDeltaNet_full.t7
|   |-- ...
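Before running the scripts, it can help to confirm that the files are where they are expected. The following is a small sketch that uses only the file names shown in the layout above; `missing_files` is a hypothetical helper and is not part of this repository.

```python
from pathlib import Path

# File names taken from the layout above; extend the list for other checkpoints.
EXPECTED = [
    "test.py",
    "resnet50.t7",
    "checkpoint/ShuffleNetv2.t7",
    "checkpoint/DiracDeltaNet_full.t7",
]


def missing_files(root):
    """Return the expected paths (relative to root) that are not files."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_files(".")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All expected files are in place.")
```

Run it from the DiracDeltaNet/ directory; any path it prints still needs to be downloaded or moved into place.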
The source code requires PyTorch 0.4.0. There is a known incompatibility with PyTorch 0.4.1, and PyTorch 1.0 has not been tested. Python 3.5+ is required; there is a known incompatibility with Python 2.7.
The full list of arguments can be accessed using --help
For example, to run inference of our ShuffleNet V2 1x implementation, simply type:
python test.py --datadir=PATH-TO-IMAGENET-FOLDER --inputdir=./checkpoint/ShuffleNetv2.t7
For example, to train full precision DiracDeltaNet from scratch, simply type:
python train.py --datadir=PATH-TO-IMAGENET-FOLDER --outputdir=./checkpoint/DiracDeltaNet_full.t7
The default argument values are the hyperparameters we used.
For example, to fine-tune DiracDeltaNet with 8-bit weights and 8-bit activations (except for the first and last conv layers) from the full-precision pre-trained DiracDeltaNet, simply type:
python train.py --datadir=PATH-TO-IMAGENET-FOLDER --inputdir=./checkpoint/DiracDeltaNet_full.t7 --outputdir=./checkpoint/DiracDeltaNet_w8a8.t7 --lr_policy=step --weight_bit=8 --act_bit=8
For fine-tuning, you can also set a smaller learning rate and a smaller number of epochs.
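For readers unfamiliar with what a weight or activation bitwidth implies, the sketch below shows generic k-bit uniform quantization of a value clipped to [0, 1]. This is an assumption made for illustration and is not necessarily the quantizer implemented in train.py; `quantize_unit` is a hypothetical helper.

```python
import math


def quantize_unit(x, bits):
    """Map x, clipped to [0, 1], to the nearest of 2**bits evenly spaced levels."""
    steps = 2 ** bits - 1          # number of intervals between levels
    x = min(max(x, 0.0), 1.0)      # clip to the representable range
    return math.floor(x * steps + 0.5) / steps  # round half up, then rescale


if __name__ == "__main__":
    for bits in (8, 4, 1):
        levels = {quantize_unit(i / 1000, bits) for i in range(1001)}
        print(bits, "bits ->", len(levels), "distinct levels")
```

With fewer bits the set of representable values shrinks, which is why low-bitwidth models are usually fine-tuned from a full-precision checkpoint rather than quantized directly.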
Model | Weight Bitwidth | Activation Bitwidth | Top-1 Acc | Top-5 Acc | Note |
---|---|---|---|---|---|
ShuffleNet V2 1x | 32 | 32 | 69.4% | N/A | original paper |
ShuffleNet V2 1x | 32 | 32 | 67.9% | 88.0% | our implementation with 90 epochs of training |
DiracDeltaNet | 32 | 32 | 69.7% | 89.0% | |
DiracDeltaNet | 16 | 16 | 70.1% | 89.2% | |
DiracDeltaNet | 8 | 8 | 70.3% | 89.3% | |
DiracDeltaNet | 4 | 4 | 68.3% | 88.1% | |
DiracDeltaNet | 2 | 4 | 68.5% | 88.1% | |
DiracDeltaNet | 1 | 4 | 68.5% | 88.2% | 8-bit fc weights |
More results can be found in the paper.
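The claim of less than 1% top-5 accuracy loss under quantization can be checked directly against the table. The short sketch below copies the DiracDeltaNet rows from the table and computes each configuration's top-5 drop relative to the full-precision model; `top5_drops` is a hypothetical helper written for this check.

```python
# (weight bits, activation bits, top-1 %, top-5 %) copied from the table above.
ROWS = [
    (32, 32, 69.7, 89.0),
    (16, 16, 70.1, 89.2),
    (8, 8, 70.3, 89.3),
    (4, 4, 68.3, 88.1),
    (2, 4, 68.5, 88.1),
    (1, 4, 68.5, 88.2),
]


def top5_drops(rows):
    """Return {(w_bits, a_bits): top-5 drop in points vs. the 32/32 row}."""
    baseline = next(t5 for w, a, _, t5 in rows if (w, a) == (32, 32))
    return {(w, a): round(baseline - t5, 1) for w, a, _, t5 in rows}


if __name__ == "__main__":
    for (w, a), drop in top5_drops(ROWS).items():
        print(f"w{w}a{a}: top-5 drop = {drop:+.1f} points")
```

A negative drop means the quantized model scored slightly higher than the full-precision one in the table.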