DiracDeltaNet

PyTorch implementation of DiracDeltaNet from the paper Synetgy: Algorithm-hardware Co-design for ConvNet Accelerators on Embedded FPGAs by Yifan Yang. It builds on the ShiftResNet codebase written by Alvin Wan and on label-refinery by Hessam Bagherinezhad.

DiracDeltaNet is an efficient convolutional neural network for ImageNet classification, tailored to embedded FPGAs. Its macro-architecture originates from ShuffleNet V2, and the network is co-designed with its embedded FPGA accelerator. It has the following features:

The operator set in DiracDeltaNet is reduced to 1x1 convolution, 2x2 max-pooling, shift, channel shuffle and concatenation for hardware efficiency
All of the 3x3 convolutions in ShuffleNet V2 are replaced with shift operations and 1x1 convolutions
Several 2x2 max-pooling layers are added, and the kernel size of the existing 3x3 max-pooling is reduced to 2x2
Transpose-based channel shuffle is replaced with shift-based channel shuffle
It can be aggressively quantized to 4-bit weights and 4-bit activations with less than 1% top-5 accuracy loss
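
To make the second feature concrete, the sketch below shows how a per-channel spatial shift followed by a 1x1 convolution can gather information from neighbouring pixels without a 3x3 kernel. It is written in plain Python for readability and is not taken from this repository; the function names (`shift_channel`, `shift`, `conv1x1`) and the per-channel offsets are hypothetical placeholders, and the offsets DiracDeltaNet actually uses are described in the paper.

```python
def shift_channel(plane, dy, dx):
    """Shift one H x W plane by (dy, dx); vacated positions are filled with 0."""
    h, w = len(plane), len(plane[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            src_y, src_x = y - dy, x - dx
            if 0 <= src_y < h and 0 <= src_x < w:
                out[y][x] = plane[src_y][src_x]
    return out


def shift(fmap, offsets):
    """Apply a per-channel spatial shift to a C x H x W feature map."""
    return [shift_channel(plane, dy, dx) for plane, (dy, dx) in zip(fmap, offsets)]


def conv1x1(fmap, weights):
    """Pointwise convolution: mix channels independently at every pixel.

    weights is a C_out x C_in matrix (list of rows).
    """
    h, w = len(fmap[0]), len(fmap[0][0])
    return [
        [[sum(wc * fmap[c][y][x] for c, wc in enumerate(row)) for x in range(w)]
         for y in range(h)]
        for row in weights
    ]


# Two 2x2 input channels.
fmap = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[5.0, 6.0], [7.0, 8.0]],
]
# Hypothetical offsets: channel 0 moves one pixel right, channel 1 one pixel down.
offsets = [(0, 1), (1, 0)]
shifted = shift(fmap, offsets)
# One output channel that sums the two shifted input channels.
mixed = conv1x1(shifted, [[1.0, 1.0]])
```

The shift itself has no weights and no multiplications, so all learnable parameters sit in the 1x1 convolution; each output pixel still depends on several neighbouring input pixels because the channels were displaced before mixing.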

In this repository, we offer:

Our ShuffleNet V2 implementation
Source code of DiracDeltaNet
Pre-trained ShuffleNet V2 and DiracDeltaNet models
Training and testing code

Synetgy: Algorithm-hardware Co-design for ConvNet Accelerators on Embedded FPGAs

By Yifan Yang, Qijing Huang, Bichen Wu, Tianjun Zhang, Liang Ma, Giulio Gambardella, Michaela Blott, Luciano Lavagno, Kees Vissers, John Wawrzynek and Kurt Keutzer

The ideas behind the design of DiracDeltaNet, details of its embedded FPGA accelerator and further experimental results can be found in the paper (http://arxiv.org/abs/1811.08634).

If you find this work useful for your research, please consider citing:

@article{synetgy,
author    = {Yifan Yang and
            Qijing Huang and
            Bichen Wu and
            Tianjun Zhang and
            Liang Ma and
            Giulio Gambardella and
            Michaela Blott and
            Luciano Lavagno and
            Kees A. Vissers and
            John Wawrzynek and
            Kurt Keutzer},
title     = {Synetgy: Algorithm-hardware Co-design for ConvNet Accelerators on
            Embedded FPGAs},
journal   = {CoRR},
volume    = {abs/1811.08634},
year      = {2018},
url       = {http://arxiv.org/abs/1811.08634},
archivePrefix = {arXiv},
eprint    = {1811.08634},
}

Download

Training DiracDeltaNet uses a pre-trained ResNet50 (download) as the label refinery.
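
As a rough illustration of what a label refinery does, the sketch below trains a student against the soft label distribution produced by a teacher network instead of a one-hot label. It uses plain Python and a cross-entropy against soft labels; the function names and the example logits are hypothetical, and the exact loss used by train.py may differ.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_label_cross_entropy(student_logits, teacher_logits):
    """Cross-entropy of the student's prediction against the teacher's soft label."""
    soft_label = softmax(teacher_logits)      # refined label from the teacher
    student_probs = softmax(student_logits)   # student's predicted distribution
    return -sum(t * math.log(s) for t, s in zip(soft_label, student_probs))


# Hypothetical logits for a 3-class problem.
teacher = [2.0, 1.0, 0.1]
agreeing_student = [2.0, 1.0, 0.1]
disagreeing_student = [0.1, 1.0, 2.0]

loss_agree = soft_label_cross_entropy(agreeing_student, teacher)
loss_disagree = soft_label_cross_entropy(disagreeing_student, teacher)
```

The loss is smallest when the student reproduces the teacher's distribution, so the student is rewarded for matching the relative scores of all classes, not only the top one.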

We offer the following pre-trained models:

Our implementation of ShuffleNet V2 1x, trained for 90 epochs
Full-precision DiracDeltaNet, trained for 90 epochs
Quantized DiracDeltaNet (4-bit weights, 4-bit activations)

The pre-trained models can be found on Dropbox.

Please place the ResNet50 model and the pre-trained models in the following directory layout:

DiracDeltaNet/
   |
   |-- test.py
   |-- resnet50.t7
   |-- checkpoint/
       |-- ShuffleNetv2.t7
       |-- DiracDeltaNet_full.t7
       |-- ...
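
Before running the scripts, the layout above can be checked with a few lines of standard-library Python. This is only a convenience sketch: `missing_files` and `REQUIRED` are hypothetical names, and the list contains just the files named explicitly in the tree above.

```python
import tempfile
from pathlib import Path

# Files named explicitly in the layout above, relative to the repository root.
REQUIRED = [
    "test.py",
    "resnet50.t7",
    "checkpoint/ShuffleNetv2.t7",
    "checkpoint/DiracDeltaNet_full.t7",
]


def missing_files(repo_root, required):
    """Return the entries of `required` that are not regular files under repo_root."""
    root = Path(repo_root)
    return [rel for rel in required if not (root / rel).is_file()]


# Demonstration on a throwaway directory that only contains test.py.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "test.py").touch()
    missing = missing_files(tmp, REQUIRED)
```

In a real checkout, calling `missing_files(".", REQUIRED)` from the DiracDeltaNet directory should return an empty list once the downloads are in place.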

Usage

The source code requires PyTorch 0.4.0. There is a known incompatibility with PyTorch 0.4.1, and PyTorch 1.0 has not been tested. Python 3.5+ is required; there is a known incompatibility with Python 2.7.

The full list of arguments can be displayed with --help.

Inference

For example, to run inference with our ShuffleNet V2 1x implementation, simply type:

python test.py --datadir=PATH-TO-IMAGENET-FOLDER --inputdir=./checkpoint/ShuffleNetv2.t7

Training

For example, to train the full-precision DiracDeltaNet from scratch, simply type:

python train.py --datadir=PATH-TO-IMAGENET-FOLDER --outputdir=./checkpoint/DiracDeltaNet_full.t7

The default values of the arguments are the hyperparameters we used.

Fine Tuning

For example, to fine-tune a DiracDeltaNet with 8-bit weights and 8-bit activations (except for the first and last convolution layers), starting from the pre-trained full-precision DiracDeltaNet, simply type:

python train.py --datadir=PATH-TO-IMAGENET-FOLDER --inputdir=./checkpoint/DiracDeltaNet_full.t7 --outputdir=./checkpoint/DiracDeltaNet_w8a8.t7 --lr_policy=step --weight_bit=8 --act_bit=8

For fine-tuning, you can use a smaller learning rate and fewer epochs than for training from scratch.
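
To give a feel for what a bit width of 8 or 4 means, the sketch below rounds values in [0, 1] to the nearest of 2^bits evenly spaced levels. It is a generic uniform quantizer in plain Python, shown only for intuition; `quantize_unit_interval` is a hypothetical name, and the quantization scheme behind --weight_bit and --act_bit in this repository may differ.

```python
def quantize_unit_interval(x, bits):
    """Round x (clipped to [0, 1]) to the nearest of 2**bits evenly spaced levels."""
    steps = 2 ** bits - 1                 # number of intervals between levels
    clipped = min(max(x, 0.0), 1.0)       # values outside [0, 1] saturate
    return round(clipped * steps) / steps


values = [0.0, 0.2, 0.5, 0.97, 1.3]
q4 = [quantize_unit_interval(v, 4) for v in values]   # 16 possible levels
q8 = [quantize_unit_interval(v, 8) for v in values]   # 256 possible levels

# Count how many distinct outputs a 4-bit quantizer can produce.
distinct_4bit = {quantize_unit_interval(i / 1000, 4) for i in range(1001)}
```

With 4 bits the worst-case rounding error inside [0, 1] is 1/30, compared with 1/510 for 8 bits, which is why lower bit widths usually need fine-tuning to recover accuracy.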

Experimental Results

Model	Weight Bitwidth	Activation Bitwidth	Top-1 Acc	Top-5 Acc	Note
ShuffleNet V2 1x	32	32	69.4%	N/A	original paper
ShuffleNet V2 1x	32	32	67.9%	88.0%	our implementation, 90 epochs of training
DiracDeltaNet	32	32	68.9%	88.7%	90 epochs of training
DiracDeltaNet	4	4	68.3%	88.1%

More can be found in the paper.
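
For readers unfamiliar with the Top-1 and Top-5 columns, the following sketch shows how top-k accuracy is computed from per-class scores. It is a plain-Python illustration on made-up scores for a 3-class problem, not the evaluation code in test.py; `top_k_accuracy` is a hypothetical name.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Made-up scores for four samples and three classes.
scores = [
    [0.1, 0.7, 0.2],   # highest score: class 1
    [0.5, 0.3, 0.2],   # highest score: class 0
    [0.2, 0.3, 0.5],   # highest score: class 2
    [0.6, 0.3, 0.1],   # highest score: class 0
]
labels = [1, 1, 2, 2]

top1 = top_k_accuracy(scores, labels, 1)   # 0.5
top2 = top_k_accuracy(scores, labels, 2)   # 0.75
```

Top-5 accuracy on ImageNet is the same computation with k = 5 over 1000 classes, so it is always at least as high as top-1 accuracy.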
