[TCAD 2021] Block Convolution: Towards Memory-Efficient Inference of Large-Scale CNNs on FPGA

Primary language: Python. License: MIT.

BlockConv

This repository is the official code release for the paper Block Convolution: Towards Memory-Efficient Inference of Large-Scale CNNs on FPGA (published in TCAD 2021).

Block convolution is a simple, hardware-friendly, and efficient convolution operation that can completely avoid the off-chip transfer of intermediate feature maps at runtime. The fundamental idea is to eliminate the dependency between feature map tiles in the spatial dimension when spatial tiling is used. This is done by splitting a feature map into independent blocks and padding each block on its own, so that convolution can be performed separately on individual blocks without reading pixels from neighbouring blocks.
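The idea above can be sketched in a few lines of PyTorch. This is a minimal illustration and not the repository's implementation: the function name `block_conv2d` and its arguments are hypothetical, and the sketch assumes stride 1, an odd kernel size, and a feature map that the block size tiles exactly.

```python
import torch
import torch.nn.functional as F


def block_conv2d(x, weight, block_size, padding_mode="constant"):
    """Convolve each spatial block of x independently (hypothetical helper).

    x:      input of shape (N, C_in, H, W)
    weight: kernel of shape (C_out, C_in, kH, kW), odd kH and kW, stride 1
    """
    bh, bw = block_size
    _, _, h, w = x.shape
    kh, kw = weight.shape[-2:]
    assert h % bh == 0 and w % bw == 0, "sketch assumes blocks tile the map exactly"
    ph, pw = kh // 2, kw // 2

    rows = []
    for top in range(0, h, bh):
        cols = []
        for left in range(0, w, bw):
            block = x[:, :, top:top + bh, left:left + bw]
            # Pad with values derived from the block itself, so no pixel of a
            # neighbouring block is ever read.
            block = F.pad(block, (pw, pw, ph, ph), mode=padding_mode)
            cols.append(F.conv2d(block, weight))
        rows.append(torch.cat(cols, dim=3))
    return torch.cat(rows, dim=2)


torch.manual_seed(0)
x = torch.randn(1, 3, 8, 8)
weight = torch.randn(4, 3, 3, 3)

blocked = block_conv2d(x, weight, block_size=(4, 4))  # four independent 4x4 blocks
full = F.conv2d(x, weight, padding=1)                 # ordinary zero-padded convolution
print(blocked.shape, full.shape)
```

With zero padding, outputs in the interior of each block match an ordinary convolution; only positions whose receptive field crosses a block boundary differ. Passing `padding_mode="replicate"` or `padding_mode="reflect"` fills the block border from the block's own edge pixels instead.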

Installation

  • Python version >= 3.8
  • PyTorch version >= 1.8
# create conda environment
conda create -n BlockConv python=3.8
conda activate BlockConv
conda install pytorch torchvision cudatoolkit=10.2
pip install torchnet tqdm tabulate gitpython tensorboard

# install from source code
git clone https://github.com/zejiangp/BlockConv.git
cd BlockConv

Training from scratch

  • --arch: block_resnet18, block_resnet50, block_vgg16, block_mobilenet
  • --padding_mode: constant (equal to zero padding), replicate, reflect
  • --type: 0 (fixed blocking), 1 (hierarchical blocking)
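The two blocking modes can be illustrated with a small sketch. It rests on an assumption about what the modes mean, based on the F28 and H2x2 notation used in the accuracy tables: fixed blocking keeps the same block size at every layer, while hierarchical blocking keeps the same number of blocks per side at every layer. The helper names below are hypothetical and are not taken from the repository.

```python
def fixed_blocking(feature_size, block_size):
    """Assumed fixed blocking: same block size at every layer.

    Returns (blocks per side, block side length).
    """
    block = min(block_size, feature_size)  # a small map becomes a single block
    return feature_size // block, block


def hierarchical_blocking(feature_size, grid):
    """Assumed hierarchical blocking: same number of blocks per side at every layer.

    Returns (blocks per side, block side length).
    """
    blocks = min(grid, feature_size)
    return blocks, feature_size // blocks


# Illustrative feature-map side lengths for a 224x224 input.
for size in (112, 56, 28, 14):
    print(size, "F28:", fixed_blocking(size, 28), "H2x2:", hierarchical_blocking(size, 2))
```

Under this reading, F28 stops splitting once the feature map is 28x28 or smaller, whereas H2x2 keeps splitting every layer into four blocks whose size shrinks with the resolution.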

For example, to train a resnet18 from scratch with block size 28, fixed blocking, and zero padding, use the following command:

python classification.py \
    ./data/ilsvrc12   \
    --dataset=imagenet   \
    --out_dir=logs/  \
    --gpus=0,1,2,3   \
    --arch=block_resnet18 \
    --name=resnet18_F28_constant_scratch   \
    --batch_size=128   \
    -j=32   \
    --epochs=90 \
    --lr=0.1   \
    --wd=1e-4  \
    --momentum=0.9 \
    --milestones=30,60 \
    --block_size 28,28 \
    --padding_mode constant \
    --type 0 \
    --do_train \
    --do_eval

Fine-tuning

Another way to obtain a block-convolution model is to fine-tune from a pre-trained baseline model:

python classification.py \
    ./data/ilsvrc12   \
    --dataset=imagenet   \
    --out_dir=logs/  \
    --gpus=0,1,2,3   \
    --arch=block_resnet18 \
    --name=resnet18_F28_constant   \
    --batch_size=128   \
    -j=32   \
    --epochs=30 \
    --lr=0.001   \
    --wd=1e-4  \
    --momentum=0.9 \
    --milestones=10,20 \
    --block_size 28,28 \
    --padding_mode constant \
    --type 0 \
    --resume_from logs/resnet18_baseline.pth.tar \
    --reset_optimizer \
    --do_train \
    --do_eval

Hyperparameters

strategy               model      epochs  batch size  learning rate  weight decay  milestones
Training from scratch  resnet18   90      128         0.1            1e-4          30,60
Training from scratch  resnet50   90      128         0.1            1e-4          30,60
Training from scratch  vgg16      105     256         0.01           5e-4          30,60,90
Training from scratch  mobilenet  300     128         0.0001         5e-5          -
Fine-tuning            resnet18   30      128         0.001          1e-4          10,20
Fine-tuning            resnet50   30      128         0.001          1e-4          10,20
Fine-tuning            vgg16      20      256         0.001          1e-4          8,16
Fine-tuning            mobilenet  50      128         0.0001         5e-5          -
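To make the milestones column concrete, the sketch below computes the learning rate at a given epoch for a step schedule. It assumes the rate is multiplied by 0.1 at each milestone; that decay factor is an assumption and is not stated in this README. The function name `step_lr` is hypothetical.

```python
def step_lr(base_lr, milestones, epoch, gamma=0.1):
    """Learning rate at `epoch` when the rate is multiplied by `gamma`
    at every milestone (gamma=0.1 is an assumed value)."""
    passed = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** passed


# resnet18 trained from scratch: lr=0.1, milestones=30,60
for epoch in (0, 29, 30, 59, 60, 89):
    print(f"epoch {epoch:2d}: lr = {step_lr(0.1, [30, 60], epoch):.4f}")
```

An empty milestone list leaves the rate at its base value, which in this sketch corresponds to the mobilenet rows where no milestones are listed.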

Evaluation

python classification.py \
    ./data/ilsvrc12   \
    --dataset=imagenet   \
    --out_dir=test_logs/  \
    --gpus=0   \
    --arch=block_vgg16    \
    --name=test_vgg   \
    --batch_size=128   \
    -j=32   \
    --block_size 28,28 \
    --padding_mode constant \
    --type 0 \
    --do_eval \
    --resume_from logs/vgg16_finetune_F28_zero.pth.tar

Model Accuracy

We provide pre-trained models for evaluation here.

  • Top-1 accuracy on the ImageNet classification task.
model baseline scratch fine-tuning
vgg16 71.59% 70.47% 71.45%
resnet18 70.60% 69.94% 71.21%
resnet50 75.86% 75.42% 76.67%
mobilenetv1 72.29% 72.05% 71.76%
  • Top-1 accuracy of blocked networks with respect to blocking ratio under fixed blocking (F) and hierarchical blocking (H).
model H2x2 H4x4 H8x8 H16x16 F112 F56 F28 F14
vgg16 70.14% 70.28% 70.76% 71.18% 71.81% 71.74% 71.45% 70.48%
resnet18 70.06% 70.67% 71.12% 70.82% 71.60% 71.37% 71.21% 70.20%
mobilenetv1 69.96% 71.49% 71.53% 71.50% 72.16% 71.89% 71.76% 71.13%
  • Impact of block padding on classification accuracy.
model zero replicate reflect
vgg16 71.45% 70.90% 70.22%
resnet18 71.21% 70.92% 70.61%
resnet50 76.67% 76.71% 76.47%
mobilenetv1 71.76% 71.92% 71.58%

Citation

If you find this repository useful for your work, please cite our paper:

@article{Gangli2022BlockConv,  
    author={Li, Gang and Liu, Zejian and Li, Fanrong and Cheng, Jian},  
    journal={IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems},   
    title={Block Convolution: Toward Memory-Efficient Inference of Large-Scale CNNs on FPGA},   
    year={2022},  
    volume={41},  
    number={5},  
    pages={1436-1447},  
    doi={10.1109/TCAD.2021.3082868}
}