Fast-SNN

This repository holds the code for Fast-SNN.

Dependencies

Python 3.8.8
PyTorch 1.8.1

Prepare Quantized ANNs

To train quantized ANNs, we follow the protocol defined in Additive Powers-of-Two Quantization: An Efficient Non-uniform Discretization for Neural Networks.

For more details, please refer to APoT_Quantization.

Image Classification

CIFAR-10

Architectures

For network architectures, we currently support AlexNet, VGG11 (in 'CIFAR10'), ResNet-20/32/44/56/110 (in 'CIFAR-10'), and ResNet-18 (in 'CIFAR10_resnet18'). For AlexNet, VGG11, and ResNet-20/32/44/56/110, we quantize both weights and activations. For ResNet-18, we quantize activations.

Dataset

By default, the dataset is expected to be in a 'data' folder at the same level as 'main.py'.

Train Quantized ANNs

We progressively train full-precision, 4-bit, 3-bit, and 2-bit ANN models, initializing each model from the best checkpoint of the previous, higher-precision one.

An example to train AlexNet:

python main.py --arch alex --bit 32 --wd 5e-4
python main.py --arch alex --bit 4 --wd 1e-4  --lr 4e-2 --init result/alex_32bit/model_best.pth.tar
python main.py --arch alex --bit 3 --wd 1e-4  --lr 4e-2 --init result/alex_4bit/model_best.pth.tar
python main.py --arch alex --bit 2 --wd 3e-5  --lr 4e-2 --init result/alex_3bit/model_best.pth.tar
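
The chaining in the commands above, where each lower-precision run starts from the best checkpoint of the previous one, can be sketched as a small Python helper. This is an illustrative sketch and not part of the repository: the names `progressive_commands` and `ALEXNET_SCHEDULE` are hypothetical, and the checkpoint path pattern `result/<arch>_<bit>bit/model_best.pth.tar` is inferred from the example commands.

```python
# Hypothetical helper (not part of the repository): rebuilds the
# progressive-training commands, chaining each run to the previous checkpoint.

def progressive_commands(arch, schedule):
    """Return one command string per (bit, weight_decay, lr) entry.

    `schedule` is ordered from high to low precision; `lr` may be None
    to keep the script's default learning rate.
    """
    commands = []
    prev_bit = None
    for bit, wd, lr in schedule:
        cmd = ["python", "main.py", "--arch", arch, "--bit", str(bit), "--wd", wd]
        if lr is not None:
            cmd += ["--lr", lr]
        if prev_bit is not None:
            # Assumed path pattern, inferred from the examples above.
            cmd += ["--init", f"result/{arch}_{prev_bit}bit/model_best.pth.tar"]
        commands.append(" ".join(cmd))
        prev_bit = bit
    return commands


# Hyperparameters copied from the AlexNet example above.
ALEXNET_SCHEDULE = [
    (32, "5e-4", None),
    (4, "1e-4", "4e-2"),
    (3, "1e-4", "4e-2"),
    (2, "3e-5", "4e-2"),
]

for command in progressive_commands("alex", ALEXNET_SCHEDULE):
    print(command)
```

Other architectures may need different weight decay and learning rate values; only the AlexNet settings are documented here.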

Evaluate Converted SNNs

The number of SNN time steps is calculated automatically from the activation precision b, i.e., T = 2^b - 1. By default, we use the signed IF neuron model.

optional arguments:
    --u                    Use unsigned IF neuron model
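
As a rough illustration of why T = 2^b - 1 time steps are enough, the sketch below compares a b-bit uniform quantizer with the spike count of an unsigned IF neuron driven by a constant input. It assumes activations normalized to [0, 1], a firing threshold of 1, reset by subtraction, and an initial membrane potential of half the threshold. These are assumptions made for illustration and the function names are hypothetical, so the neuron model implemented in 'snn.py' may differ.

```python
import math


def time_steps(bits):
    """Number of SNN time steps for b-bit activations: T = 2^b - 1."""
    return 2 ** bits - 1


def quantize(x, bits):
    """Integer level (0..T) of a uniform quantizer on [0, 1], rounding half up."""
    levels = time_steps(bits)
    x = min(max(x, 0.0), 1.0)
    return math.floor(x * levels + 0.5)


def if_spike_count(x, bits, threshold=1.0):
    """Spikes emitted by an unsigned IF neuron fed the constant input x.

    Assumptions: reset by subtraction, initial potential = threshold / 2.
    """
    v = 0.5 * threshold
    spikes = 0
    for _ in range(time_steps(bits)):
        v += x
        if v >= threshold:
            spikes += 1
            v -= threshold
    return spikes


for bits in (4, 3, 2):
    for x in (0.25, 0.5, 0.75, 1.0):
        print(bits, time_steps(bits), x, quantize(x, bits), if_spike_count(x, bits))
```

Under these assumptions the spike count over T steps matches the quantization level for inputs in [0, 1], which is the correspondence the conversion relies on.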

Example: AlexNet (SNN) performance with the traditional unsigned IF neuron model. A 3/2-bit ANN is converted to an SNN with T=7/3.

python snn.py --arch alex --bit 3 -e -u --init result/alex_3bit/model_best.pth.tar
python snn.py --arch alex --bit 2 -e -u --init result/alex_2bit/model_best.pth.tar

Example: AlexNet (SNN) performance with the signed IF neuron model. A 3/2-bit ANN is converted to an SNN with T=7/3.

python snn.py --arch alex --bit 3 -e --init result/alex_3bit/model_best.pth.tar
python snn.py --arch alex --bit 2 -e --init result/alex_2bit/model_best.pth.tar

Fine-tune Converted SNNs

By default, we use the signed IF neuron model during fine-tuning.

optional arguments:
    --num_epochs / -n               Number of epochs to fine-tune at each layer
                                    default: 1
    --force                         Always update fine-tuned parameters without evaluation on training data

Example: fine-tune converted SNN models.

python snn_ft.py --arch alex --bit 2 --force --init result/alex_2bit/model_best.pth.tar
python snn_ft.py --arch resnet18 --bit 2 --force --init result/resnet18_2bit/model_best.pth.tar
python snn_ft.py --arch resnet56 --bit 2 -n 8 --init result/resnet56_2bit/model_best.pth.tar

Checkpoints for Quantized Models

Model	3-bit	2-bit
AlexNet	alex_3bit	alex_2bit
VGG11	vgg11_3bit	vgg11_2bit
ResNet20	resnet20_3bit	resnet20_2bit
ResNet44	resnet44_3bit	resnet44_2bit
ResNet56	resnet56_3bit	resnet56_2bit
ResNet18	resnet18_3bit	resnet18_2bit

ImageNet

We use distributed data parallel (DDP) for training. Please refer to PyTorch DDP for details.

To speed up data loading, we replace the vanilla PyTorch dataloader with nvidia-dali.

Nvidia-dali package

# for CUDA 10
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist nvidia-dali-cuda100
# for CUDA 11
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist nvidia-dali-cuda110
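
The choice between the two packages above depends only on the CUDA major version, which can be written down as a small lookup. This is a minimal sketch: the helper name `dali_install_command` is hypothetical, and only the two CUDA versions listed above are covered.

```python
# Hypothetical helper: maps a CUDA version string to the install command above.
# Only the two packages named in this README are included.
DALI_PACKAGES = {10: "nvidia-dali-cuda100", 11: "nvidia-dali-cuda110"}
INDEX_URL = "https://developer.download.nvidia.com/compute/redist"


def dali_install_command(cuda_version):
    """Return the pip command for a CUDA version string such as '11.1'."""
    major = int(cuda_version.split(".")[0])
    if major not in DALI_PACKAGES:
        raise ValueError(f"No nvidia-dali package listed here for CUDA {cuda_version}")
    return f"pip install --extra-index-url {INDEX_URL} {DALI_PACKAGES[major]}"


print(dali_install_command("11.1"))
```

With PyTorch installed, `torch.version.cuda` reports the CUDA version the build was compiled against and can be passed to this helper.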

For more details on nvidia-dali, please refer to the official NVIDIA DALI Documentation.

Architectures

For network architectures, we currently support AlexNet and VGG16.

Train Quantized ANNs

Starting from full-precision pre-trained models from TorchVision, we progressively train 4-bit, 3-bit, and 2-bit ANN models.

An example to train AlexNet:

python -m torch.distributed.launch --nproc_per_node=4 dali_main.py -a alexnet -b 256 --bit 4 --workers 4 --lr=0.1 --epochs 60 --dali_cpu /data/imagenet2012
python -m torch.distributed.launch --nproc_per_node=4 dali_main.py -a alexnet -b 256 --bit 3 --init result/alexnet_4bit/model_best.pth.tar --workers 4 --lr=0.01 --epochs 60 --dali_cpu /data/imagenet2012
python -m torch.distributed.launch --nproc_per_node=4 dali_main.py -a alexnet -b 256 --bit 2 --init result/alexnet_3bit/model_best.pth.tar --workers 4 --lr=0.01 --epochs 60 --dali_cpu /data/imagenet2012

Evaluate Converted SNNs

Example: AlexNet (SNN) performance with the traditional unsigned IF neuron model. A 3/2-bit ANN is converted to an SNN with T=7/3.

python -m torch.distributed.launch --nproc_per_node=4 snn.py -a alexnet -b 256 -e -u --bit 3 --init result/alexnet_3bit/model_best.pth.tar --workers 4 --dali_cpu /data/imagenet2012
python -m torch.distributed.launch --nproc_per_node=4 snn.py -a alexnet -b 256 -e -u --bit 2 --init result/alexnet_2bit/model_best.pth.tar --workers 4 --dali_cpu /data/imagenet2012

Example: AlexNet (SNN) performance with the signed IF neuron model. A 3/2-bit ANN is converted to an SNN with T=7/3.

python -m torch.distributed.launch --nproc_per_node=4 snn.py -a alexnet -b 256 -e --bit 3 --init result/alexnet_3bit/model_best.pth.tar --workers 4 --dali_cpu /data/imagenet2012
python -m torch.distributed.launch --nproc_per_node=4 snn.py -a alexnet -b 256 -e --bit 2 --init result/alexnet_2bit/model_best.pth.tar --workers 4 --dali_cpu /data/imagenet2012

Fine-tune Converted SNNs

By default, we use the signed IF neuron model during fine-tuning.

Example:

python -m torch.distributed.launch --nproc_per_node=4 snn_ft.py -a alexnet -b 128 --bit 3 -n 8 --init result/alexnet_3bit/model_best.pth.tar --workers 4 --dali_cpu /data/imagenet2012
python -m torch.distributed.launch --nproc_per_node=4 snn_ft.py -a alexnet -b 128 --bit 2 -n 8 --init result/alexnet_2bit/model_best.pth.tar --workers 4 --dali_cpu /data/imagenet2012

Checkpoints for Quantized Models

Model	3-bit	2-bit
AlexNet	alexnet_3bit	alexnet_2bit
VGG16	vgg16_3bit	vgg16_2bit

Object Detection

We use yolov2-yolov3_PyTorch as the framework for object detection.

Preparation

For required packages and datasets, please refer to the README in yolov2-yolov3_PyTorch. The 'object detection' folder also contains a merged README that covers all of the details.

Architecture

We currently support Tiny YOLO and YOLOv2 with a ResNet-34 backbone.

optional arguments:
    --version / -v               Supported architecture
                                 available: yolov2_tiny, yolov2_r34

PASCAL VOC 2007

Train Quantized ANNs

Example: train Tiny YOLO with activations quantized to 32/4/3/2 bits.

python -m torch.distributed.launch --nproc_per_node=4 train.py -d voc -v yolov2_tiny -ms --ema --sybn --batch_size 4 --bit 32
python -m torch.distributed.launch --nproc_per_node=4 train.py -d voc -v yolov2_tiny -ms --ema --sybn --batch_size 4 --bit 4 --init CHECKPOINT_PATH
python -m torch.distributed.launch --nproc_per_node=4 train.py -d voc -v yolov2_tiny -ms --ema --sybn --batch_size 4 --bit 3 --init CHECKPOINT_PATH
python -m torch.distributed.launch --nproc_per_node=4 train.py -d voc -v yolov2_tiny -ms --ema --sybn --batch_size 4 --bit 2 --init CHECKPOINT_PATH

Evaluate Models

optional arguments:
    --spike               Evaluate with spikes (as SNNs)

Example: evaluate Tiny YOLO (SNN) with T = 15, 7, 3

python eval.py -d voc --cuda -v yolov2_tiny --bit 4 --spike --init CHECKPOINT_PATH
python eval.py -d voc --cuda -v yolov2_tiny --bit 3 --spike --init CHECKPOINT_PATH
python eval.py -d voc --cuda -v yolov2_tiny --bit 2 --spike --init CHECKPOINT_PATH

Checkpoints for Quantized Models

Model	4-bit	3-bit	2-bit
Tiny YOLO	yolov2_tiny_4bit	yolov2_tiny_3bit	yolov2_tiny_2bit
YOLOv2 (ResNet-34)	yolov2_r34_4bit	yolov2_r34_3bit	yolov2_r34_2bit

MS COCO 2017

Train Quantized ANNs

Example: train Tiny YOLO with activations quantized to 32/4/3/2 bits.

python -m torch.distributed.launch --nproc_per_node=4 train.py -d coco -v yolov2_tiny --bit 32 -ms --ema --sybn --batch_size 4 
python -m torch.distributed.launch --nproc_per_node=4 train.py -d coco -v yolov2_tiny --bit 4 -ms --ema --sybn --batch_size 4  --init CHECKPOINT_PATH
python -m torch.distributed.launch --nproc_per_node=4 train.py -d coco -v yolov2_tiny --bit 3 -ms --ema --sybn --batch_size 4 --init CHECKPOINT_PATH
python -m torch.distributed.launch --nproc_per_node=4 train.py -d coco -v yolov2_tiny --bit 2 -ms --ema --sybn --batch_size 4 --init CHECKPOINT_PATH

Evaluate Models

Example: evaluate Tiny YOLO (SNN) with T = 15, 7, 3

python eval.py -d coco-val --cuda -v yolov2_tiny --bit 4 --spike --init CHECKPOINT_PATH
python eval.py -d coco-val --cuda -v yolov2_tiny --bit 3 --spike --init CHECKPOINT_PATH
python eval.py -d coco-val --cuda -v yolov2_tiny --bit 2 --spike --init CHECKPOINT_PATH

Checkpoints for Quantized Models

Model	4-bit	3-bit	2-bit
Tiny YOLO	yolov2_tiny_4bit	yolov2_tiny_3bit	yolov2_tiny_2bit
YOLOv2 (ResNet-34)	yolov2_r34_4bit	yolov2_r34_3bit	yolov2_r34_2bit

Semantic Segmentation

We use vedaseg, an open source semantic segmentation toolbox based on PyTorch, as the framework for semantic segmentation.

Preparation

For required packages and datasets, please refer to the README in vedaseg. The 'semantic segmentation' folder also contains a merged README that covers all of the details.

Architecture

We currently support DeepLabv1 (VGG9) and DeepLabv3 (ResNet-34 + ASPP).

PASCAL VOC 2012

Train Quantized ANNs

Example: train VGG9 with activations quantized to 32/4/3/2 bits.

bash ./tools/dist_train.sh configs/voc_deeplabv1.py "0, 1, 2, 3" 
bash ./tools/dist_train.sh configs/voc_deeplabv1_4bit.py "0, 1, 2, 3" 
bash ./tools/dist_train.sh configs/voc_deeplabv1_3bit.py "0, 1, 2, 3" 
bash ./tools/dist_train.sh configs/voc_deeplabv1_2bit.py "0, 1, 2, 3"

Example: train ResNet-34 + ASPP with activations quantized to 32/4/3/2 bits.

bash ./tools/dist_train.sh configs/voc_deeplabv3.py "0, 1, 2, 3" 
bash ./tools/dist_train.sh configs/voc_deeplabv3_4bit.py "0, 1, 2, 3" 
bash ./tools/dist_train.sh configs/voc_deeplabv3_3bit.py "0, 1, 2, 3" 
bash ./tools/dist_train.sh configs/voc_deeplabv3_2bit.py "0, 1, 2, 3"

Evaluate Models

Example: evaluate VGG9 (SNN) with T = 15, 7, 3

bash ./tools/dist_test.sh configs/voc_deeplabv1_T15.py './workdir/voc_deeplabv1_4bit/best_mIoU.pth' "0, 1, 2, 3" 
bash ./tools/dist_test.sh configs/voc_deeplabv1_T7.py './workdir/voc_deeplabv1_3bit/best_mIoU.pth' "0, 1, 2, 3" 
bash ./tools/dist_test.sh configs/voc_deeplabv1_T3.py './workdir/voc_deeplabv1_2bit/best_mIoU.pth' "0, 1, 2, 3"

Example: evaluate ResNet-34 + ASPP (SNN) with T = 15, 7, 3

bash ./tools/dist_test.sh configs/voc_deeplabv3_T15.py './workdir/voc_deeplabv3_4bit/best_mIoU.pth' "0, 1, 2, 3" 
bash ./tools/dist_test.sh configs/voc_deeplabv3_T7.py './workdir/voc_deeplabv3_3bit/best_mIoU.pth' "0, 1, 2, 3" 
bash ./tools/dist_test.sh configs/voc_deeplabv3_T3.py './workdir/voc_deeplabv3_2bit/best_mIoU.pth' "0, 1, 2, 3"
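
The evaluation commands above pair each bit width b with a config named after T = 2^b - 1 (4-bit with T15, 3-bit with T7, 2-bit with T3). The helper below makes that pairing explicit. It is a sketch that assumes config and checkpoint names always follow the pattern seen in these examples, and the function name `eval_command` is hypothetical.

```python
# Hypothetical helper: derives the SNN evaluation command from the bit width.
# The naming pattern is inferred from the example commands above.

def eval_command(dataset, model, bits, gpus="0, 1, 2, 3"):
    """Build the dist_test.sh command for a b-bit checkpoint evaluated as an SNN."""
    steps = 2 ** bits - 1  # T = 2^b - 1
    config = f"configs/{dataset}_{model}_T{steps}.py"
    checkpoint = f"./workdir/{dataset}_{model}_{bits}bit/best_mIoU.pth"
    return f"bash ./tools/dist_test.sh {config} '{checkpoint}' \"{gpus}\""


for bits in (4, 3, 2):
    print(eval_command("voc", "deeplabv1", bits))
```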

Checkpoints for Quantized Models

Model	4-bit	3-bit	2-bit
VGG-9	voc_deeplabv1_4bit	voc_deeplabv1_3bit	voc_deeplabv1_2bit
ResNet-34 + ASPP	voc_deeplabv3_4bit	voc_deeplabv3_3bit	voc_deeplabv3_2bit

MS COCO 2017

Train Quantized ANNs

Example: train VGG9 with activations quantized to 32/4/3/2 bits.

bash ./tools/dist_train.sh configs/coco_deeplabv1.py "0, 1, 2, 3, 6, 7" 
bash ./tools/dist_train.sh configs/coco_deeplabv1_4bit.py "0, 1, 2, 3, 6, 7" 
bash ./tools/dist_train.sh configs/coco_deeplabv1_3bit.py "0, 1, 2, 3" 
bash ./tools/dist_train.sh configs/coco_deeplabv1_2bit.py "0, 1, 2, 3"

Example: train ResNet-34 + ASPP with activations quantized to 32/4/3/2 bits.

bash ./tools/dist_train.sh configs/coco_deeplabv3.py "0, 1, 2, 3" 
bash ./tools/dist_train.sh configs/coco_deeplabv3_4bit.py "0, 1, 2, 3" 
bash ./tools/dist_train.sh configs/coco_deeplabv3_3bit.py "0, 1, 2, 3" 
bash ./tools/dist_train.sh configs/coco_deeplabv3_2bit.py "0, 1, 2, 3"

Evaluate Models

Example: evaluate VGG9 (SNN) with T = 15, 7, 3

bash ./tools/dist_test.sh configs/coco_deeplabv1_T15.py './workdir/coco_deeplabv1_4bit/best_mIoU.pth' "0, 1, 2, 3" 
bash ./tools/dist_test.sh configs/coco_deeplabv1_T7.py './workdir/coco_deeplabv1_3bit/best_mIoU.pth' "0, 1, 2, 3" 
bash ./tools/dist_test.sh configs/coco_deeplabv1_T3.py './workdir/coco_deeplabv1_2bit/best_mIoU.pth' "0, 1, 2, 3"

Example: evaluate ResNet-34 + ASPP (SNN) with T = 15, 7, 3

bash ./tools/dist_test.sh configs/coco_deeplabv3_T15.py './workdir/coco_deeplabv3_4bit/best_mIoU.pth' "0, 1, 2, 3" 
bash ./tools/dist_test.sh configs/coco_deeplabv3_T7.py './workdir/coco_deeplabv3_3bit/best_mIoU.pth' "0, 1, 2, 3" 
bash ./tools/dist_test.sh configs/coco_deeplabv3_T3.py './workdir/coco_deeplabv3_2bit/best_mIoU.pth' "0, 1, 2, 3"

Checkpoints for Quantized Models

Model	4-bit	3-bit	2-bit
VGG-9	coco_deeplabv1_4bit	coco_deeplabv1_3bit	coco_deeplabv1_2bit
ResNet-34 + ASPP	coco_deeplabv3_4bit	coco_deeplabv3_3bit	coco_deeplabv3_2bit
