/Fast-SNN

Primary LanguagePython

Fast-SNN

This repo holds the codes for Fast-SNN.

Dependencies

  • Python 3.8.8
  • Pytorch 1.8.1

Prepare Quantized ANNs

For training quantized ANNs, we follow the protocol defined in Additive Powers-of-Two Quantization: An Efficient Non-uniform Discretization for Neural Networks

For more details, please refer to APoT_Quantization

Image Classification

CIFAR-10

Architectures

For network architectures, we currently support AlexNet, VGG11 (in 'CIFAR10'), ResNet-20/32/44/56/110 (in 'CIFAR-10'), and ResNet-18 (in 'CIFAR10_resnet18'). For AlexNet, VGG11, and ResNet-20/32/44/56/110, we quantize both weights and activations. For ResNet-18, we quantize activations.

Dataset

By default, the dataset is supposed to be in a 'data' folder at the same lavel of 'main.py'

Train Quantized ANNs

We progressively train full precision, 4, 3, and 2 bit ANN models.

An example to train AlexNet:

python main.py --arch alex --bit 32 --wd 5e-4
python main.py --arch alex --bit 4 --wd 1e-4  --lr 4e-2 --init result/alex_32bit/model_best.pth.tar
python main.py --arch alex --bit 3 --wd 1e-4  --lr 4e-2 --init result/alex_4bit/model_best.pth.tar
python main.py --arch alex --bit 2 --wd 3e-5  --lr 4e-2 --init result/alex_3bit/model_best.pth.tar

Evaluate Converted SNNs

The time steps of SNNs are automatically calculated from activation precision, i.e., T = 2^b-1. By default, we use signed IF neuron model.

optinal arguments:
    --u                    Use unsigned IF neuron model

Example: AlexNet(SNN) performance with traditional unsigned IF neuron model. An 3/2-bit ANN is converted to an SNN with T=3/7.

python snn.py --arch alex --bit 3 -e -u --init result/alex_3bit/model_best.pth.tar
python snn.py --arch alex --bit 2 -e -u --init result/alex_2bit/model_best.pth.tar

Example: AlexNet(SNN) performance with signed IF neuron model. An 3/2-bit ANN is converted to an SNN with T=3/7.

python snn.py --arch alex --bit 3 -e -u --init result/alex_3bit/model_best.pth.tar
python snn.py --arch alex --bit 2 -e -u --init result/alex_2bit/model_best.pth.tar

Fine-tune Converted SNNs

By default, we use signed IF neuron model during fine-tuning.

optinal arguments:
    --num_epochs / -n               Number of epochs to fine-tune at each layer
                                    default: 1
    --force                         Always update fine-tuned parameters without evaluation on training data

Example: finetune converted SNN models.

python snn_ft.py --arch alex --bit 2 --force --init result/alex_2bit/model_best.pth.tar
python snn_ft.py --arch resnet18 --bit 2 --force --init result/resnet18_2bit/model_best.pth.tar
python snn_ft.py --arch resnet56 --bit 2 -n 8 --init result/resnet56_2bit/model_best.pth.tar

Checkpoints for Quantized Models

Model 3-bit 2-bit
AlexNet alex_3bit alex_2bit
VGG11 vgg11_3bit vgg11_2bit
ResNet20 resnet20_3bit resnet20_2bit
ResNet44 resnet44_3bit resnet44_2bit
ResNet56 resnet56_3bit resnet56_2bit
ResNet18 resnet18_3bit resnet18_2bit

ImageNet

We use distributed data parallel (DDP) for training. Please refer to Pytorch DDP for details.

To speed up data loading, we replace the vanilla Pytorch dataloader with nvidia-dali.

Nvidia-dali package

# for CUDA 10
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist nvidia-dali-cuda100
# for CUDA 11
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist nvidia-dali-cuda110

For more details on nvidia-dali, please refer to NVIDIA's official document NVIDIA DALI Documentation

Architectures

For network architectures, we currently support AlexNet and VGG16.

Train Qantized ANNs

With full-precision pre-trained models from TorchVision, we progressively 4, 3, and 2 bit ANN models.

An example to train AlexNet:

python -m torch.distributed.launch --nproc_per_node=4 dali_main.py -a alexnet -b 256 --bit 4 --workers 4 --lr=0.1 --epochs 60 --dali_cpu /data/imagenet2012
python -m torch.distributed.launch --nproc_per_node=4 dali_main.py -a alexnet -b 256 --bit 3 --init result/alexnet_4bit/model_best.pth.tar --workers 4 --lr=0.01 --epochs 60 --dali_cpu /data/imagenet2012
python -m torch.distributed.launch --nproc_per_node=4 dali_main.py -a alexnet -b 256 --bit 2 --init result/alexnet_3bit/model_best.pth.tar --workers 4 --lr=0.01 --epochs 60 --dali_cpu /data/imagenet2012

Evaluate Converted SNNs

Example: AlexNet (SNN) performance with traditional unsigned IF neuron model. A 3/2-bit ANN is converted to an SNN with T=7/3.

python -m torch.distributed.launch --nproc_per_node=4 snn.py -a alexnet -b 256 -e -u --bit 3 --init result/alexnet_3bit/model_best.pth.tar --workers 4 --dali_cpu /data/imagenet2012
python -m torch.distributed.launch --nproc_per_node=4 snn.py -a alexnet -b 256 -e -u --bit 2 --init result/alexnet_2bit/model_best.pth.tar --workers 4 --dali_cpu /data/imagenet2012

Example: AlexNEt (SNN) performance with signed IF neuron model. A 3/2-bit ANN is converted to an SNN with T=7/3.

python -m torch.distributed.launch --nproc_per_node=4 snn.py -a alexnet -b 256 -e --bit 3 --init result/alexnet_3bit/model_best.pth.tar --workers 4 --dali_cpu /data/imagenet2012
python -m torch.distributed.launch --nproc_per_node=4 snn.py -a alexnet -b 256 -e --bit 2 --init result/alexnet_2bit/model_best.pth.tar --workers 4 --dali_cpu /data/imagenet2012

Finetune converted SNNs

By default, we use signed IF neuron model in fine-tuning.

Example:

python -m torch.distributed.launch --nproc_per_node=4 snn_ft.py -a alexnet -b 128 --bit 3 -n 8 --init result/alexnet_3bit/model_best.pth.tar --workers 4 --dali_cpu /data/imagenet2012
python -m torch.distributed.launch --nproc_per_node=4 snn_ft.py -a alexnet -b 128 --bit 2 -n 8 --init result/alexnet_2bit/model_best.pth.tar --workers 4 --dali_cpu /data/imagenet2012

Checkpoints for Quantized Models

Model 3-bit 2-bit
AlexNet alexnet_3bit alexnet_2bit
VGG16 vgg16_3bit vgg16_2bit

Object Detection

We use yolov2-yolov3_PyTorch as the framework for object detection.

Preparation

About required packages and datasets, please refer to README in yolov2-yolov3_PyTorch for preparation. In the 'object detection' folder, we also prepare a merged README detailing everything.

Architecture

We currently support Tiny YOLO and YOLOv2 with a ResNet-34 backbone.

optinal arguments:
    --version / -v               Supported architecture
                                 available: yolov2_tiny, yolov2_r34

PASCAL VOC 2007

Train Quantized ANNs

Example: train Tiny YOLO with activations qunatized to 32/4/3/2 bits.

python -m torch.distributed.launch --nproc_per_node=4 train.py -d voc -v yolov2_tiny -ms --ema --sybn --batch_size 4 --bit 32
python -m torch.distributed.launch --nproc_per_node=4 train.py -d voc -v yolov2_tiny -ms --ema --sybn --batch_size 4 --bit 4 --init CHECKPOINT_PATH
python -m torch.distributed.launch --nproc_per_node=4 train.py -d voc -v yolov2_tiny -ms --ema --sybn --batch_size 4 --bit 3 --init CHECKPOINT_PATH
python -m torch.distributed.launch --nproc_per_node=4 train.py -d voc -v yolov2_tiny -ms --ema --sybn --batch_size 4 --bit 2 --init CHECKPOINT_PATH

Evaluate Models

optinal arguments:
    --spike               Evaluate with spikes (as SNNs)

Example: evaluate Tiny YOLO (SNN) with T = 15, 7, 3

python eval.py -d voc --cuda -v yolov2_tiny --bit 4 --spike --init CHECKPOINT_PATH
python eval.py -d voc --cuda -v yolov2_tiny --bit 3 --spike --init CHECKPOINT_PATH
python eval.py -d voc --cuda -v yolov2_tiny --bit 2 --spike --init CHECKPOINT_PATH

Checkpoints for Quantized Models

Model 4-bit 3-bit 2-bit
Tiny Yolo yolov2_tiny_4bit yolov2_tiny_3bit yolov2_tiny_2bit
YoloV2(ResNet-34) yolov2_r34_4bit yolov2_r34_3bit yolov2_r34_2bit

MS COCO 2017

Train Quantized ANNs

Example: train Tiny YOLO with activations qunatized to 32/4/3/2 bits.

python -m torch.distributed.launch --nproc_per_node=4 train.py -d coco -v yolov2_tiny --bit 32 -ms --ema --sybn --batch_size 4 
python -m torch.distributed.launch --nproc_per_node=4 train.py -d coco -v yolov2_tiny --bit 4 -ms --ema --sybn --batch_size 4  --init CHECKPOINT_PATH
python -m torch.distributed.launch --nproc_per_node=4 train.py -d coco -v yolov2_tiny --bit 3 -ms --ema --sybn --batch_size 4 --init CHECKPOINT_PATH
python -m torch.distributed.launch --nproc_per_node=4 train.py -d coco -v yolov2_tiny --bit 2 -ms --ema --sybn --batch_size 4 --init CHECKPOINT_PATH
Evaluate Models

Example: evaluate Tiny YOLO (SNN) with T = 15, 7, 3

python eval.py -d coco-val --cuda -v yolov2_tiny --bit 4 --spike --init CHECKPOINT_PATH
python eval.py -d coco-val --cuda -v yolov2_tiny --bit 3 --spike --init CHECKPOINT_PATH
python eval.py -d coco-val --cuda -v yolov2_tiny --bit 2 --spike --init CHECKPOINT_PATH 

Checkpoints for Quantized Models

Model 4-bit 3-bit 2-bit
Tiny Yolo yolov2_tiny_4bit yolov2_tiny_3bit yolov2_tiny_2bit
YoloV2(ResNet-34) yolov2_r34_4bit yolov2_r34_3bit yolov2_r34_2bit

Semantic Segmentation

We use vedaseg, an open source semantic segmentation toolbox based on PyTorch, as the framework for semantic segmentation.

Preparation

About required packages and datasets, please refer to README in vedaseg for preparation. In the 'semantic segmentation' folder, we also prepare a merged README detailing everything.

Architecture

We currently support Deeplabv1 (VGG9) and Deeplabv3 (ResNet-34 + ASPP).

PASCAL VOC 2012

Train Quantized ANNs

Example: train VGG9 with activations qunatized to 32/4/3/2 bits.

bash ./tools/dist_train.sh configs/voc_deeplabv1.py "0, 1, 2, 3" 
bash ./tools/dist_train.sh configs/voc_deeplabv1_4bit.py "0, 1, 2, 3" 
bash ./tools/dist_train.sh configs/voc_deeplabv1_3bit.py "0, 1, 2, 3" 
bash ./tools/dist_train.sh configs/voc_deeplabv1_2bit.py "0, 1, 2, 3" 

Example: train ResNet-34 + ASPP with activations qunatized to 32/4/3/2 bits.

bash ./tools/dist_train.sh configs/voc_deeplabv3.py "0, 1, 2, 3" 
bash ./tools/dist_train.sh configs/voc_deeplabv3_4bit.py "0, 1, 2, 3" 
bash ./tools/dist_train.sh configs/voc_deeplabv3_3bit.py "0, 1, 2, 3" 
bash ./tools/dist_train.sh configs/voc_deeplabv3_2bit.py "0, 1, 2, 3" 

Evaluate Models

Example: evaluate VGG9 (SNN) with T = 15, 7, 3

bash ./tools/dist_test.sh configs/voc_deeplabv1_T15.py './workdir/voc_deeplabv1_4bit/best_mIoU.pth' "0, 1, 2, 3" 
bash ./tools/dist_test.sh configs/voc_deeplabv1_T7.py './workdir/voc_deeplabv1_3bit/best_mIoU.pth' "0, 1, 2, 3" 
bash ./tools/dist_test.sh configs/voc_deeplabv1_T3.py './workdir/voc_deeplabv1_2bit/best_mIoU.pth' "0, 1, 2, 3" 

Example: evaluate ResNet-34 + ASPP (SNN) with T = 15, 7, 3

bash ./tools/dist_test.sh configs/voc_deeplabv3_T15.py './workdir/voc_deeplabv3_4bit/best_mIoU.pth' "0, 1, 2, 3" 
bash ./tools/dist_test.sh configs/voc_deeplabv3_T7.py './workdir/voc_deeplabv3_3bit/best_mIoU.pth' "0, 1, 2, 3" 
bash ./tools/dist_test.sh configs/voc_deeplabv3_T3.py './workdir/voc_deeplabv3_2bit/best_mIoU.pth' "0, 1, 2, 3" 

Checkpoints for Quantized Models

Model 4-bit 3-bit 2-bit
VGG-9 voc_deeplabv1_4bit voc_deeplabv1_3bit voc_deeplabv1_2bit
ResNet-34 + ASPP voc_deeplabv3_4bit voc_deeplabv3_3bit voc_deeplabv3_2bit

MS COCO 2017

Train Quantized ANNs

Example: train VGG9 with activations qunatized to 32/4/3/2 bits.

bash ./tools/dist_train.sh configs/coco_deeplabv1.py "0, 1, 2, 3, 6, 7" 
bash ./tools/dist_train.sh configs/coco_deeplabv1_4bit.py "0, 1, 2, 3, 6, 7" 
bash ./tools/dist_train.sh configs/coco_deeplabv1_3bit.py "0, 1, 2, 3" 
bash ./tools/dist_train.sh configs/coco_deeplabv1_2bit.py "0, 1, 2, 3" 

Example: train ResNet-34 + ASPP with activations qunatized to 32/4/3/2 bits.

bash ./tools/dist_train.sh configs/coco_deeplabv3.py "0, 1, 2, 3" 
bash ./tools/dist_train.sh configs/coco_deeplabv3_4bit.py "0, 1, 2, 3" 
bash ./tools/dist_train.sh configs/coco_deeplabv3_3bit.py "0, 1, 2, 3" 
bash ./tools/dist_train.sh configs/coco_deeplabv3_2bit.py "0, 1, 2, 3" 

Evaluate Models

Example: evaluate VGG9 (SNN) with T = 15, 7, 3

bash ./tools/dist_test.sh configs/coco_deeplabv1_T15.py './workdir/coco_deeplabv1_4bit/best_mIoU.pth' "0, 1, 2, 3" 
bash ./tools/dist_test.sh configs/coco_deeplabv1_T7.py './workdir/coco_deeplabv1_3bit/best_mIoU.pth' "0, 1, 2, 3" 
bash ./tools/dist_test.sh configs/coco_deeplabv1_T3.py './workdir/coco_deeplabv1_2bit/best_mIoU.pth' "0, 1, 2, 3" 

Example: evaluate ResNet-34 + ASPP (SNN) with T = 15, 7, 3

bash ./tools/dist_test.sh configs/coco_deeplabv3_T15.py './workdir/coco_deeplabv3_4bit/best_mIoU.pth' "0, 1, 2, 3" 
bash ./tools/dist_test.sh configs/coco_deeplabv3_T7.py './workdir/coco_deeplabv3_3bit/best_mIoU.pth' "0, 1, 2, 3" 
bash ./tools/dist_test.sh configs/coco_deeplabv3_T3.py './workdir/coco_deeplabv3_2bit/best_mIoU.pth' "0, 1, 2, 3" 

Checkpoints for Quantized Models

Model 4-bit 3-bit 2-bit
VGG-9 coco_deeplabv1_4bit coco_deeplabv1_3bit coco_deeplabv1_2bit
ResNet-34 + ASPP coco_deeplabv3_4bit coco_deeplabv3_3bit coco_deeplabv3_2bit