RVT: Recurrent Vision Transformers for Object Detection with Event Cameras

This is the official Pytorch implementation of the CVPR 2023 paper Recurrent Vision Transformers for Object Detection with Event Cameras.

Watch the video for a quick overview.

@InProceedings{Gehrig_2023_CVPR,
  author  = {Mathias Gehrig and Davide Scaramuzza},
  title   = {Recurrent Vision Transformers for Object Detection with Event Cameras},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year    = {2023},
}

Conda Installation

We highly recommend to use Mambaforge to reduce the installation time.

conda create -y -n rvt python=3.9 pip
conda activate rvt
conda config --set channel_priority flexible

CUDA_VERSION=11.8

conda install -y h5py=3.8.0 blosc-hdf5-plugin=1.0.0 \
hydra-core=1.3.2 einops=0.6.0 torchdata=0.6.0 tqdm numba \
pytorch=2.0.0 torchvision=0.15.0 pytorch-cuda=$CUDA_VERSION \
-c pytorch -c nvidia -c conda-forge

python -m pip install pytorch-lightning==1.8.6 wandb==0.14.0 \
pandas==1.5.3 plotly==5.13.1 opencv-python==4.6.0.66 tabulate==0.9.0 \
pycocotools==2.0.6 bbox-visualizer==0.1.0 StrEnum==0.4.10
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'

Detectron2 is not strictly required but speeds up the evaluation.

Required Data

To evaluate or train RVT you will need to download the required preprocessed datasets:

	1 Mpx	Gen1
pre-processed dataset	download	download
crc32	`c5ec7c38`	`5acab6f3`

You may also pre-process the dataset yourself by following the instructions.

Pre-trained Checkpoints

1 Mpx

	RVT-Base	RVT-Small	RVT-Tiny
pre-trained checkpoint	download	download	download
md5	`72923a`	`a94207`	`5a3c78`

Gen1

	RVT-Base	RVT-Small	RVT-Tiny
pre-trained checkpoint	download	download	download
md5	`839317`	`840f2b`	`a770b9`

Evaluation

Set DATA_DIR as the path to either the 1 Mpx or Gen1 dataset directory
Set CKPT_PATH to the path of the correct checkpoint matching the choice of the model and dataset.
Set
- MDL_CFG=base, or
- MDL_CFG=small, or
- MDL_CFG=tiny
to load either the base, small, or tiny model configuration
Set
- USE_TEST=1 to evaluate on the test set, or
- USE_TEST=0 to evaluate on the validation set
Set GPU_ID to the PCI BUS ID of the GPU that you want to use. e.g. GPU_ID=0. Only a single GPU is supported for evaluation

1 Mpx

python validation.py dataset=gen4 dataset.path=${DATA_DIR} checkpoint=${CKPT_PATH} \
use_test_set=${USE_TEST} hardware.gpus=${GPU_ID} +experiment/gen4="${MDL_CFG}.yaml" \
batch_size.eval=8 model.postprocess.confidence_threshold=0.001

Gen1

python validation.py dataset=gen1 dataset.path=${DATA_DIR} checkpoint=${CKPT_PATH} \
use_test_set=${USE_TEST} hardware.gpus=${GPU_ID} +experiment/gen1="${MDL_CFG}.yaml" \
batch_size.eval=8 model.postprocess.confidence_threshold=0.001

Training

Set DATA_DIR as the path to either the 1 Mpx or Gen1 dataset directory
Set
- MDL_CFG=base, or
- MDL_CFG=small, or
- MDL_CFG=tiny
to load either the base, small, or tiny model configuration
Set GPU_IDS to the PCI BUS IDs of the GPUs that you want to use. e.g. GPU_IDS=[0,1] for using GPU 0 and 1. Using a list of IDS will enable single-node multi-GPU training. Pay attention to the batch size which is defined per GPU:
Set BATCH_SIZE_PER_GPU such that the effective batch size is matching the parameters below. The effective batch size is (batch size per gpu)*(number of GPUs).
If you would like to change the effective batch size, we found the following learning rate scaling to work well for all models on both datasets:

lr = 2e-4 * sqrt(effective_batch_size/8).
The training code uses W&B for logging during the training. Hence, we assume that you have a W&B account.
- The training script below will create a new project called RVT. Adapt the project name and group name if necessary.

1 Mpx

The effective batch size for the 1 Mpx training is 24.
To train on 2 GPUs using 6 workers per GPU for training and 2 workers per GPU for evaluation:

GPU_IDS=[0,1]
BATCH_SIZE_PER_GPU=12
TRAIN_WORKERS_PER_GPU=6
EVAL_WORKERS_PER_GPU=2
python train.py model=rnndet dataset=gen4 dataset.path=${DATA_DIR} wandb.project_name=RVT \
wandb.group_name=1mpx +experiment/gen4="${MDL_CFG}.yaml" hardware.gpus=${GPU_IDS} \
batch_size.train=${BATCH_SIZE_PER_GPU} batch_size.eval=${BATCH_SIZE_PER_GPU} \
hardware.num_workers.train=${TRAIN_WORKERS_PER_GPU} hardware.num_workers.eval=${EVAL_WORKERS_PER_GPU}

If you instead want to execute the training on 4 GPUs simply adapt GPU_IDS and BATCH_SIZE_PER_GPU accordingly:

GPU_IDS=[0,1,2,3]
BATCH_SIZE_PER_GPU=6

Gen1

The effective batch size for the Gen1 training is 8.
To train on 1 GPU using 6 workers for training and 2 workers for evaluation:

GPU_IDS=0
BATCH_SIZE_PER_GPU=8
TRAIN_WORKERS_PER_GPU=6
EVAL_WORKERS_PER_GPU=2
python train.py model=rnndet dataset=gen1 dataset.path=${DATA_DIR} wandb.project_name=RVT \
wandb.group_name=gen1 +experiment/gen1="${MDL_CFG}.yaml" hardware.gpus=${GPU_IDS} \
batch_size.train=${BATCH_SIZE_PER_GPU} batch_size.eval=${BATCH_SIZE_PER_GPU} \
hardware.num_workers.train=${TRAIN_WORKERS_PER_GPU} hardware.num_workers.eval=${EVAL_WORKERS_PER_GPU}

Code Acknowledgments

This project has used code from the following projects:

timm for the MaxViT layer implementation in Pytorch
YOLOX for the detection PAFPN/head

ostromb/RVT

RVT: Recurrent Vision Transformers for Object Detection with Event Cameras

Conda Installation

Required Data

Pre-trained Checkpoints

1 Mpx

Gen1

Evaluation

1 Mpx

Gen1

Training

1 Mpx

Gen1

Code Acknowledgments