
PTSEFormer

This repo is the official implementation of PTSEFormer: Progressive Temporal-Spatial Enhanced TransFormer Towards Video Object Detection, accepted by ECCV 2022.

Citing PTSEFormer

Please consider citing our paper if you find it useful:

@inproceedings{ptseformer,
    author    = {Wang, Han and Tang, Jun and Liu, Xiaodong and Guan, Shanyan and Xie, Rong and Song, Li},
    title     = {PTSEFormer: Progressive Temporal-Spatial Enhanced TransFormer Towards Video Object Detection},
    booktitle = {ECCV},
    year      = {2022}
}

Installation

Requirements

  • Linux, CUDA>=9.2, GCC>=5.4

  • Python>=3.7

    conda create -n PTSEFormer python=3.7 pip
    

    Then, activate the environment:

    conda activate PTSEFormer
    
  • PyTorch>=1.5.1, torchvision>=0.6.1

    conda install pytorch=1.5.1 torchvision=0.6.1 cudatoolkit=9.2 -c pytorch
    
  • Other requirements

    pip install -r requirements.txt
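
With the environment set up, a quick sanity check helps catch CUDA or version mismatches before training; the following is a minimal sketch using only standard PyTorch calls (nothing repo-specific is assumed):

    import torch
    import torchvision

    # PTSEFormer targets PyTorch>=1.5.1, torchvision>=0.6.1 and CUDA>=9.2.
    print("torch:", torch.__version__)
    print("torchvision:", torchvision.__version__)
    print("CUDA available:", torch.cuda.is_available())
    if torch.cuda.is_available():
        print("CUDA version:", torch.version.cuda)
        print("visible GPUs:", torch.cuda.device_count())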
    

Data preparation

Please download the ILSVRC2015 DET and ILSVRC2015 VID datasets from here and organize them as follows.

data_root/
	└──ILSVRC2015/
		├──ImageSets/
		├──Annotations/
		└──Data/
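
Before launching a run, it is worth verifying the layout; below is a minimal sketch that only checks for the three directories shown above (data_root is a placeholder to adjust):

    import os

    data_root = "data_root"  # placeholder: point this at your actual dataset root

    for sub in ("ImageSets", "Annotations", "Data"):
        path = os.path.join(data_root, "ILSVRC2015", sub)
        print(path, "->", "ok" if os.path.isdir(path) else "MISSING")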

Evaluation

Run inference on the validation set with:

python -m torch.distributed.launch --nproc_per_node=8 tools/test.py --config-file experiments/PTSEFormer_r101_8gpus.yaml
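
The command above assumes eight GPUs. If fewer are available, lowering --nproc_per_node should be enough, since torch.distributed.launch spawns one process per GPU; note this is an assumption based on the launcher's standard behavior, and the effective total batch size shrinks accordingly:

python -m torch.distributed.launch --nproc_per_node=1 tools/test.py --config-file experiments/PTSEFormer_r101_8gpus.yaml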

The pretrained model can be found here.
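
After downloading, the checkpoint can be inspected with plain PyTorch before running evaluation; a minimal sketch, where the filename ptseformer_r101.pth is hypothetical and the key layout is an assumption:

    import torch

    # Load on CPU so no GPU is needed just to inspect the file.
    ckpt = torch.load("ptseformer_r101.pth", map_location="cpu")  # hypothetical filename
    if isinstance(ckpt, dict):
        # Checkpoints commonly expose keys like 'model' or 'state_dict' (not verified here).
        print(list(ckpt.keys())[:10])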

Training

To train on the combined VID and DET dataset, run:

python -m torch.distributed.launch --nproc_per_node=8 tools/train.py --config-file experiments/PTSEFormer_r101_8gpus.yaml
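
The YAML file passed via --config-file holds the experiment settings. To review them before a long run, a generic PyYAML sketch works (the keys inside are defined by this repo's config schema and are not listed here):

    import yaml

    with open("experiments/PTSEFormer_r101_8gpus.yaml") as f:
        cfg = yaml.safe_load(f)

    # Dump the top-level settings; nested sections print as dicts.
    for key, value in cfg.items():
        print(key, ":", value)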