End-to-end 3D Tracking with Decoupled Queries

This project provides an implementation of the ICCV 2023 paper "End-to-end 3D Tracking with Decoupled Queries", built on mmDetection3D. The paper proposes DQTrack, a simple yet effective framework for 3D object tracking. It uses decoupled, task-specific queries to resolve the conflict between task representations found in previous query-based approaches, which strengthens query capability while keeping the tracking pipeline compact.

Setup

This project is built on mmDetection3D, which can be set up as follows.

Download and install mmdet3d (v1.0.0rc3) from the official repo.

git clone --branch v1.0.0rc3 https://github.com/open-mmlab/mmdetection3d.git

Our model is tested with torch 1.13.1, mmcv-full 1.4.0, and mmdet 2.24.0. You can install them with:

pip3 install torch==1.13.1 mmcv-full==1.4.0 mmdet==2.24.0

Install mmdet3d following mmdetection3d/docs/en/getting_started.md.
To avoid potential errors in MOT evaluation, make sure the installed motmetrics version satisfies motmetrics<=1.1.3.
Copy our project and related files into the installed mmDetection3D:

cp -r projects mmdetection3d/
cp -r extra_tools mmdetection3d/

(Optional) Compile and install the VoxelPooling op if you want to use the stereo-based network.

python3 extra_tools/setup.py develop
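
Once the steps above are done, the pinned versions can be sanity-checked with a short script. This is a minimal sketch, not part of the project: it assumes Python 3.8+ (for importlib.metadata), the helper names are hypothetical, and the version comparison is deliberately simple (it looks only at the leading numeric release fields, so pre-release suffixes are ignored).

```python
from importlib import metadata

# Version pins taken from the setup steps above.
# "==" means an exact release, "<=" means an upper bound.
PINS = {
    "torch": ("==", "1.13.1"),
    "mmcv-full": ("==", "1.4.0"),
    "mmdet": ("==", "2.24.0"),
    "motmetrics": ("<=", "1.1.3"),
}


def release_tuple(version):
    """Turn '1.13.1+cu117' or '1.0.0rc3' into a tuple of leading integers."""
    public = version.split("+")[0]  # drop local build tags such as +cu117
    parts = []
    for piece in public.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def satisfies(installed, op, wanted):
    """Compare two version strings on their numeric release fields only."""
    have, want = release_tuple(installed), release_tuple(wanted)
    return have == want if op == "==" else have <= want


def check_environment(pins=PINS):
    """Return a list of human-readable problems; empty means all pins hold."""
    problems = []
    for name, (op, wanted) in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed (want {op}{wanted})")
            continue
        if not satisfies(installed, op, wanted):
            problems.append(f"{name}: found {installed}, want {op}{wanted}")
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["all pinned versions look fine"]:
        print(line)
```

Running it inside the environment used for mmDetection3D prints one line per mismatch, which makes a wrong motmetrics version visible before MOT evaluation fails.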

Data Preparation

Please prepare the data and download the preprocessed info as follows:

Download nuScenes 3D detection data HERE and unzip all zip files.
As is usual when preparing datasets, we recommend symlinking the dataset root to mmdetection3d/data.
Download the pretrained models and infos HERE, then move the infos to mmdetection3d/data/infos and the models to mmdetection3d/ckpts.

The folder structure should be organized as follows before our processing.

mmdetection3d
├── mmdet3d
├── tools
├── configs
├── extra_tools
├── projects
├── ckpts
│   ├── model_val
│   ├── pretrain
├── data
│   ├── nuscenes
│   │   ├── maps
│   │   ├── samples
│   │   ├── sweeps
│   │   ├── v1.0-test
│   │   ├── v1.0-trainval
│   ├── infos
│   │   ├── track_cat_10_infos_train.pkl
│   │   ├── track_cat_10_infos_val.pkl
│   │   ├── track_test_cat_10_infos_test.pkl
│   │   ├── mmdet3d_nuscenes_30f_infos_train.pkl
│   │   ├── mmdet3d_nuscenes_30f_infos_val.pkl
│   │   ├── mmdet3d_nuscenes_30f_infos_test.pkl
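
The layout above can be verified before launching a job. The following is a minimal sketch rather than a tool shipped with the project: the directory and file names are copied from the tree above, while the function name and the command-line handling are hypothetical.

```python
import sys
from pathlib import Path

# Directories and info files listed in the folder structure above,
# relative to the mmdetection3d root.
EXPECTED_DIRS = [
    "ckpts/model_val",
    "ckpts/pretrain",
    "data/nuscenes/maps",
    "data/nuscenes/samples",
    "data/nuscenes/sweeps",
    "data/nuscenes/v1.0-test",
    "data/nuscenes/v1.0-trainval",
]
EXPECTED_FILES = [
    "data/infos/track_cat_10_infos_train.pkl",
    "data/infos/track_cat_10_infos_val.pkl",
    "data/infos/track_test_cat_10_infos_test.pkl",
    "data/infos/mmdet3d_nuscenes_30f_infos_train.pkl",
    "data/infos/mmdet3d_nuscenes_30f_infos_val.pkl",
    "data/infos/mmdet3d_nuscenes_30f_infos_test.pkl",
]


def missing_entries(root):
    """Return the expected paths that do not exist under `root`.

    is_dir() and is_file() follow symlinks, so a symlinked dataset
    root is accepted as long as its target exists.
    """
    root = Path(root)
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing


if __name__ == "__main__":
    # Optional first argument: path to the mmdetection3d checkout.
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    gaps = missing_entries(root)
    print("layout looks complete" if not gaps else "missing:\n  " + "\n  ".join(gaps))
```

Only the entries shown in the tree are checked; a setup that uses only the val split may not need every file listed.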

Training

Train the model with the commands below. Pretrained models are available HERE if you want to start training from them. To train the detector and tracker end-to-end from scratch, set the parameter train_track_only in the config file to False.

For example, to launch DQTrack training on multiple GPUs, run:

cd /path/to/mmdetection3d
bash extra_tools/dist_train.sh ${CFG_FILE} ${NUM_GPUS}

or train with a single GPU:

python3 extra_tools/train.py ${CFG_FILE}
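
To switch between tracker-only and end-to-end training without editing a config by hand, the train_track_only flag mentioned above can be rewritten in a copy of the config. This is a minimal sketch under a stated assumption: the flag is assumed to appear literally in the config text as train_track_only=True or train_track_only=False. The helper names and file paths are hypothetical, and nothing here reflects how the project itself parses configs.

```python
import re
from pathlib import Path

# Matches "train_track_only=True" / "train_track_only = False" and keeps
# the original spacing around "=" in group 1.
FLAG_PATTERN = re.compile(r"(\btrain_track_only\s*=\s*)(True|False)\b")


def set_train_track_only(config_text, value):
    """Return (new_text, number_of_replacements) with the flag set to `value`."""
    return FLAG_PATTERN.subn(lambda m: m.group(1) + str(bool(value)), config_text)


def write_variant(src, dst, value):
    """Copy config `src` to `dst` with train_track_only set to `value`."""
    text = Path(src).read_text()
    new_text, count = set_train_track_only(text, value)
    if count == 0:
        # Assumption violated: the flag is not written literally in this file.
        raise ValueError(f"train_track_only not found in {src}")
    Path(dst).write_text(new_text)
    return count
```

For instance, write_variant("my_cfg.py", "my_cfg_e2e.py", False) would produce a copy for end-to-end training, which can then be passed as ${CFG_FILE}; both file names are placeholders.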

Evaluation

Evaluate the model with the command below.

ATTENTION: Because the data is sequential, only single-GPU evaluation is supported:

python3 extra_tools/test.py ${CFG_FILE} ${CKPT} --eval=bbox

Model and Results

We provide results on the nuScenes val set with pretrained models. All the models can be found in model_val of HERE.

Model	Encoder	Decoder	Resolution	AMOTA	AMOTP
DQTrack-DETR3D	R101	DETR3D	900x1600	36.7%	1.351
DQTrack-UVTR	R101	UVTR-C	900x1600	39.6%	1.310
DQTrack-Stereo	R50	Stereo	512x1408	36.9%	1.371
DQTrack-Stereo	R101	Stereo	512x1408	40.7%	1.317
DQTrack-PETRV2	V2-99	PETRV2	320x800	44.4%	1.252

License

This work is made available under the Nvidia Source Code License-NC. Click here to view a copy of this license.

The pre-trained models are shared under CC-BY-NC-SA-4.0. If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.

For business inquiries, please visit our website and submit the form: NVIDIA Research Licensing.

Citation

If this work is helpful for your research, please consider citing:

@inproceedings{li2023end,
  title={End-to-end 3D Tracking with Decoupled Queries},
  author={Li, Yanwei and Yu, Zhiding and Philion, Jonah and Anandkumar, Anima and Fidler, Sanja and Jia, Jiaya and Alvarez, Jose},
  booktitle={IEEE/CVF International Conference on Computer Vision (ICCV)},
  year={2023}
}

Acknowledgement

We would like to thank the authors of DETR3D, MUTR3D, UVTR, PETR, and BEVStereo for their open-source release.
