This is the official repository of the paper *ViP3D: End-to-end Visual Trajectory Prediction via 3D Agent Queries* (CVPR 2023).
Set up a conda environment and install PyTorch, mmcv-full, and mmdet:

```
conda create -n vip3d python=3.6
conda activate vip3d
pip install torch==1.10+cu111 torchvision==0.11.1+cu111 -f https://download.pytorch.org/whl/torch_stable.html
pip install mmcv-full==1.4.0 -f https://download.openmmlab.com/mmcv/dist/cu111/torch1.10/index.html
pip install mmdet==2.24.1
```
Then install mmdetection3d:

```
cd ~
git clone https://github.com/open-mmlab/mmdetection3d.git
cd mmdetection3d
git checkout v0.17.1  # Other versions may not be compatible.
python setup.py install
pip install -r requirements/runtime.txt  # Install packages for mmdet3d.
```
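After installation, you can optionally verify that the core packages import and that CUDA is visible. A minimal sanity check (not part of the official setup):

```python
# Optional environment sanity check (not part of the official setup).
import torch
import mmcv
import mmdet
import mmdet3d

print('torch:', torch.__version__)        # expect 1.10.x+cu111
print('mmcv-full:', mmcv.__version__)     # expect 1.4.0
print('mmdet:', mmdet.__version__)        # expect 2.24.1
print('mmdet3d:', mmdet3d.__version__)    # expect 0.17.1
print('CUDA available:', torch.cuda.is_available())
```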
Download the nuScenes full dataset (v1.0) and the map expansion (v1.3) here. Only the Keyframe blobs and Radar blobs are needed. After downloading, the directory structure should look as follows:
```
ViP3D
├── mmdet3d/
├── plugin/
├── tools/
├── data/
│   ├── nuscenes/
│   │   ├── maps/
│   │   ├── samples/
│   │   ├── v1.0-trainval/
│   │   ├── lidarseg/
```
Assuming the data is saved at `data/nuscenes/`, generate the tracking info files:

```
python tools/data_converter/nusc_tracking.py
```
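To confirm that the dataset is laid out correctly, a quick check with the nuScenes devkit can help. This assumes the devkit is installed (`pip install nuscenes-devkit`), which is not required by the repository itself:

```python
# Optional dataset check using the nuScenes devkit
# (an assumption of this guide: pip install nuscenes-devkit).
from nuscenes.nuscenes import NuScenes

nusc = NuScenes(version='v1.0-trainval', dataroot='data/nuscenes', verbose=True)
print('scenes:', len(nusc.scene))    # v1.0-trainval contains 850 scenes
print('samples:', len(nusc.sample))  # keyframes across all scenes
```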
Train ViP3D with 3 historical frames and a ResNet50 backbone. Training loads a pre-trained detector for weight initialization; suppose the detector checkpoint is at `ckpts/detr3d_resnet50.pth` (it can be downloaded from here).

```
bash tools/dist_train.sh plugin/configs/vip3d_resnet50_3frame.py 8 --work-dir=work_dirs/vip3d_resnet50_3frame.1
```
Training requires ~17 GB of GPU memory and takes ~3 days for 24 epochs on 8× RTX 3090 GPUs.
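If weight initialization fails, inspecting the detector checkpoint can help. A small sketch, assuming the common mmdetection-style layout where the file is a dict with a `state_dict` entry:

```python
# Inspect the pre-trained detector checkpoint (assumes the common
# mmdetection convention of a dict holding a 'state_dict' entry).
import torch

ckpt = torch.load('ckpts/detr3d_resnet50.pth', map_location='cpu')
state_dict = ckpt.get('state_dict', ckpt)   # fall back to a bare state dict
print('number of tensors:', len(state_dict))
for name in list(state_dict)[:5]:           # preview a few parameter names
    print(name, tuple(state_dict[name].shape))
```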
Run evaluation using the following command:

```
PYTHONPATH=. python tools/test.py plugin/vip3d/configs/vip3d_resnet50_3frame.py work_dirs/vip3d_resnet50_3frame.1/epoch_24.pth --eval bbox
```
The expected AMOTA with the ResNet50 backbone is 0.291.
Then evaluate the prediction metrics:

```
unzip ./nuscenes_prediction_infos_val.zip
python tools/prediction_eval.py --result_path 'work_dirs/vip3d_resnet50_3frame.1/results_nusc.json'
```
Expected results: minADE: 1.47, minFDE: 2.21, MR: 0.237, EPA: 0.245
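For reference, minADE and minFDE are the standard multi-modal trajectory metrics: the minimum, over the K predicted trajectories, of the average and the final-step displacement error against the ground truth. A self-contained sketch (illustrative, not the repository's evaluation code):

```python
# Multi-modal displacement metrics (illustrative, not the repo's eval code).
import numpy as np

def min_ade_fde(pred, gt):
    """pred: (K, T, 2) candidate trajectories; gt: (T, 2) ground truth.

    Returns (minADE, minFDE): the average / final-step L2 error of the
    best of the K candidates.
    """
    dist = np.linalg.norm(pred - gt[None], axis=-1)  # (K, T) per-step errors
    return dist.mean(axis=1).min(), dist[:, -1].min()

# Toy example: two candidates over three future steps.
gt = np.array([[0., 0.], [1., 0.], [2., 0.]])
pred = np.array([
    [[0., 1.], [1., 1.], [2., 1.]],   # off by 1 m at every step
    [[0., 0.], [1., 0.], [2., 1.]],   # correct except the final step
])
print(min_ade_fde(pred, gt))          # (0.333..., 1.0)
```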
The code and assets are under the Apache 2.0 license.
If you find our work useful for your research, please consider citing the paper:
```bibtex
@article{vip3d,
  title={Vip3d: End-to-end visual trajectory prediction via 3d agent queries},
  author={Gu, Junru and Hu, Chenxu and Zhang, Tianyuan and Chen, Xuanyao and Wang, Yilun and Wang, Yue and Zhao, Hang},
  journal={arXiv preprint arXiv:2208.01582},
  year={2022}
}
```