Rethinking the Open-Loop Evaluation of End-to-End Autonomous Driving in nuScenes

This repository is an official implementation of the technical report AD-MLP.

Jiang-Tian Zhai*, Feng Ze*, Jinhao Du*, Yongqiang Mao*, Jiang-Jiang Liu†, Zichang Tan, Yifu Zhang, Xiaoqing Ye, Jingdong Wang†

Baidu Inc.

*: equal contribution, †: corresponding author.

News

2023.10.20: Update: we would like to thank @PointsCoder (related issue #4) for finding a mistake in the data we used for training. We fixed the problem and updated the results. Under open-loop evaluation, the average L2 error changed from 0.23 to 0.29, and the collision rate from 0.12 to 0.19.
2023.05.18: Paper is released on arXiv!
2023.05.17: Code / Models are released!

Introduction

Interesting Tips!

We use a simple MLP-based approach that takes raw sensor data describing the ego vehicle's states as input and outputs the planned future trajectory of the ego vehicle, without using any perception or prediction information such as camera images or LiDAR. This simple method achieves end-to-end planning performance on the nuScenes dataset comparable to SoTA methods, reducing the average L2 error by about 20% on the nuScenes open-loop evaluation metrics.
The primary objective of this work is to present our observations rather than to propose a new method.
Our findings demonstrate the potential limitations of the current evaluation scheme on the nuScenes dataset.
Although our model performs well within the confines of the nuScenes dataset, we acknowledge that it is merely an impractical toy incapable of functioning in real-world scenarios. Driving without any knowledge of the surroundings beyond the ego vehicle’s states is an insurmountable challenge.
We hope our findings will stimulate further research in the field, encouraging a re-evaluation and enhancement of the planning task for end-to-end autonomous driving.

Results

Open-loop planning results on nuScenes.

| Method | L2 (m) 1s $\downarrow$ | L2 (m) 2s $\downarrow$ | L2 (m) 3s $\downarrow$ | Avg L2 (m) | Col. (%) 1s $\downarrow$ | Col. (%) 2s $\downarrow$ | Col. (%) 3s $\downarrow$ | Avg Col. (%) |
|---|---|---|---|---|---|---|---|---|
| ST-P3 | 1.33 | 2.11 | 2.90 | 2.11 | 0.23 | 0.62 | 1.27 | 0.71 |
| UniAD | 0.48 | 0.96 | 1.65 | 1.03 | 0.05 | 0.17 | 0.71 | 0.31 |
| VAD-Tiny | 0.20 | 0.38 | 0.65 | 0.41 | 0.10 | 0.12 | 0.27 | 0.16 |
| VAD-Base | 0.17 | 0.34 | 0.60 | 0.37 | 0.07 | 0.10 | 0.24 | 0.14 |
| Ours | 0.20 | 0.26 | 0.41 | 0.29 | 0.17 | 0.18 | 0.24 | 0.19 |
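
The Avg L2 column is consistent with a plain mean over the 1s, 2s and 3s columns. The sketch below recomputes it from the table values and derives a relative reduction. Comparing against VAD-Base is an assumption made here for illustration, since the text does not name the baseline behind the "about 20%" figure; the helper names are also illustrative.

```python
# L2 errors (m) at the 1s, 2s and 3s horizons, copied from the results table.
l2_rows = {
    "ST-P3": (1.33, 2.11, 2.90),
    "UniAD": (0.48, 0.96, 1.65),
    "VAD-Tiny": (0.20, 0.38, 0.65),
    "VAD-Base": (0.17, 0.34, 0.60),
    "Ours": (0.20, 0.26, 0.41),
}


def mean_over_horizons(values):
    """Plain arithmetic mean of the per-horizon values."""
    return sum(values) / len(values)


def relative_reduction(baseline, ours):
    """Fraction by which `ours` is lower than `baseline`."""
    return (baseline - ours) / baseline


avg_l2 = {name: mean_over_horizons(vals) for name, vals in l2_rows.items()}

# Assumption: VAD-Base is used as the reference method for the comparison.
reduction_vs_vad_base = relative_reduction(avg_l2["VAD-Base"], avg_l2["Ours"])

for name, value in avg_l2.items():
    print(f"{name}: avg L2 = {value:.2f} m")
print(f"Relative reduction vs. VAD-Base: {reduction_vs_vad_base:.1%}")
```

Under that assumption the reduction comes out a little above 20%.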

Get Started

Environment: Linux, Python == 3.7.9, CUDA == 11.2, and pytorch == 1.9.1 or paddlepaddle == 2.3.2. In addition, follow the instructions in ST-P3 to set up its evaluation process.
```
cd deps/stp3
conda env create -f environment.yml
```
Prepare Data
Download the nuScenes Dataset.
Pretrained weights
To verify the performance on the nuScenes Dataset, we provide the pretrained model weights (Google Drive and Baidu Netdisk). Please download them (paddle checkpoint, token of validation set...) to the root directory of this project.
Paddle Evaluation
```
python paddle/model/AD-MLP.py
python deps/stp3/evaluate_for_mlp.py
```
The first command saves the predicted trajectories (6 frames covering the next 3s) to output_data.pkl, and the second applies the ST-P3 evaluation to it. The final evaluation output contains the L2 error and collision rate at the 1s, 2s and 3s horizons.
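
As a rough illustration of what an L2 error at the 1s, 2s and 3s horizons means for a 6-frame, 3s trajectory (2 frames per second), here is a minimal NumPy sketch. It is not the code in deps/stp3/evaluate_for_mlp.py: the function name, the array layout and the `cumulative` switch are assumptions. Some codebases average the error over all frames up to the horizon, others report the error at the horizon frame only, so check the evaluation script for the convention actually used.

```python
import numpy as np


def l2_at_horizons(pred, gt, hz=2.0, horizons_s=(1.0, 2.0, 3.0), cumulative=True):
    """Per-horizon L2 error between two trajectories of shape (T, 2).

    cumulative=True  -> mean error over all frames up to the horizon.
    cumulative=False -> error at the horizon frame only.
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    if pred.shape != gt.shape:
        raise ValueError("pred and gt must have the same shape")

    # Euclidean distance between predicted and ground-truth waypoints, per frame.
    per_frame = np.linalg.norm(pred - gt, axis=1)

    errors = {}
    for horizon in horizons_s:
        n_frames = int(round(horizon * hz))  # frames covered by this horizon
        if cumulative:
            errors[horizon] = float(per_frame[:n_frames].mean())
        else:
            errors[horizon] = float(per_frame[n_frames - 1])
    return errors


# Toy example: the prediction stays at the origin while the ground truth moves in x.
toy_pred = np.zeros((6, 2))
toy_gt = np.array([[1, 0], [1, 0], [2, 0], [2, 0], [3, 0], [3, 0]], dtype=float)
toy_cumulative = l2_at_horizons(toy_pred, toy_gt)
toy_final_frame = l2_at_horizons(toy_pred, toy_gt, cumulative=False)
print(toy_cumulative)
print(toy_final_frame)
```

The two conventions can give noticeably different numbers for the same predictions, which matters when comparing results across repositories.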

Two versions of the evaluation metrics are provided: online and offline. The offline version uses pre-stored ground truth and is far faster than the online one. The code defaults to offline.

Training: We provide the training code in the pytorch/admlp folder. Additional files required for training are available on Baidu Netdisk. Please arrange the pkl files like this:

pytorch
├── admlp
│   ├── fengze_nuscenes_infos_val.pkl
│   ├── fengze_nuscenes_infos_train.pkl
│   ├── stp3_val
│   │   ├── data_nuscene.pkl
│   │   ├── filter_token.pkl
│   │   ├── stp3_occupancy.pkl
│   │   ├── stp3_traj_gt.pkl
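
Before launching training, it can help to confirm that the files are where the tree above expects them. The following is a small optional check, not part of the repository; the function name and the default path (relative to the project root) are assumptions.

```python
from pathlib import Path

# Relative paths taken from the directory tree above.
REQUIRED_FILES = [
    "fengze_nuscenes_infos_val.pkl",
    "fengze_nuscenes_infos_train.pkl",
    "stp3_val/data_nuscene.pkl",
    "stp3_val/filter_token.pkl",
    "stp3_val/stp3_occupancy.pkl",
    "stp3_val/stp3_traj_gt.pkl",
]


def missing_training_files(admlp_dir="pytorch/admlp"):
    """Return the required files that are not present under admlp_dir."""
    root = Path(admlp_dir)
    return [rel for rel in REQUIRED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_training_files()
    if missing:
        print("Missing files:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("All required pkl files are in place.")
```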

Start the training process with train.py under the folder.

cd pytorch/admlp
python train.py

Pytorch evaluation: We provide the trained weights in pytorch/admlp/mlp.pth to reproduce the results in our technical report. Start the evaluation process with eval_weight.py under the folder.
```
cd pytorch/admlp
python eval_weight.py
```
Collision rate evaluation: We have observed that the evaluation of model collision rates is sensitive to certain samples. A typical example is a stationary ego vehicle: the model often predicts trajectories for the next 3 seconds that are very close to the origin (but not exactly 0). If obstacles exist in the range of [0, 0.5m) in the x and y dimensions around the origin, predicted coordinates with very small absolute values can introduce unstable systematic errors due to the resolution of the occupancy map. To mitigate this issue, we recommend processing the model outputs in deps/stp3/evaluate_for_mlp.py or at the original model inference stage. For instance, you can set the model's predicted trajectory to zero if its distance from the origin is smaller than a small threshold, e.g. 1e-2m. This approach is similar to the filtering of colliding ground truth trajectories in the original evaluation code, as both methods aim to remove systematic errors. We also suggest reviewing cases where collisions occur.
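
The thresholding suggested above could look roughly like the following NumPy sketch. It is not taken from the repository: the function name and array layout are assumptions, and it zeroes each waypoint independently, which is one possible reading of "set the predicted trajectory to zero".

```python
import numpy as np


def snap_near_origin(traj, threshold=1e-2):
    """Zero out waypoints closer than `threshold` (m) to the origin.

    traj: array of shape (..., T, 2) holding x, y waypoints in the ego frame.
    Returns a new array; the input is left unchanged.
    """
    traj = np.array(traj, dtype=float)  # np.array makes a copy
    # Distance of every waypoint from the origin, shape (..., T).
    dist = np.linalg.norm(traj, axis=-1)
    # Assumption: snap each waypoint on its own, not the whole trajectory.
    traj[dist < threshold] = 0.0
    return traj


# Toy example: a nearly stationary prediction with one clearly non-zero waypoint.
toy_traj = np.array([[0.001, -0.002], [0.5, 0.0], [0.0, 0.009]])
toy_snapped = snap_near_origin(toy_traj)
print(toy_snapped)
```

Waypoints beyond the threshold are left untouched, so the step only affects near-stationary predictions.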

Contact

If you have any questions or suggestions about this repo, please feel free to contact us (jtzhai30@gmail.com, j04.liu@gmail.com, yxq@whu.edu.cn, wangjingdong@outlook.com).

Acknowledgement

This repo is built on ST-P3. Thanks for their great work.

License

All code in this repository is under the Apache License 2.0.

BibTeX

If you find our work and this repository useful, please consider giving a star and citation.

@article{zhai2023ADMLP,
  title={Rethinking the Open-Loop Evaluation of End-to-End Autonomous Driving in nuScenes},
  author={Zhai, Jiang-Tian and Feng, Ze and Du, Jinhao and Mao, Yongqiang and Liu, Jiang-Jiang and Tan, Zichang and Zhang, Yifu and Ye, Xiaoqing and Wang, Jingdong},
  journal={arXiv preprint arXiv:2305.10430},
  year={2023}
}
