
Learning Ego 3D Representation as Ray Tracing

Learning Ego 3D Representation as Ray Tracing,
Jiachen Lu, Zheyuan Zhou, Xiatian Zhu, Hang Xu, Li Zhang
ECCV 2022

Demo

Video


News

  • [2022/07/19]: Configs and instructions for training are released!
  • [2022/07/05]: First version of Ego3RT is released! Code for the detection head and training configs are coming soon.
  • [2022/07/04]: Ego3RT is accepted by ECCV 2022!

Abstract

A self-driving perception model aims to extract 3D semantic representations from multiple cameras collectively into the bird's-eye-view (BEV) coordinate frame of the ego car in order to ground the downstream planner. Existing perception methods often rely on error-prone depth estimation of the whole scene, or on learning sparse virtual 3D representations without the target geometry structure, both of which remain limited in performance and/or capability. In this paper, we present a novel end-to-end architecture for ego 3D representation learning from an arbitrary number of unconstrained camera views. Inspired by the ray tracing principle, we design a polarized grid of "imaginary eyes" as the learnable ego 3D representation and formulate the learning process with an adaptive attention mechanism in conjunction with the 3D-to-2D projection. Critically, this formulation allows extracting rich 3D representation from 2D images without any depth supervision, and with the built-in geometry structure consistent w.r.t. BEV. Despite its simplicity and versatility, extensive experiments on standard BEV visual tasks (e.g., camera-based 3D object detection and BEV segmentation) show that our model outperforms all state-of-the-art alternatives significantly, with an extra advantage in computational efficiency from multi-task learning.
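
The formulation above can be pictured as a polar grid of learnable BEV queries ("imaginary eyes") whose 3D anchor points are projected into every camera, where image features are sampled and fused by attention, with no depth supervision. Below is a minimal PyTorch sketch of that idea, not the authors' implementation: the module names (`PolarBEVQueries`, `RayTracingDecoderLayer`), default grid sizes, single-point sampling, and the simple per-camera softmax are illustrative assumptions; the paper's decoder uses a more elaborate adaptive attention design and multiple stages.

```python
# Minimal sketch (NOT the authors' implementation) of the formulation described above:
# a polar grid of learnable BEV queries ("imaginary eyes") whose 3D anchor points are
# projected into every camera, where image features are sampled and fused by attention.
# Module names, grid sizes, single-point sampling and the per-camera softmax are
# illustrative assumptions made for this sketch.
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class PolarBEVQueries(nn.Module):
    """Learnable polar grid of BEV queries plus fixed 3D anchor points in the ego frame."""

    def __init__(self, num_rings=72, num_rays=192, embed_dim=256, max_radius=50.0, height=0.0):
        super().__init__()
        self.embed = nn.Embedding(num_rings * num_rays, embed_dim)
        theta = torch.linspace(0.0, 2.0 * math.pi, num_rays + 1)[:-1]        # ray directions
        radius = torch.linspace(1.0, max_radius, num_rings)                  # ring radii (m)
        r, t = torch.meshgrid(radius, theta, indexing="ij")                  # (rings, rays)
        xyz = torch.stack(
            [r * torch.cos(t), r * torch.sin(t), torch.full_like(r, height)], dim=-1
        )
        self.register_buffer("anchors", xyz.reshape(-1, 3))                  # (Q, 3)

    def forward(self):
        return self.embed.weight, self.anchors                               # (Q, C), (Q, 3)


def project_to_images(points, intrinsics, extrinsics, image_size):
    """Project ego-frame 3D points into each camera; return normalized uv and a validity mask."""
    pts_h = torch.cat([points, torch.ones_like(points[:, :1])], dim=-1)      # (Q, 4)
    cam = torch.einsum("nij,qj->nqi", extrinsics, pts_h)[..., :3]            # ego -> camera
    depth = cam[..., 2:3].clamp(min=1e-5)
    uv = torch.einsum("nij,nqj->nqi", intrinsics, cam / depth)[..., :2]      # pixel coords
    H, W = image_size
    uv = torch.stack([uv[..., 0] / W, uv[..., 1] / H], dim=-1) * 2.0 - 1.0   # to [-1, 1]
    valid = (cam[..., 2] > 0) & (uv.abs() < 1).all(dim=-1)                   # in front & in frame
    return uv, valid                                                         # (N, Q, 2), (N, Q)


class RayTracingDecoderLayer(nn.Module):
    """Pull multi-camera features onto the BEV queries: project, sample, attend over cameras."""

    def __init__(self, embed_dim=256):
        super().__init__()
        self.score = nn.Linear(embed_dim, 1)       # attention logit per (camera, query) sample
        self.out_proj = nn.Linear(embed_dim, embed_dim)

    def forward(self, queries, anchors, feats, intrinsics, extrinsics, image_size):
        # queries: (Q, C) polar BEV embeddings; anchors: (Q, 3) ego-frame points
        # feats:   (N_cams, C, Hf, Wf) per-camera feature maps from a shared 2D backbone
        uv, valid = project_to_images(anchors, intrinsics, extrinsics, image_size)
        sampled = F.grid_sample(feats, uv.unsqueeze(2), align_corners=False)  # (N, C, Q, 1)
        sampled = sampled.squeeze(-1).permute(0, 2, 1)                        # (N, Q, C)
        logits = self.score(sampled).squeeze(-1).masked_fill(~valid, float("-inf"))
        weights = torch.softmax(logits, dim=0).nan_to_num(0.0).unsqueeze(-1)  # softmax over cams
        pulled = (weights * sampled).sum(dim=0)                               # (Q, C)
        return queries + self.out_proj(pulled)                                # residual update
```

In a full pipeline, `feats` would come from a shared image backbone run on all cameras, several such layers would be stacked, and the refined polar queries would be decoded by detection or segmentation heads; see the paper for the actual adaptive attention design.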

Methods

Train & Test

Please refer to get_started.md.

Results

3D object detection on nuScenes validation set

| Model | Polar size | mAP | NDS | Checkpoint |
| --- | --- | --- | --- | --- |
| Ego3RT, ResNet101_DCN | 80x256 | 37.5 | 45.0 | - |
| Ego3RT, ResNet101_DCN | 72x192 | 37.5 | 44.9 | ego3rt_polar72x192_cart128x128.pth |
| Ego3RT, VoVNet | 80x256 | 47.8 | 53.4 | - |

3D object detection on nuScenes test set

| Model | Polar size | mAP | NDS |
| --- | --- | --- | --- |
| Ego3RT, ResNet101_DCN | 80x256 | 38.9 | 44.3 |
| Ego3RT, VoVNet | 80x256 | 42.5 | 47.3 |

BEV segmentation on nuScenes validation set

| Model | Polar size | Multi-task | mIoU |
| --- | --- | --- | --- |
| Ego3RT, EfficientNet | 80x256 | no | 55.5 |
| Ego3RT, ResNet101_DCN | 80x256 | yes | 46.2 |

License

MIT

Reference

@inproceedings{lu2022ego3rt,
  title={Learning Ego 3D Representation as Ray Tracing},
  author={Lu, Jiachen and Zhou, Zheyuan and Zhu, Xiatian and Xu, Hang and Zhang, Li},
  booktitle={European Conference on Computer Vision},
  year={2022}
}

Acknowledgement

Thanks to the previous open-source repos: