[CVPR 2024] PanoOcc: Unified Occupancy Representation for Camera-based 3D Panoptic Segmentation

Primary language: Python. License: GNU General Public License v3.0 (GPL-3.0)

PanoOcc

PanoOcc: Unified Occupancy Representation for Camera-based 3D Panoptic Segmentation [paper]

News

  • [2024/2/27] PanoOcc was accepted to CVPR 2024.
  • [2023/6/29] Initial code release, with support for occupancy prediction.
  • [2023/6/16] We released the arXiv version of the paper.

Catalog

  • Occupancy Flow
  • Panoptic Refine
  • 3D Panoptic Segmentation (Waymo)
  • Sparse Decoder
  • 3D Panoptic Segmentation (nuScenes)
  • Occupancy Prediction (Occ3D-nuScenes)
  • Initialize

Introduction

Comprehensive modeling of the surrounding 3D world is key to the success of autonomous driving. However, existing perception tasks such as object detection, road structure segmentation, depth and elevation estimation, and open-set object localization each focus on only a small facet of holistic 3D scene understanding. This divide-and-conquer strategy simplifies algorithm development at the cost of losing an end-to-end unified solution to the problem. In this work, we address this limitation by studying camera-based 3D panoptic segmentation, aiming to achieve a unified occupancy representation for camera-only 3D scene understanding. To this end, we introduce a novel method called PanoOcc, which uses voxel queries to aggregate spatiotemporal information from multi-frame and multi-view images in a coarse-to-fine scheme, integrating feature learning and scene representation into a unified occupancy representation. We have conducted extensive ablation studies to verify the effectiveness and efficiency of the proposed method. Our approach achieves new state-of-the-art results for camera-based semantic segmentation and panoptic segmentation on the nuScenes dataset. Furthermore, our method can be easily extended to dense occupancy prediction and has shown promising performance on the Occ3D benchmark.
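
To make the idea of a unified occupancy representation concrete, here is a minimal sketch of a voxel grid that carries both a semantic class and an instance id per voxel, merged into a single panoptic label. This is an illustration only, not the PanoOcc implementation: the grid size, the class ids (`EMPTY`, `ROAD`, `CAR`), the helper `to_panoptic`, and the `semantic * 1000 + instance` encoding are all assumptions chosen for the example.

```python
import numpy as np

# Hypothetical grid size and class ids, chosen only for illustration.
GRID_SHAPE = (4, 4, 2)          # (X, Y, Z) voxels
EMPTY, ROAD, CAR = 0, 1, 2
THING_CLASSES = (CAR,)          # classes with countable instances
LABEL_DIVISOR = 1000            # assumed encoding: semantic * divisor + instance


def to_panoptic(semantic, instance, thing_classes=THING_CLASSES,
                divisor=LABEL_DIVISOR):
    """Merge per-voxel semantic and instance ids into one panoptic label."""
    semantic = np.asarray(semantic, dtype=np.int64)
    instance = np.asarray(instance, dtype=np.int64)
    is_thing = np.isin(semantic, list(thing_classes))
    # "Stuff" voxels (road, empty space) keep instance id 0.
    return semantic * divisor + np.where(is_thing, instance, 0)


# Build a toy scene: a road layer at the bottom and two cars above it.
semantic = np.full(GRID_SHAPE, EMPTY, dtype=np.int64)
instance = np.zeros(GRID_SHAPE, dtype=np.int64)
semantic[:, :, 0] = ROAD
semantic[0:2, 0:2, 1] = CAR
instance[0:2, 0:2, 1] = 1
semantic[2:4, 2:4, 1] = CAR
instance[2:4, 2:4, 1] = 2

panoptic = to_panoptic(semantic, instance)
print(np.unique(panoptic))
```

In this toy scene the two cars share a semantic class but receive distinct panoptic labels, while road and empty voxels are distinguished by class alone.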

[Figure: framework]

Getting Started

Performance

3D occupancy prediction

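| Backbone | Config | Image Size | Epochs | Pretrain | Memory | mIoU | Checkpoint |
| --- | --- | --- | --- | --- | --- | --- | --- |
| R101-DCN | Pano-small | 0.5x | 12 | nus-det | 14 G | 36.63 | model |
| R101-DCN | Pano-base | 1.0x | 24 | nus-det | 35 G | 41.60 | model |
| R101-DCN | Pano-base-pretrain | 1.0x | 24 | nus-seg | 35 G | 42.13 | model |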
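
The mIoU column reports mean intersection-over-union across classes. The following is a minimal sketch of that metric on per-voxel labels, not the evaluation code used for these results. The function name `mean_iou`, the `ignore_index` value, and the choice to average only over classes present in the prediction or the ground truth are assumptions; a benchmark's official evaluation may apply additional masking or averaging rules.

```python
import numpy as np


def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over classes that occur in the prediction or ground truth.

    Assumes every value in `pred` lies in [0, num_classes).
    """
    pred = np.asarray(pred).ravel().astype(np.int64)
    target = np.asarray(target).ravel().astype(np.int64)

    # Drop voxels whose ground-truth label is marked as ignored.
    valid = target != ignore_index
    pred, target = pred[valid], target[valid]

    # Confusion matrix: rows are ground truth, columns are predictions.
    conf = np.bincount(target * num_classes + pred,
                       minlength=num_classes ** 2)
    conf = conf.reshape(num_classes, num_classes)

    intersection = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - intersection

    # Skip classes that appear in neither array to avoid dividing by zero.
    present = union > 0
    return float((intersection[present] / union[present]).mean())


pred = np.array([0, 0, 1, 1, 2, 2])
target = np.array([0, 1, 1, 1, 2, 255])
print(f"mIoU = {mean_iou(pred, target, num_classes=3):.4f}")
```

Here the per-class IoUs are 1/2, 2/3, and 1, so the sketch returns 13/18. Multiply by 100 to compare with the percentage values in the table above.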

3D panoptic segmentation

  • nuScenes: LiDAR Semantic Segmentation (Validation)
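| Backbone | Config | Image Size | Epochs | Pretrain | Memory | mIoU | mAP | NDS | Checkpoint |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| R50 | Pano-small-1f | 0.5x | 24 | ImageNet | 16 G | 0.667 | 0.295 | 0.348 | model |
| R50 | Pano-small-4f | 0.5x | 24 | ImageNet | 18 G | 0.682 | 0.331 | 0.421 | model |
| R101 | Pano-base-4f | 1.0x | 24 | nus-det | 24 G | 0.712 | 0.411 | 0.497 | model |
| Intern-XL | Pano-large-4f | 1.0x | 24 | nus-det-pretrain | 35 G | 0.740 | 0.477 | 0.551 | model |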
  • nuScenes: LiDAR Semantic Segmentation (Test)
| Backbone | Config | Image Size | Epochs | Pretrain | mIoU |
| --- | --- | --- | --- | --- | --- |
| R101 | Pano-base-4f | 1.0x | 24 | nus-det | 0.714 |
| R101 | Pano-xl-4f | 1.0x | 24 | nus-det | 0.737 |

Bibtex

If this work is helpful for your research, please consider citing the following BibTeX entry.

@inproceedings{wang2024panoocc,
  title={{PanoOcc}: Unified Occupancy Representation for Camera-based {3D} Panoptic Segmentation},
  author={Wang, Yuqi and Chen, Yuntao and Liao, Xingyu and Fan, Lue and Zhang, Zhaoxiang},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={17158--17168},
  year={2024}
}

Acknowledgement

Many thanks to the following open-source projects: