/3D-Occupancy-Perception

[Information Fusion 2024] A Survey on Occupancy Perception for Autonomous Driving: The Information Fusion Perspective

Huaiyuan Xu · Junliang Chen · Shiyu Meng · Yi Wang · Lap-Pui Chau*

arXiv PDF

We research 3D Occupancy Perception for Autonomous Driving

This work focuses on 3D dense perception in autonomous driving, encompassing LiDAR-Centric Occupancy Perception, Vision-Centric Occupancy Perception, and Multi-Modal Occupancy Perception. Information fusion techniques for this field are discussed. We believe this will be the most comprehensive survey to date on 3D Occupancy Perception. Please stay tuned!😉😉😉

This is an active repository; watch it to follow the latest advances. If you find it useful, please star this repo.

✨You are welcome to send us your work on any topic related to 3D occupancy for autonomous driving (covering not only perception but also applications)!

If you discover any missing work or have any suggestions, please feel free to submit a pull request or contact us. We will promptly add the missing papers to this repository.

✨Highlight

[1] A systematic survey of the latest research on 3D occupancy perception in the field of autonomous driving.

[2] The survey provides a taxonomy of 3D occupancy perception and elaborates on core methodological issues, including network pipelines, multi-source information fusion, and effective network training.

[3] The survey presents evaluations for 3D occupancy perception, and offers detailed performance comparisons. Furthermore, current limitations and future research directions are discussed.

🔥 News

  • [2024-09-03] This survey was accepted by Information Fusion (Impact factor: 14.7).
  • [2024-07-26] Attention! We are actively looking for highly motivated PhD students! If you are interested in joining us to research autonomous driving, please feel free to contact Professor Chau (IEEE Fellow, Global STEM Professorship, Hong Kong PolyU, QS Ranking #57).
  • [2024-07-21] More representative works and benchmarking comparisons have been incorporated, bringing the total to 192 literature references.
  • [2024-05-18] More figures have been added to the survey, and the occupancy-based applications have been reorganized.
  • [2024-05-08] The first version of the survey is available on arXiv. We curate this repository.

Introduction

3D occupancy perception technology aims to observe and understand dense 3D environments for autonomous vehicles. Owing to its comprehensive perception capability, it is emerging as a trend in autonomous driving perception systems and is attracting significant attention from both industry and academia. Like traditional bird's-eye view (BEV) perception, 3D occupancy perception takes multi-source input and therefore requires information fusion. The difference is that it captures vertical structures that 2D BEV ignores. In this survey, we review the most recent works on 3D occupancy perception and provide in-depth analyses of methodologies with various input modalities. Specifically, we summarize general network pipelines, highlight information fusion techniques, and discuss effective network training. We evaluate and analyze the occupancy perception performance of state-of-the-art methods on the most popular datasets. Furthermore, we discuss challenges and future research directions. We hope this paper will inspire the community and encourage more research on 3D occupancy perception.
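To make the contrast with BEV concrete, here is a minimal sketch (not taken from the survey or from any listed method) of how a voxel occupancy grid keeps the height information that a 2D BEV grid discards. The function names `voxelize` and `to_bev`, the 0.5 m voxel size, and the sample points are all illustrative assumptions.

```python
from math import floor


def voxelize(points, voxel_size=0.5, origin=(0.0, 0.0, 0.0)):
    """Map (x, y, z) points in metres to a set of occupied voxel indices.

    voxel_size and origin are hypothetical values chosen for illustration.
    """
    occupied = set()
    for x, y, z in points:
        idx = (
            floor((x - origin[0]) / voxel_size),
            floor((y - origin[1]) / voxel_size),
            floor((z - origin[2]) / voxel_size),
        )
        occupied.add(idx)
    return occupied


def to_bev(occupied):
    """Collapse the height axis: a BEV cell is occupied if any voxel above it is."""
    return {(i, j) for i, j, _ in occupied}


# Hypothetical scene: a road surface, an overhead structure above the same
# ground cell, and a second ground point elsewhere.
points = [
    (2.1, 3.2, 0.1),  # road surface
    (2.2, 3.3, 4.6),  # overhead structure above the same ground cell
    (5.0, 1.0, 0.2),  # another ground point
]

occ = voxelize(points)
bev = to_bev(occ)

# The 3D grid keeps the road and the overhead structure as separate voxels,
# while the BEV grid merges them into a single cell.
print(len(occ), len(bev))
```

In this toy scene the road and the overhead structure share one BEV cell, so a BEV-only representation cannot tell whether the space between them is free. Real pipelines typically use dense tensors with per-voxel semantic labels rather than Python sets; the set-based form is used here only to keep the sketch dependency-free.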

Summary of Contents

Methods: A Survey

LiDAR-Centric Occupancy Perception

Year Venue Paper Title Link
2024 CVPR PaSCo: Urban 3D Panoptic Scene Completion with Uncertainty Awareness (Best paper award candidate) Project Page
2024 IROS LiDAR-based 4D Occupancy Completion and Forecasting Project Page
2024 arXiv MergeOcc: Bridge the Domain Gap between Different LiDARs for Robust Occupancy Prediction -
2023 T-IV Occupancy-MAE: Self-supervised Pre-training Large-scale LiDAR Point Clouds with Masked Occupancy Autoencoders Code
2023 arXiv PointOcc: Cylindrical Tri-Perspective View for Point-based 3D Semantic Occupancy Prediction Code
2021 T-PAMI Semantic Scene Completion using Local Deep Implicit Functions on LiDAR Data -
2021 AAAI Sparse Single Sweep LiDAR Point Cloud Segmentation via Learning Contextual Shape Priors from Scene Completion Code
2020 CoRL S3CNet: A Sparse Semantic Scene Completion Network for LiDAR Point Clouds -
2020 3DV LMSCNet: Lightweight Multiscale 3D Semantic Completion Code

Vision-Centric Occupancy Perception

Year Venue Paper Title Link
2024 ECCV VEON: Vocabulary-Enhanced Occupancy Prediction Code
2024 ECCV Fully Sparse 3D Occupancy Prediction Code
2024 ECCV GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction Project Page
2024 ECCV Occupancy as Set of Points Code
2024 ECCV Hierarchical Temporal Context Learning for Camera-based Semantic Scene Completion Code
2024 CVPR LowRankOcc: Tensor Decomposition and Low-Rank Recovery for Vision-based 3D Semantic Occupancy Prediction -
2024 CVPR Bi-SSC: Geometric-Semantic Bidirectional Fusion for Camera-based 3D Semantic Scene Completion -
2024 CVPR Symphonize 3D Semantic Scene Completion with Contextual Instance Queries Code
2024 CVPR SparseOcc: Rethinking Sparse Latent Representation for Vision-Based Semantic Occupancy Prediction Project Page
2024 CVPR SelfOcc: Self-Supervised Vision-Based 3D Occupancy Prediction Project Page
2024 CVPR PanoOcc: Unified Occupancy Representation for Camera-based 3D Panoptic Segmentation Code
2024 CVPR Not All Voxels Are Equal: Hardness-Aware Semantic Scene Completion with Self-Distillation Code
2024 CVPR COTR: Compact Occupancy TRansformer for Vision-based 3D Occupancy Prediction Code
2024 CVPR Collaborative Semantic Occupancy Prediction with Hybrid Feature Fusion in Connected Automated Vehicles Project Page
2024 CVPR Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications Code
2024 CVPR Boosting Self-Supervision for Single-View Scene Completion via Knowledge Distillation Project Page
2024 CVPR DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving -
2024 IJCAI Label-efficient Semantic Scene Completion with Scribble Annotations Code
2024 IJCAI Bridging Stereo Geometry and BEV Representation with Reliable Mutual Interaction for Semantic Scene Completion Code
2024 ICRA The RoboDrive Challenge: Drive Anytime Anywhere in Any Condition Project Page
2024 ICRA RenderOcc: Vision-Centric 3D Occupancy Prediction with 2D Rendering Supervision Code
2024 ICRA MonoOcc: Digging into Monocular Semantic Occupancy Prediction Code
2024 ICRA FastOcc: Accelerating 3D Occupancy Prediction by Fusing the 2D Bird’s-Eye View and Perspective View -
2024 AAAI Regulating Intermediate 3D Features for Vision-Centric Autonomous Driving Code
2024 AAAI One at a Time: Progressive Multi-step Volumetric Probability Learning for Reliable 3D Scene Perception -
2024 RA-L HybridOcc: NeRF Enhanced Transformer-based Multi-Camera 3D Occupancy Prediction -
2024 RA-L UniScene: Multi-Camera Unified Pre-Training via 3D Scene Reconstruction Code
2024 3DV PanoSSC: Exploring Monocular Panoptic 3D Scene Reconstruction for Autonomous Driving -
2024 IROS SSCBench: Monocular 3D Semantic Scene Completion Benchmark in Street Views Code
2024 arXiv OPUS: Occupancy Prediction Using a Sparse Set Code
2024 arXiv Semi-supervised 3D Semantic Scene Completion with 2D Vision Foundation Model Guidance -
2024 arXiv Deep Height Decoupling for Precise Vision-based 3D Occupancy Prediction Code
2024 arXiv AdaOcc: Adaptive-Resolution Occupancy Prediction -
2024 arXiv GaussianOcc: Fully Self-supervised and Efficient 3D Occupancy Estimation with Gaussian Splatting Project Page
2024 arXiv MambaOcc: Visual State Space Model for BEV-based Occupancy Prediction with Local Adaptive Reordering Code
2024 arXiv VPOcc: Exploiting Vanishing Point for Monocular 3D Semantic Occupancy Prediction -
2024 arXiv UniVision: A Unified Framework for Vision-Centric 3D Perception Code
2024 arXiv LangOcc: Self-Supervised Open Vocabulary Occupancy Estimation via Volume Rendering -
2024 arXiv Real-Time 3D Occupancy Prediction via Geometric-Semantic Disentanglement -
2024 arXiv Let Occ Flow: Self-Supervised 3D Occupancy Flow Prediction Project Page
2024 arXiv α-SSC: Uncertainty-Aware Camera-based 3D Semantic Scene Completion -
2024 arXiv Panoptic-FlashOcc: An Efficient Baseline to Marry Semantic Occupancy with Panoptic via Instance Center Code
2024 arXiv BDC-Occ: Binarized Deep Convolution Unit For Binarized Occupancy Network Code
2024 arXiv Context and Geometry Aware Voxel Transformer for Semantic Scene Completion Code
2024 arXiv GEOcc: Geometrically Enhanced 3D Occupancy Network with Implicit-Explicit Depth Fusion and Contextual Self-Supervision -
2024 arXiv OccFlowNet: Towards Self-supervised Occupancy Estimation via Differentiable Rendering and Occupancy Flow -
2024 arXiv OccFiner: Offboard Occupancy Refinement with Hybrid Propagation -
2024 arXiv InverseMatrixVT3D: An Efficient Projection Matrix-Based Approach for 3D Occupancy Prediction Code
2024 arXiv Unified Spatio-Temporal Tri-Perspective View Representation for 3D Semantic Occupancy Prediction Project Page
2024 arXiv ViewFormer: Exploring Spatiotemporal Modeling for Multi-View 3D Occupancy Perception via View-Guided Transformers -
2023 CVPR VoxFormer: Sparse Voxel Transformer for Camera-based 3D Semantic Scene Completion Code
2023 CVPR Tri-Perspective View for Vision-Based 3D Semantic Occupancy Prediction Project Page
2023 NeurIPS POP-3D: Open-Vocabulary 3D Occupancy Prediction from Images Project Page
2023 NeurIPS Occ3D: A Large-Scale 3D Occupancy Prediction Benchmark for Autonomous Driving Project Page
2023 ICCV SurroundOcc: Multi-Camera 3D Occupancy Prediction for Autonomous Driving Project Page
2023 ICCV Scene as Occupancy Code
2023 ICCV OccFormer: Dual-path Transformer for Vision-based 3D Semantic Occupancy Prediction Code
2023 ICCV NDC-Scene: Boost Monocular 3D Semantic Scene Completion in Normalized Device Coordinates Space Code
2023 T-IV 3DOPFormer: 3D Occupancy Perception from Multi-Camera Images with Directional and Distance Enhancement Code
2023 arXiv OccupancyDETR: Using DETR for Mixed Dense-sparse 3D Occupancy Prediction -
2023 arXiv SOccDPT: Semi-Supervised 3D Semantic Occupancy from Dense Prediction Transformers trained under memory constraints -
2023 arXiv OVO: Open-Vocabulary Occupancy Code
2023 arXiv OctreeOcc: Efficient and Multi-Granularity Occupancy Prediction Using Octree Queries Code
2023 arXiv OccNeRF: Advancing 3D Occupancy Prediction in LiDAR-Free Environments Project Page
2023 arXiv OccDepth: A Depth-Aware Method for 3D Semantic Scene Completion Code
2023 arXiv FlashOcc: Fast and Memory-Efficient Occupancy Prediction via Channel-to-Height Plugin Code
2023 arXiv FB-OCC: 3D Occupancy Prediction based on Forward-Backward View Transformation Code
2023 arXiv DepthSSC: Depth-Spatial Alignment and Dynamic Voxel Resolution for Monocular 3D Semantic Scene Completion -
2023 arXiv Camera-based 3D Semantic Scene Completion with Sparse Guidance Network Code
2023 arXiv A Simple Framework for 3D Occupancy Estimation in Autonomous Driving Code
2023 arXiv UniWorld: Autonomous Driving Pre-training via World Models Code
2022 CVPR MonoScene: Monocular 3D Semantic Scene Completion Project Page

Radar-Centric Occupancy Perception

Year Venue Paper Title Link
2024 arXiv RadarOcc: Robust 3D Occupancy Prediction with 4D Imaging Radar -

Multi-Modal Occupancy Perception

Year Venue Paper Title Link
2024 ECCV OccGen: Generative Multi-modal 3D Occupancy Prediction for Autonomous Driving Project Page
2024 RA-L Co-Occ: Coupling Explicit Feature Fusion with Volume Rendering Regularization for Multi-Modal 3D Semantic Occupancy Prediction Project Page
2024 arXiv OccMamba: Semantic Occupancy Prediction with State Space Models -
2024 arXiv LiCROcc: Teach Radar for Accurate Semantic Occupancy Prediction using LiDAR and Camera Project Page
2024 arXiv OccFusion: Depth Estimation Free Multi-sensor Fusion for 3D Occupancy Prediction -
2024 arXiv EFFOcc: A Minimal Baseline for EFficient Fusion-based 3D Occupancy Network Code
2024 arXiv Real-time 3D semantic occupancy prediction for autonomous vehicles using memory-efficient sparse convolution -
2024 arXiv OccFusion: A Straightforward and Effective Multi-Sensor Fusion Framework for 3D Occupancy Prediction -
2024 arXiv Unleashing HyDRa: Hybrid Fusion, Depth Consistency and Radar for Unified 3D Perception -
2023 ICCV OpenOccupancy: A Large Scale Benchmark for Surrounding Semantic Occupancy Perception Code

3D Occupancy Datasets

Dataset Year Venue Modality # of Classes Flow Link
OpenScene 2024 CVPR 2024 Challenge Camera - ✔️ Intro.
Cam4DOcc 2024 CVPR Camera+LiDAR 2 ✔️ Intro.
Occ3D 2023 NeurIPS Camera 14 (Occ3D-Waymo), 16 (Occ3D-nuScenes) Intro.
OpenOcc 2023 ICCV Camera 16 Intro.
OpenOccupancy 2023 ICCV Camera+LiDAR 16 Intro.
SurroundOcc 2023 ICCV Camera 16 Intro.
OCFBench 2023 arXiv LiDAR - (OCFBench-Lyft), 17 (OCFBench-Argoverse), 25 (OCFBench-ApolloScape), 16 (OCFBench-nuScenes) Intro.
SSCBench 2023 arXiv Camera 19 (SSCBench-KITTI-360), 16 (SSCBench-nuScenes), 14 (SSCBench-Waymo) Intro.
SemanticKITTI 2019 ICCV Camera+LiDAR 19 (Semantic Scene Completion task) Intro.

Occupancy-based Applications

Segmentation

Specific Task Year Venue Paper Title Link
3D Panoptic Segmentation 2024 CVPR PanoOcc: Unified Occupancy Representation for Camera-based 3D Panoptic Segmentation Code
BEV Segmentation 2024 CVPRW OccFeat: Self-supervised Occupancy Feature Prediction for Pretraining BEV Segmentation Networks Code

Detection

Specific Task Year Venue Paper Title Link
3D Object Detection 2024 CVPR Learning Occupancy for Monocular 3D Object Detection Code
3D Object Detection 2024 AAAI SOGDet: Semantic-Occupancy Guided Multi-view 3D Object Detection Code
3D Object Detection 2024 arXiv UltimateDO: An Efficient Framework to Marry Occupancy Prediction with 3D Object Detection via Channel2height -

Dynamic Perception

Specific Task Year Venue Paper Title Link
3D Flow Prediction 2024 CVPR Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications Code
3D Flow Prediction 2024 arXiv Let Occ Flow: Self-Supervised 3D Occupancy Flow Prediction Project Page

Generation

Specific Task Year Venue Paper Title Link
Scene Generation 2024 ECCV Pyramid Diffusion for Fine 3D Large Scene Generation (Oral paper) Code
Scene Generation 2024 CVPR SemCity: Semantic Scene Generation with Triplane Diffusion Code

World Models

Specific Task Year Venue Paper Title Link
4D Occupancy Forecasting 2024 ECCV OccWorld: Learning a 3D Occupancy World Model for Autonomous Driving Project Page
4D Occupancy Forecasting 2024 CVPR UnO: Unsupervised Occupancy Fields for Perception and Forecasting (Oral paper) Project Page
4D Representation Learning Framework 2024 CVPR DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving -
4D Occupancy Forecasting 2024 CVPR Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications Code
4D Occupancy Forecasting 2024 AAAI Semantic Complete Scene Forecasting from a 4D Dynamic Point Cloud Sequence Project Page
4D Occupancy Forecasting and Motion Planning 2024 arXiv RenderWorld: World Model with Self-Supervised 3D Label -
4D Occupancy Forecasting, Motion Planning, and Reasoning 2024 arXiv OccLLaMA: An Occupancy-Language-Action Generative World Model for Autonomous Driving -
4D Occupancy Forecasting and Generation 2024 arXiv Driving in the Occupancy World: Vision-Centric 4D Occupancy Forecasting and Planning via World Models for Autonomous Driving -
4D Occupancy Generation 2024 arXiv OccSora: 4D Occupancy Generation Models as World Simulators for Autonomous Driving Project Page
4D Occupancy Forecasting 2023 CVPR Point Cloud Forecasting as a Proxy for 4D Occupancy Forecasting Project Page

Unified Autonomous Driving Algorithm Framework

Specific Tasks Year Venue Paper Title Link
Occupancy Prediction, 3D Object Detection, Online Mapping, Multi-object Tracking, Motion Prediction, Motion Planning 2024 CVPR DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving -
Occupancy Prediction, 3D Object Detection 2024 RA-L UniScene: Multi-Camera Unified Pre-training via 3D Scene Reconstruction for Autonomous Driving Code
Occupancy Forecasting, Motion Planning 2024 arXiv Driving in the Occupancy World: Vision-Centric 4D Occupancy Forecasting and Planning via World Models for Autonomous Driving -
Occupancy Prediction, 3D Object Detection, BEV segmentation, Motion Planning 2023 ICCV Scene as Occupancy Code

Cite The Survey

If you find our survey and repository useful for your research project, please consider citing our paper:

@misc{xu2024survey,
      title={A Survey on Occupancy Perception for Autonomous Driving: The Information Fusion Perspective}, 
      author={Huaiyuan Xu and Junliang Chen and Shiyu Meng and Yi Wang and Lap-Pui Chau},
      year={2024},
      eprint={2405.05173},
      archivePrefix={arXiv}
}

Contact

If you have any questions, please feel free to get in touch:

lap-pui.chau@polyu.edu.hk
huaiyuan.xu@polyu.edu.hk

If you are interested in joining us as a Ph.D. student to research computer vision and machine learning, please feel free to contact Professor Chau:

lap-pui.chau@polyu.edu.hk