Huaiyuan Xu, Junliang Chen, Shiyu Meng, Yi Wang, Lap-Pui Chau*

We research 3D Occupancy Perception for Autonomous Driving

This work focuses on 3D dense perception in autonomous driving, encompassing LiDAR-Centric Occupancy Perception, Vision-Centric Occupancy Perception, and Multi-Modal Occupancy Perception. Information fusion techniques for this field are discussed. We believe this will be the most comprehensive survey to date on 3D Occupancy Perception. Please stay tuned!😉😉😉

This is an active repository; watch it to follow the latest advances. If you find it useful, please star this repo.

✨You are welcome to send us your work on topics related to 3D occupancy for autonomous driving (covering not only perception but also applications)!

If you discover any missing work or have any suggestions, please feel free to submit a pull request or contact us. We will promptly add the missing papers to this repository.

✨Highlight

[1] A systematic survey of the latest research on 3D occupancy perception in the field of autonomous driving.

[2] The survey provides a taxonomy of 3D occupancy perception and elaborates on core methodological issues, including network pipelines, multi-source information fusion, and effective network training.

[3] The survey presents evaluations for 3D occupancy perception, and offers detailed performance comparisons. Furthermore, current limitations and future research directions are discussed.

🔥 News

[2024-09-03] This survey was accepted by Information Fusion (Impact factor: 14.7).
[2024-07-26] Attention! We are actively looking for highly motivated PhD students! If you are interested in joining us to research autonomous driving, please feel free to contact Professor Chau (IEEE Fellow, Global STEM Professorship, Hong Kong PolyU, QS Ranking #57).
[2024-07-21] More representative works and benchmarking comparisons have been incorporated, bringing the total to 192 literature references.
[2024-05-18] More figures have been added to the survey, and the occupancy-based applications have been reorganized.
[2024-05-08] The first version of the survey is available on arXiv. We curate this repository.

Introduction

3D occupancy perception technology aims to observe and understand dense 3D environments for autonomous vehicles. Owing to its comprehensive perception capability, this technology is emerging as a trend in autonomous driving perception systems, and is attracting significant attention from both industry and academia. Like traditional bird's-eye view (BEV) perception, 3D occupancy perception takes multi-source input and requires information fusion. Unlike 2D BEV, however, it captures the vertical structures that BEV ignores. In this survey, we review the most recent works on 3D occupancy perception, and provide in-depth analyses of methodologies with various input modalities. Specifically, we summarize general network pipelines, highlight information fusion techniques, and discuss effective network training. We evaluate and analyze the occupancy perception performance of state-of-the-art methods on the most popular datasets. Furthermore, challenges and future research directions are discussed. We hope this paper will inspire the community and encourage more research work on 3D occupancy perception.
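To make the contrast with BEV concrete, here is a minimal sketch (not taken from the survey) of how a 3D occupancy grid keeps vertical structure that a BEV projection collapses. The function names `voxelize` and `to_bev`, the 0.5 m voxel size, and the three-point scene are all hypothetical choices for illustration only.

```python
from typing import Iterable, Set, Tuple

Point = Tuple[float, float, float]
Voxel = Tuple[int, int, int]
Cell = Tuple[int, int]


def voxelize(points: Iterable[Point], voxel_size: float = 0.5) -> Set[Voxel]:
    """Return the set of occupied voxel indices for a point cloud."""
    occupied: Set[Voxel] = set()
    for x, y, z in points:
        # Floor division maps each coordinate to its voxel index.
        occupied.add((int(x // voxel_size), int(y // voxel_size), int(z // voxel_size)))
    return occupied


def to_bev(voxels: Set[Voxel]) -> Set[Cell]:
    """Collapse the height axis, keeping only the occupied (x, y) cells."""
    return {(i, j) for i, j, _ in voxels}


# Hypothetical scene: a road point, an overhead sign above the same spot,
# and a second road point further away.
points = [(2.1, 0.2, 0.1), (2.1, 0.2, 4.6), (5.3, 1.0, 0.1)]
voxels = voxelize(points)
bev = to_bev(voxels)

print(len(voxels))  # 3 occupied voxels: road and sign stay distinct
print(len(bev))     # 2 occupied BEV cells: road and sign merge into one
```

In this toy scene the road surface and the overhead sign occupy different voxels but fall into the same BEV cell, which is the kind of vertical information the occupancy representation retains.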

Methods: A Survey

LiDAR-Centric Occupancy Perception

Year	Venue	Paper Title	Link
2024	CVPR	PaSCo: Urban 3D Panoptic Scene Completion with Uncertainty Awareness (Best paper award candidate)	Project Page
2024	IROS	LiDAR-based 4D Occupancy Completion and Forecasting	Project Page
2024	arXiv	MergeOcc: Bridge the Domain Gap between Different LiDARs for Robust Occupancy Prediction	-
2023	T-IV	Occupancy-MAE: Self-supervised Pre-training Large-scale LiDAR Point Clouds with Masked Occupancy Autoencoders	Code
2023	arXiv	PointOcc: Cylindrical Tri-Perspective View for Point-based 3D Semantic Occupancy Prediction	Code
2021	T-PAMI	Semantic Scene Completion using Local Deep Implicit Functions on LiDAR Data	-
2021	AAAI	Sparse Single Sweep LiDAR Point Cloud Segmentation via Learning Contextual Shape Priors from Scene Completion	Code
2020	CoRL	S3CNet: A Sparse Semantic Scene Completion Network for LiDAR Point Clouds	-
2020	3DV	LMSCNet: Lightweight Multiscale 3D Semantic Completion	Code

Vision-Centric Occupancy Perception

Year	Venue	Paper Title	Link
2024	ECCV	VEON: Vocabulary-Enhanced Occupancy Prediction	Code
2024	ECCV	Fully Sparse 3D Occupancy Prediction	Code
2024	ECCV	GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction	Project Page
2024	ECCV	Occupancy as Set of Points	Code
2024	ECCV	Hierarchical Temporal Context Learning for Camera-based Semantic Scene Completion	Code
2024	CVPR	LowRankOcc: Tensor Decomposition and Low-Rank Recovery for Vision-based 3D Semantic Occupancy Prediction	-
2024	CVPR	Bi-SSC: Geometric-Semantic Bidirectional Fusion for Camera-based 3D Semantic Scene Completion	-
2024	CVPR	Symphonize 3D Semantic Scene Completion with Contextual Instance Queries	Code
2024	CVPR	SparseOcc: Rethinking Sparse Latent Representation for Vision-Based Semantic Occupancy Prediction	Project Page
2024	CVPR	SelfOcc: Self-Supervised Vision-Based 3D Occupancy Prediction	Project Page
2024	CVPR	PanoOcc: Unified Occupancy Representation for Camera-based 3D Panoptic Segmentation	Code
2024	CVPR	Not All Voxels Are Equal: Hardness-Aware Semantic Scene Completion with Self-Distillation	Code
2024	CVPR	COTR: Compact Occupancy TRansformer for Vision-based 3D Occupancy Prediction	Code
2024	CVPR	Collaborative Semantic Occupancy Prediction with Hybrid Feature Fusion in Connected Automated Vehicles	Project Page
2024	CVPR	Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications	Code
2024	CVPR	Boosting Self-Supervision for Single-View Scene Completion via Knowledge Distillation	Project Page
2024	CVPR	DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving	-
2024	IJCAI	Label-efficient Semantic Scene Completion with Scribble Annotations	Code
2024	IJCAI	Bridging Stereo Geometry and BEV Representation with Reliable Mutual Interaction for Semantic Scene Completion	Code
2024	ICRA	The RoboDrive Challenge: Drive Anytime Anywhere in Any Condition	Project Page
2024	ICRA	RenderOcc: Vision-Centric 3D Occupancy Prediction with 2D Rendering Supervision	Code
2024	ICRA	MonoOcc: Digging into Monocular Semantic Occupancy Prediction	Code
2024	ICRA	FastOcc: Accelerating 3D Occupancy Prediction by Fusing the 2D Bird’s-Eye View and Perspective View	-
2024	AAAI	Regulating Intermediate 3D Features for Vision-Centric Autonomous Driving	Code
2024	AAAI	One at a Time: Progressive Multi-step Volumetric Probability Learning for Reliable 3D Scene Perception	-
2024	RA-L	HybridOcc: NeRF Enhanced Transformer-based Multi-Camera 3D Occupancy Prediction	-
2024	RA-L	UniScene: Multi-Camera Unified Pre-Training via 3D Scene Reconstruction	Code
2024	3DV	PanoSSC: Exploring Monocular Panoptic 3D Scene Reconstruction for Autonomous Driving	-
2024	IROS	SSCBench: Monocular 3D Semantic Scene Completion Benchmark in Street Views	Code
2024	arXiv	OPUS: Occupancy Prediction Using a Sparse Set	Code
2024	arXiv	Semi-supervised 3D Semantic Scene Completion with 2D Vision Foundation Model Guidance	-
2024	arXiv	Deep Height Decoupling for Precise Vision-based 3D Occupancy Prediction	Code
2024	arXiv	AdaOcc: Adaptive-Resolution Occupancy Prediction	-
2024	arXiv	GaussianOcc: Fully Self-supervised and Efficient 3D Occupancy Estimation with Gaussian Splatting	Project Page
2024	arXiv	MambaOcc: Visual State Space Model for BEV-based Occupancy Prediction with Local Adaptive Reordering	Code
2024	arXiv	VPOcc: Exploiting Vanishing Point for Monocular 3D Semantic Occupancy Prediction	-
2024	arXiv	UniVision: A Unified Framework for Vision-Centric 3D Perception	Code
2024	arXiv	LangOcc: Self-Supervised Open Vocabulary Occupancy Estimation via Volume Rendering	-
2024	arXiv	Real-Time 3D Occupancy Prediction via Geometric-Semantic Disentanglement	-
2024	arXiv	Let Occ Flow: Self-Supervised 3D Occupancy Flow Prediction	Project Page
2024	arXiv	α-SSC: Uncertainty-Aware Camera-based 3D Semantic Scene Completion	-
2024	arXiv	Panoptic-FlashOcc: An Efficient Baseline to Marry Semantic Occupancy with Panoptic via Instance Center	Code
2024	arXiv	BDC-Occ: Binarized Deep Convolution Unit For Binarized Occupancy Network	Code
2024	arXiv	Context and Geometry Aware Voxel Transformer for Semantic Scene Completion	Code
2024	arXiv	GEOcc: Geometrically Enhanced 3D Occupancy Network with Implicit-Explicit Depth Fusion and Contextual Self-Supervision	-
2024	arXiv	OccFlowNet: Towards Self-supervised Occupancy Estimation via Differentiable Rendering and Occupancy Flow	-
2024	arXiv	OccFiner: Offboard Occupancy Refinement with Hybrid Propagation	-
2024	arXiv	InverseMatrixVT3D: An Efficient Projection Matrix-Based Approach for 3D Occupancy Prediction	Code
2024	arXiv	Unified Spatio-Temporal Tri-Perspective View Representation for 3D Semantic Occupancy Prediction	Project Page
2024	arXiv	ViewFormer: Exploring Spatiotemporal Modeling for Multi-View 3D Occupancy Perception via View-Guided Transformers	-
2023	CVPR	VoxFormer: Sparse Voxel Transformer for Camera-based 3D Semantic Scene Completion	Code
2023	CVPR	Tri-Perspective View for Vision-Based 3D Semantic Occupancy Prediction	Project Page
2023	NeurIPS	POP-3D: Open-Vocabulary 3D Occupancy Prediction from Images	Project Page
2023	NeurIPS	Occ3D: A Large-Scale 3D Occupancy Prediction Benchmark for Autonomous Driving	Project Page
2023	ICCV	SurroundOcc: Multi-Camera 3D Occupancy Prediction for Autonomous Driving	Project Page
2023	ICCV	Scene as Occupancy	Code
2023	ICCV	OccFormer: Dual-path Transformer for Vision-based 3D Semantic Occupancy Prediction	Code
2023	ICCV	NDC-Scene: Boost Monocular 3D Semantic Scene Completion in Normalized Device Coordinates Space	Code
2023	T-IV	3DOPFormer: 3D Occupancy Perception from Multi-Camera Images with Directional and Distance Enhancement	Code
2023	arXiv	OccupancyDETR: Using DETR for Mixed Dense-sparse 3D Occupancy Prediction	-
2023	arXiv	SOccDPT: Semi-Supervised 3D Semantic Occupancy from Dense Prediction Transformers trained under memory constraints	-
2023	arXiv	OVO: Open-Vocabulary Occupancy	Code
2023	arXiv	OctreeOcc: Efficient and Multi-Granularity Occupancy Prediction Using Octree Queries	Code
2023	arXiv	OccNeRF: Advancing 3D Occupancy Prediction in LiDAR-Free Environments	Project Page
2023	arXiv	OccDepth: A Depth-Aware Method for 3D Semantic Scene Completion	Code
2023	arXiv	FlashOcc: Fast and Memory-Efficient Occupancy Prediction via Channel-to-Height Plugin	Code
2023	arXiv	FB-OCC: 3D Occupancy Prediction based on Forward-Backward View Transformation	Code
2023	arXiv	DepthSSC: Depth-Spatial Alignment and Dynamic Voxel Resolution for Monocular 3D Semantic Scene Completion	-
2023	arXiv	Camera-based 3D Semantic Scene Completion with Sparse Guidance Network	Code
2023	arXiv	A Simple Framework for 3D Occupancy Estimation in Autonomous Driving	Code
2023	arXiv	UniWorld: Autonomous Driving Pre-training via World Models	Code
2022	CVPR	MonoScene: Monocular 3D Semantic Scene Completion	Project Page

Radar-Centric Occupancy Perception

Year	Venue	Paper Title	Link
2024	arXiv	RadarOcc: Robust 3D Occupancy Prediction with 4D Imaging Radar	-

Multi-Modal Occupancy Perception

Year	Venue	Paper Title	Link
2024	ECCV	OccGen: Generative Multi-modal 3D Occupancy Prediction for Autonomous Driving	Project Page
2024	RA-L	Co-Occ: Coupling Explicit Feature Fusion with Volume Rendering Regularization for Multi-Modal 3D Semantic Occupancy Prediction	Project Page
2024	arXiv	OccMamba: Semantic Occupancy Prediction with State Space Models	-
2024	arXiv	LiCROcc: Teach Radar for Accurate Semantic Occupancy Prediction using LiDAR and Camera	Project Page
2024	arXiv	OccFusion: Depth Estimation Free Multi-sensor Fusion for 3D Occupancy Prediction	-
2024	arXiv	EFFOcc: A Minimal Baseline for EFficient Fusion-based 3D Occupancy Network	Code
2024	arXiv	Real-time 3D semantic occupancy prediction for autonomous vehicles using memory-efficient sparse convolution	-
2024	arXiv	OccFusion: A Straightforward and Effective Multi-Sensor Fusion Framework for 3D Occupancy Prediction	-
2024	arXiv	Unleashing HyDRa: Hybrid Fusion, Depth Consistency and Radar for Unified 3D Perception	-
2023	ICCV	OpenOccupancy: A Large Scale Benchmark for Surrounding Semantic Occupancy Perception	Code

3D Occupancy Datasets

Dataset	Year	Venue	Modality	# of Classes	Flow	Link
OpenScene	2024	CVPR 2024 Challenge	Camera	-	✔️	Intro.
Cam4DOcc	2024	CVPR	Camera+LiDAR	2	✔️	Intro.
Occ3D	2023	NeurIPS	Camera	14 (Occ3D-Waymo), 16 (Occ3D-nuScenes)	❌	Intro.
OpenOcc	2023	ICCV	Camera	16	❌	Intro.
OpenOccupancy	2023	ICCV	Camera+LiDAR	16	❌	Intro.
SurroundOcc	2023	ICCV	Camera	16	❌	Intro.
OCFBench	2023	arXiv	LiDAR	- (OCFBench-Lyft), 17 (OCFBench-Argoverse), 25 (OCFBench-ApolloScape), 16 (OCFBench-nuScenes)	❌	Intro.
SSCBench	2023	arXiv	Camera	19 (SSCBench-KITTI-360), 16 (SSCBench-nuScenes), 14 (SSCBench-Waymo)	❌	Intro.
SemanticKITTI	2019	ICCV	Camera+LiDAR	19 (Semantic Scene Completion task)	❌	Intro.

Occupancy-based Applications

Segmentation

Specific Task	Year	Venue	Paper Title	Link
3D Panoptic Segmentation	2024	CVPR	PanoOcc: Unified Occupancy Representation for Camera-based 3D Panoptic Segmentation	Code
BEV Segmentation	2024	CVPRW	OccFeat: Self-supervised Occupancy Feature Prediction for Pretraining BEV Segmentation Networks	Code

Detection

Specific Task	Year	Venue	Paper Title	Link
3D Object Detection	2024	CVPR	Learning Occupancy for Monocular 3D Object Detection	Code
3D Object Detection	2024	AAAI	SOGDet: Semantic-Occupancy Guided Multi-view 3D Object Detection	Code
3D Object Detection	2024	arXiv	UltimateDO: An Efficient Framework to Marry Occupancy Prediction with 3D Object Detection via Channel2height	-

Dynamic Perception

Specific Task	Year	Venue	Paper Title	Link
3D Flow Prediction	2024	CVPR	Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications	Code
3D Flow Prediction	2024	arXiv	Let Occ Flow: Self-Supervised 3D Occupancy Flow Prediction	Project Page

Generation

Specific Task	Year	Venue	Paper Title	Link
Scene Generation	2024	ECCV	Pyramid Diffusion for Fine 3D Large Scene Generation (Oral paper)	Code
Scene Generation	2024	CVPR	SemCity: Semantic Scene Generation with Triplane Diffusion	Code

World Models

Specific Task	Year	Venue	Paper Title	Link
4D Occupancy Forecasting	2024	ECCV	OccWorld: Learning a 3D Occupancy World Model for Autonomous Driving	Project Page
4D Occupancy Forecasting	2024	CVPR	UnO: Unsupervised Occupancy Fields for Perception and Forecasting (Oral paper)	Project Page
4D Representation Learning Framework	2024	CVPR	DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving	-
4D Occupancy Forecasting	2024	CVPR	Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications	Code
4D Occupancy Forecasting	2024	AAAI	Semantic Complete Scene Forecasting from a 4D Dynamic Point Cloud Sequence	Project Page
4D Occupancy Forecasting and Motion Planning	2024	arXiv	RenderWorld: World Model with Self-Supervised 3D Label	-
4D Occupancy Forecasting, Motion Planning, and Reasoning	2024	arXiv	OccLLaMA: An Occupancy-Language-Action Generative World Model for Autonomous Driving	-
4D Occupancy Forecasting and Generation	2024	arXiv	Driving in the Occupancy World: Vision-Centric 4D Occupancy Forecasting and Planning via World Models for Autonomous Driving	-
4D Occupancy Generation	2024	arXiv	OccSora: 4D Occupancy Generation Models as World Simulators for Autonomous Driving	Project Page
4D Occupancy Forecasting	2023	CVPR	Point Cloud Forecasting as a Proxy for 4D Occupancy Forecasting	Project Page

Unified Autonomous Driving Algorithm Framework

Specific Tasks	Year	Venue	Paper Title	Link
Occupancy Prediction, 3D Object Detection, Online Mapping, Multi-object Tracking, Motion Prediction, Motion Planning	2024	CVPR	DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving	-
Occupancy Prediction, 3D Object Detection	2024	RA-L	UniScene: Multi-Camera Unified Pre-training via 3D Scene Reconstruction for Autonomous Driving	Code
Occupancy Forecasting, Motion Planning	2024	arXiv	Driving in the Occupancy World: Vision-Centric 4D Occupancy Forecasting and Planning via World Models for Autonomous Driving	-
Occupancy Prediction, 3D Object Detection, BEV segmentation, Motion Planning	2023	ICCV	Scene as Occupancy	Code

Cite The Survey

If you find our survey and repository useful for your research project, please consider citing our paper:

@misc{xu2024survey,
      title={A Survey on Occupancy Perception for Autonomous Driving: The Information Fusion Perspective}, 
      author={Huaiyuan Xu and Junliang Chen and Shiyu Meng and Yi Wang and Lap-Pui Chau},
      year={2024},
      eprint={2405.05173},
      archivePrefix={arXiv}
}

Contact

If you have any questions, please feel free to get in touch:

lap-pui.chau@polyu.edu.hk
huaiyuan.xu@polyu.edu.hk

If you are interested in joining us as a Ph.D. student to research computer vision and machine learning, please feel free to contact Professor Chau:

lap-pui.chau@polyu.edu.hk

HuaiyuanXu/3D-Occupancy-Perception