😎 Awesome 3D and 4D World Models

This survey reviews state-of-the-art 3D and 4D world models - systems that learn, predict, and simulate the geometry and dynamics of real environments from multi-modal signals.

We unify terminology, scope, and evaluations, and organize the space into three complementary paradigms by representation:


	VideoGen: Learn generative or predictive models from sequential video streams with geometric and temporal constraints. VideoGen focuses on long-horizon consistency, controllability, and scene-level generation, enabling agents to imagine or forecast plausible video rollouts.
	OccGen: Model 3D/4D occupancy grids that encode geometry and semantics in voxel space. OccGen provides a physics-consistent scaffold for robust perception, forecasting, and simulation, bridging low-level sensor data and high-level reasoning.
	LiDARGen: Leverage point cloud sequences from LiDAR sensors to generate or predict geometry-grounded scenes. LiDARGen emphasizes high-fidelity 3D structure, robustness to environment changes, and applications in safety-critical domains such as autonomous driving.

For more details, kindly refer to our paper and project page. 🚀

📚 Citation

If you find this work helpful for your research, please kindly consider citing our paper:

@article{survey_3d_4d_world_models,
    title   = {3D and 4D World Modeling: A Survey},
    author  = {Lingdong Kong and Wesley Yang and Jianbiao Mei and Youquan Liu and Ao Liang and Dekai Zhu and Dongyue Lu and Wei Yin and Xiaotao Hu and Mingkai Jia and Junyuan Deng and Kaiwen Zhang and Yang Wu and Tianyi Yan and Shenyuan Gao and Song Wang and Linfeng Li and Liang Pan and Yong Liu and Jianke Zhu and Wei Tsang Ooi and Steven C. H. Hoi and Ziwei Liu},
    journal = {arXiv preprint arXiv:2509.07996},
    year    = {2025},
}

0. Background
1. Benchmarks & Datasets
- Benchmarks
- Datasets
2. World Modeling from Video Generation
3. World Modeling from Occupancy Generation
4. World Modeling from LiDAR Generation
5. Applications
6. Other Resources
7. Acknowledgements

Background


	World modeling has become a cornerstone of modern AI, enabling agents to understand, represent, and predict dynamic environments. While prior research has focused primarily on 2D images and videos, the rapid emergence of native 3D and 4D representations (e.g., RGB-D, occupancy grids, LiDAR point clouds) calls for a dedicated study.

What Are Native 3D Representations?

Unlike 2D projections, native 3D/4D signals directly encode metric geometry, visibility, and motion in the physical coordinates where agents act. Examples include:

RGB-D imagery (2D images with depth channels)
Occupancy grids (voxelized maps of free vs. occupied space)
LiDAR point clouds (3D coordinates from active sensing)
Neural fields (e.g., NeRF, Gaussian Splatting)
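
To make the relationship between these representations concrete, here is a minimal sketch (not taken from the survey) that lifts a depth image to a point cloud with a pinhole camera model and then voxelizes the points into a sparse occupancy grid. The function names `backproject` and `voxelize`, the intrinsics `fx, fy, cx, cy`, and the convention that a depth of zero marks a missing reading are all assumptions made for illustration.

```python
import math
from typing import Iterable, List, Sequence, Set, Tuple

Point = Tuple[float, float, float]
Voxel = Tuple[int, int, int]


def backproject(depth: Sequence[Sequence[float]],
                fx: float, fy: float, cx: float, cy: float) -> List[Point]:
    """Lift a depth image (rows of metric depths) to camera-frame 3D points."""
    points: List[Point] = []
    for v, row in enumerate(depth):
        for u, d in enumerate(row):
            if d > 0:  # assumption: zero (or negative) depth means "no reading"
                points.append(((u - cx) * d / fx, (v - cy) * d / fy, d))
    return points


def voxelize(points: Iterable[Point], voxel_size: float,
             origin: Point = (0.0, 0.0, 0.0)) -> Set[Voxel]:
    """Map metric 3D points to the set of occupied voxel indices."""
    if voxel_size <= 0:
        raise ValueError("voxel_size must be positive")
    occupied: Set[Voxel] = set()
    for x, y, z in points:
        occupied.add((
            math.floor((x - origin[0]) / voxel_size),
            math.floor((y - origin[1]) / voxel_size),
            math.floor((z - origin[2]) / voxel_size),
        ))
    return occupied


# Toy 2x2 depth image; the zero entry is skipped as a missing reading.
toy_depth = [[2.0, 2.0],
             [0.0, 4.0]]
toy_points = backproject(toy_depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
print(sorted(voxelize(toy_points, voxel_size=1.0)))
# → [(-1, -1, 2), (1, -1, 2), (2, 2, 4)]
```

A sparse set of voxel indices is used here instead of a dense array only to keep the sketch dependency-free; real occupancy pipelines typically store dense or hierarchical grids with semantic labels per voxel.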

What Are World Models in 3D and 4D?

A 3D/4D world model is an internal representation that allows an agent to imagine, forecast, and interact with its environment in 3D space. We distinguish two types:


	Generative World Models: synthesize plausible 3D/4D worlds under conditions (e.g., text prompts, trajectories).
	Predictive World Models: anticipate the future evolution of 3D/4D scenes given past observations and actions.

Together, these models provide the foundation for simulation, planning, and embodied intelligence in complex environments.
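
The distinction between the two model types can be sketched as two interfaces. The following is a hypothetical illustration, not an API from any surveyed work: the `Scene` type, the protocol names, the method signatures, and the toy `EgoShiftPredictor` (which assumes a fully static world and integer ego motion measured in voxels) are all assumptions.

```python
from typing import Dict, FrozenSet, List, Protocol, Tuple

Voxel = Tuple[int, int, int]
Scene = FrozenSet[Voxel]  # one frame, represented as the set of occupied voxels


class GenerativeWorldModel(Protocol):
    """Synthesizes scenes from conditions such as text prompts or trajectories."""

    def generate(self, condition: Dict[str, object], horizon: int) -> List[Scene]: ...


class PredictiveWorldModel(Protocol):
    """Forecasts future scenes from past observations and planned actions."""

    def predict(self, past: List[Scene], actions: List[Voxel]) -> List[Scene]: ...


class EgoShiftPredictor:
    """Toy predictive model: static world, only ego motion is compensated."""

    def predict(self, past: List[Scene], actions: List[Voxel]) -> List[Scene]:
        scene = past[-1]
        rollout: List[Scene] = []
        for dx, dy, dz in actions:
            # If the ego moves by (dx, dy, dz) voxels, a static scene
            # appears shifted by the opposite offset in the ego frame.
            scene = frozenset((x - dx, y - dy, z - dz) for x, y, z in scene)
            rollout.append(scene)
        return rollout


model: PredictiveWorldModel = EgoShiftPredictor()
future = model.predict([frozenset({(5, 0, 0)})], actions=[(1, 0, 0), (1, 0, 0)])
print([sorted(frame) for frame in future])
# → [[(4, 0, 0)], [(3, 0, 0)]]
```

Learned predictive models replace the hand-written shift with a network that also forecasts moving objects; the interface, past observations plus actions in and a rollout out, is the part this sketch is meant to convey.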

1. Benchmarks & Datasets

Benchmarks


WorldBench	VBench	WorldScore

Datasets

Dataset	Paper	Venue

`KITTI`	Are We Ready for Autonomous Driving? The KITTI Vision Benchmark Suite	CVPR 2012
`NYUv2`	Indoor Segmentation and Support Inference from RGBD Images	ECCV 2012
`CARLA`	CARLA: An Open Urban Driving Simulator	CoRL 2017
`SemanticKITTI`	SemanticKITTI: A Dataset for Semantic Scene Understanding of LiDAR Sequences	ICCV 2019
`nuScenes`	nuScenes: A Multimodal Dataset for Autonomous Driving	CVPR 2020
`Waymo Open`	Scalability in Perception for Autonomous Driving: Waymo Open Dataset	CVPR 2020
`STF`	Seeing Through Fog Without Seeing Fog: Deep Multimodal Sensor Fusion in Unseen Adverse Weather	CVPR 2020
`Virtual KITTI 2`	Virtual KITTI 2	arXiv 2020
`Argoverse 2`	Argoverse 2: Next Generation Datasets for Self-Driving Perception and Forecasting	NeurIPS 2021
`Lyft-Level5`	One Thousand and One Hours: Self-Driving Motion Prediction Dataset	CoRL 2021
`nuPlan`	nuPlan: A Closed-Loop ML-Based Planning Benchmark for Autonomous Vehicles	CVPRW 2021
`PandaSet`	PandaSet: Advanced Sensor Suite Dataset for Autonomous Driving	ITSC 2022
`OpenCOOD`	OPV2V: An Open Benchmark Dataset and Fusion Pipeline for Perception with Vehicle-to-Vehicle Communication	ICRA 2022
`KITTI-360`	KITTI-360: A Novel Dataset and Benchmarks for Urban Scene Understanding in 2D and 3D	TPAMI 2022
`CarlaSC`	MotionSC: Data Set and Network for Real-Time Semantic Mapping in Dynamic Environments	RA-L 2022
`Robo3D`	Robo3D: Towards Robust and Reliable 3D Perception against Corruptions	ICCV 2023
`OpenOccupancy`	OpenOccupancy: A Large Scale Benchmark for Surrounding Semantic Occupancy Perception	ICCV 2023
`Occ3D-nuScenes`	Occ3D: A Large-Scale 3D Occupancy Prediction Benchmark for Autonomous Driving	NeurIPS 2023
`OpenDV-YouTube`	GenAD: Generalized Predictive Model for Autonomous Driving	CVPR 2024
`SSCBench`	SSCBench: A Large-Scale 3D Semantic Scene Completion Benchmark for Autonomous Driving	IROS 2024
`NAVSIM`	NAVSIM: Data-Driven Non-Reactive Autonomous Vehicle Simulation and Benchmarking	NeurIPS 2024
`DrivingDojo`	DrivingDojo Dataset: Advancing Interactive and Knowledge-Enriched Driving World Model	NeurIPS 2024
`EUVS`	Extrapolated Urban View Synthesis Benchmark	ICCV 2025
`Pi3DET`	Perspective-Invariant 3D Object Detection	ICCV 2025

2. World Modeling from Video Generation

1️⃣ Data Engines

Model	Paper	Venue	Website	GitHub

`BEVControl`	BEVControl: Accurately Controlling Street-View Elements with Multi-Perspective Consistency via BEV Sketch Layout	arXiv 2023	-	-
`BEVGen`	Street-View Image Generation from a Bird's-Eye View Layout	RA-L 2024
`MagicDrive`	MagicDrive: Street View Generation with Diverse 3D Geometry Control	ICLR 2024
`Panacea`	Panacea: Panoramic and Controllable Video Generation for Autonomous Driving	CVPR 2024
`DrivingDiffusion`	DrivingDiffusion: Layout-Guided Multi-View Driving Scene Video Generation with Latent Diffusion Model	ECCV 2024
`WoVoGen`	WoVoGen: World Volume-Aware Diffusion for Controllable Multi-Camera Driving Scene Generation	ECCV 2024	-
`Delphi`	Unleashing Generalization of End-to-End Autonomous Driving with Controllable Long Video Generation	arXiv 2024
`SimGen`	SimGen: Simulator-Conditioned Driving Scene Generation	NeurIPS 2024
`BEVWorld`	BEVWorld: A Multimodal World Simulator for Autonomous Driving via Scene-Level BEV Latents	arXiv 2024	-	-
`Panacea+`	Panacea+: Panoramic and Controllable Video Generation for Autonomous Driving	arXiv 2024		-
`DiVE`	DiVE: DiT-Based Video Generation with Enhanced Control	arXiv 2024
`SyntheOcc`	SyntheOcc: Synthesize Geometric-Controlled Street View Images through 3D Semantic MPIs	arXiv 2024
`HoloDrive`	HoloDrive: Holistic 2D-3D Multi-Modal Street Scene Generation for Autonomous Driving	arXiv 2024	-	-
`CogDriving`	Seeing Beyond Views: Multi-View Driving Scene Video Generation with Holistic Attention	arXiv 2024		-
`UniMLVG`	UniMLVG: Unified Framework for Multi-View Long Video Generation with Comprehensive Control Capabilities for Autonomous Driving	arXiv 2024	-
`DrivePhysica`	Physical Informed Driving World Model	arXiv 2024		-
`DriveDreamer-2`	DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation	AAAI 2025
`SubjectDrive`	SubjectDrive: Scaling Generative Data in Autonomous Driving via Subject Control	AAAI 2025		-
`Glad`	Glad: A Streaming Scene Generator for Autonomous Driving	ICLR 2025	-
`DualDiff`	DualDiff: Dual-Branch Diffusion Model for Autonomous Driving with Semantic Fusion	ICRA 2025	-
`UniScene`	UniScene: Unified Occupancy-Centric Driving Scene Generation	CVPR 2025
`DriveScape`	DriveScape: Towards High-Resolution Controllable Multi-View Driving Video Generation	CVPR 2025		-
`PerLDiff`	PerLDiff: Controllable Street View Synthesis Using Perspective-Layout Diffusion Models	ICCV 2025
`MagicDrive-V2`	MagicDrive-V2: High-Resolution Long Video Generation for Autonomous Driving with Adaptive Control	ICCV 2025		-
`Cosmos-Transfer1`	Cosmos-Transfer1: Conditional World Generation with Adaptive Multimodal Control	arXiv 2025
`DualDiff+`	DualDiff+: Dual-Branch Diffusion for High-Fidelity Video Generation with Reward Guidance	arXiv 2025	-
`CoGen`	CoGen: 3D Consistent Video Generation via Adaptive Conditioning for Autonomous Driving	arXiv 2025		-
`NoiseController`	NoiseController: Towards Consistent Multi-View Video Generation via Noise Decomposition and Collaboration	arXiv 2025	-	-
`STAGE`	STAGE: A Stream-Centric Generative World Model for Long-Horizon Driving-Scene Simulation	arXiv 2025	-	-

2️⃣ Action Interpreters

Model	Paper	Venue	Website	GitHub

`GAIA-1`	GAIA-1: A Generative World Model for Autonomous Driving	arXiv 2023		-
`ADriver-I`	ADriver-I: A General World Model for Autonomous Driving	arXiv 2023	-	-
`Drive-WM`	Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving	CVPR 2024
`DriveDreamer`	DriveDreamer: Towards Real-World-Driven World Models for Autonomous Driving	ECCV 2024
`GenAD`	GenAD: Generalized Predictive Model for Autonomous Driving	CVPR 2024	-
`Vista`	Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability	NeurIPS 2024
`InfinityDrive`	InfinityDrive: Breaking Time Limits in Driving World Models	arXiv 2024		-
`DrivingGPT`	DrivingGPT: Unifying Driving World Modeling and Planning with Multi-Modal Autoregressive Transformers	arXiv 2024		-
`DrivingWorld`	DrivingWorld: Constructing World Model for Autonomous Driving via Video GPT	arXiv 2024
`GEM`	GEM: A Generalizable Ego-Vision Multimodal World Model for Fine-Grained Ego-Motion, Object Dynamics, and Scene Composition Control	CVPR 2025
`MaskGWM`	MaskGWM: A Generalizable Driving World Model with Video Mask Reconstruction	CVPR 2025	-
`Epona`	Epona: Autoregressive Diffusion World Model for Autonomous Driving	ICCV 2025
`VaViM & VaVAM`	VaViM and VaVAM: Autonomous Driving through Video Generative Modeling	arXiv 2025
`MiLA`	MiLA: Multi-View Intensive-Fidelity Long-Term Video Generation World Model for Autonomous Driving	arXiv 2025	-
`GAIA-2`	GAIA-2: A Controllable Multi-View Generative World Model for Autonomous Driving	arXiv 2025		-
`DriVerse`	DriVerse: Navigation World Model for Driving Simulation via Multimodal Trajectory Prompting and Motion Alignment	arXiv 2025	-	-
`PosePilot`	PosePilot: Steering Camera Pose for Generative World Models with Self-Supervised Depth	arXiv 2025	-	-
`ProphetDWM`	ProphetDWM: A Driving World Model for Rolling Out Future Actions and Videos	arXiv 2025	-	-
`LongDWM`	LongDWM: Cross-Granularity Distillation for Building A Long-Term Driving World Model	arXiv 2025

3️⃣ Neural Simulators

Model	Paper	Venue

`MagicDrive3D`	MagicDrive3D: Controllable 3D Generation for Any-View Rendering in Street Scenes	arXiv 2024
`DreamForge`	DreamForge: Motion-Aware Autoregressive Video Generation for Multi-View Driving Scenes	arXiv 2024
`Doe-1`	Doe-1: Closed-Loop Autonomous Driving with Large World Model	arXiv 2024
`DrivingSphere`	DrivingSphere: Building A High-Fidelity 4D World for Closed-Loop Simulation	CVPR 2025
`UMGen`	Generating Multimodal Driving Scenes via Next-Scene Prediction	CVPR 2025
`DriveArena`	DriveArena: A Closed-Loop Generative Simulation Platform for Autonomous Driving	ICCV 2025
`InfiniCube`	InfiniCube: Unbounded and Controllable Dynamic 3D Driving Scene Generation with World-Guided Video Models	ICCV 2025
`DiST-4D`	DiST-4D: Disentangled Spatiotemporal Diffusion with Metric Depth for 4D Driving Scene Generation	ICCV 2025
`UniFuture`	Seeing the Future, Perceiving the Future: A Unified Driving World Model for Future Generation and Perception	arXiv 2025
`Nexus`	Decoupled Diffusion Sparks Adaptive Scene Generation	arXiv 2025
`Challenger`	Challenger: Affordable Adversarial Driving Video Generation	arXiv 2025
`Cosmos-Drive`	Cosmos-Drive-Dreams: Scalable Synthetic Driving Data Generation with World Foundation Models	arXiv 2025

4️⃣ Scene Reconstructors

Model	Paper	Venue	Website	GitHub

`3DGS`	3D Gaussian Splatting for Real-Time Radiance Field Rendering	TOG 2023
`StreetGaussian`	Street Gaussians: Modeling Dynamic Urban Scenes with Gaussian Splatting	ECCV 2024
`4DGF`	Dynamic 3D Gaussian Fields for Urban Areas	NeurIPS 2024
`SCube`	SCube: Instant Large-Scale Scene Reconstruction using VoxSplats	NeurIPS 2024
`HUGS`	HUGS: Holistic Urban 3D Scene Understanding via Gaussian Splatting	CVPR 2024
`MagicDrive3D`	MagicDrive3D: Controllable 3D Generation for Any-View Rendering in Street Scenes	arXiv 2024
`S3Gaussian`	S3Gaussian: Self-Supervised Street Gaussians for Autonomous Driving	arXiv 2024
`VDG`	VDG: Vision-Only Dynamic Gaussian for Driving Simulation	arXiv 2024
`UniGaussian`	UniGaussian: Driving Scene Reconstruction from Multiple Camera Models via Unified Gaussian Representations	arXiv 2024	-	-
`Stag-1`	Stag-1: Towards Realistic 4D Driving Simulation with Video Generation Model	arXiv 2024
`DrivingRecon`	DrivingRecon: Large 4D Gaussian Reconstruction Model For Autonomous Driving	arXiv 2024	-
`OccScene`	OccScene: Semantic Occupancy-Based Cross-Task Mutual Learning for 3D Scene Generation	arXiv 2024	-	-
`SGD`	SGD: Street View Synthesis with Gaussian Splatting and Diffusion Prior	WACV 2025	-	-
`OmniRe`	OmniRe: Omni Urban Scene Reconstruction	ICLR 2025
`DriveDreamer4D`	DriveDreamer4D: World Models Are Effective Data Machines for 4D Driving Scene Representation	CVPR 2025
`DeSiRe-GS`	DeSiRe-GS: 4D Street Gaussians for Static-Dynamic Decomposition and Surface Reconstruction for Urban Driving Scenes	CVPR 2025	-
`SplatAD`	SplatAD: Real-Time Lidar and Camera Rendering with 3D Gaussian Splatting for Autonomous Driving	CVPR 2025
`ReconDreamer`	ReconDreamer: Crafting World Models for Driving Scene Reconstruction via Online Restoration	CVPR 2025
`FreeSim`	FreeSim: Toward Free-Viewpoint Camera Simulation in Driving Scenes	CVPR 2025		-
`StreetCrafter`	StreetCrafter: Street View Synthesis with Controllable Video Diffusion Models	CVPR 2025
`FlexDrive`	FlexDrive: Toward Trajectory Flexibility in Driving Scene Reconstruction and Rendering	CVPR 2025	-	-
`S-NeRF++`	S-NeRF++: Autonomous Driving Simulation via Neural Reconstruction and Generation	TPAMI 2025	-	-
`InfiniCube`	InfiniCube: Unbounded and Controllable Dynamic 3D Driving Scene Generation with World-Guided Video Models	ICCV 2025
`DiST-4D`	DiST-4D: Disentangled Spatiotemporal Diffusion with Metric Depth for 4D Driving Scene Generation	ICCV 2025
`DreamDrive`	DreamDrive: Generative 4D Scene Modeling from Street View Images	arXiv 2025		-
`Uni-Gaussians`	Uni-Gaussians: Unifying Camera and Lidar Simulation with Gaussians for Dynamic Driving Scenarios	arXiv 2025		-
`MuDG`	MuDG: Taming Multi-Modal Diffusion with Gaussian Splatting for Urban Scene Reconstruction	arXiv 2025
`UniFuture`	Seeing the Future, Perceiving the Future: A Unified Driving World Model for Future Generation and Perception	arXiv 2025
`SceneCrafter`	Unraveling the Effects of Synthetic Data on End-to-End Autonomous Driving	arXiv 2025	-
`ReconDreamer++`	ReconDreamer++: Harmonizing Generative and Reconstructive Models for Driving Scene Representation	arXiv 2025
`RealEngine`	RealEngine: Simulating Autonomous Driving in Realistic Context	arXiv 2025	-
`GeoDrive`	GeoDrive: 3D Geometry-Informed Driving World Model with Precise Action Control	arXiv 2025	-
`PseudoSimulation`	Pseudo-Simulation for Autonomous Driving	arXiv 2025	-
`Dreamland`	Dreamland: Controllable World Creation with Simulator and Generative Models	arXiv 2025		-

3. World Modeling from Occupancy Generation

1️⃣ Scene Representors

Model	Paper	Venue	Website	GitHub

`SSD`	Diffusion Probabilistic Models for Scene-Scale 3D Categorical Data	arXiv 2023	-
`SemCity`	SemCity: Semantic Scene Generation with Triplane Diffusion	CVPR 2024
`WoVoGen`	WoVoGen: World Volume-Aware Diffusion for Controllable Multi-Camera Driving Scene Generation	ECCV 2024	-
`UrbanDiff`	Urban Scene Diffusion through Semantic Occupancy Map	arXiv 2024		-
`DrivingSphere`	DrivingSphere: Building A High-Fidelity 4D World for Closed-Loop Simulation	CVPR 2025
`UniScene`	UniScene: Unified Occupancy-Centric Driving Scene Generation	CVPR 2025
`OccScene`	OccScene: Semantic Occupancy-Based Cross-Task Mutual Learning for 3D Scene Generation	arXiv 2024	-	-
`InfiniCube`	InfiniCube: Unbounded and Controllable Dynamic 3D Driving Scene Generation with World-Guided Video Models	ICCV 2025
`Control-3D-Scene`	Controllable 3D Outdoor Scene Generation via Scene Graphs	ICCV 2025
`X-Scene`	X-Scene: Large-Scale Driving Scene Generation with High Fidelity and Flexible Controllability	arXiv 2025

2️⃣ Occupancy Forecasters

Model	Paper	Venue	Website	GitHub

`Emergent-Occ`	Differentiable Raycasting for Self-supervised Occupancy Forecasting	ECCV 2022	-
`FF4D`	Point Cloud Forecasting as a Proxy for 4D Occupancy Forecasting	CVPR 2023
`UniWorld`	UniWorld: Autonomous Driving Pre-Training via World Models	arXiv 2023	-	-
`UniScene`	UniScene: Multi-Camera Unified Pre-Training via 3D Scene Reconstruction for Autonomous Driving	arXiv 2023	-
`OccWorld`	OccWorld: Learning A 3D Occupancy World Model for Autonomous Driving	ECCV 2024
`Cam4DOcc`	Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications	CVPR 2024	-
`DriveWorld`	DriveWorld: 4D Pre-Trained Scene Understanding via World Models for Autonomous Driving	CVPR 2024	-	-
`OccSora`	OccSora: 4D Occupancy Generation Models as World Simulators for Autonomous Driving	arXiv 2024
`UnO`	UnO: Unsupervised Occupancy Fields for Perception and Forecasting	CVPR 2024		-
`LOPR`	Self-Supervised Multi-Future Occupancy Forecasting for Autonomous Driving	arXiv 2024	-	-
`FSF-Net`	FSF-Net: Enhance 4D Occupancy Forecasting with Coarse BEV Scene Flow for Autonomous Driving	arXiv 2024	-	-
`OccLLaMA`	OccLLaMA: An Occupancy-Language-Action Generative World Model for Autonomous Driving	arXiv 2024	-	-
`DOME`	DOME: Taming Diffusion Model into High-Fidelity Controllable Occupancy World Model	arXiv 2024
`GaussianAD`	GaussianAD: Gaussian-Centric End-to-End Autonomous Driving	arXiv 2024
`DFIT-OccWorld`	An Efficient Occupancy World Model via Decoupled Dynamic Flow and Image-Assisted Training	arXiv 2024	-	-
`Drive-OccWorld`	Driving in the Occupancy World: Vision-Centric 4D Occupancy Forecasting and Planning via World Models for Autonomous Driving	AAAI 2025
`PreWorld`	Semi-Supervised Vision-Centric 3D Occupancy World Model for Autonomous Driving	ICLR 2025	-
`OccProphet`	OccProphet: Pushing Efficiency Frontier of Camera-Only 4D Occupancy Forecasting with Observer-Forecaster-Refiner Framework	ICLR 2025	-
`RenderWorld`	RenderWorld: World Model with Self-Supervised 3D Label	ICRA 2025	-	-
`Occ-LLM`	Occ-LLM: Enhancing Autonomous Driving with Occupancy-Based Large Language Models	ICRA 2025	-	-
`EfficientOCF`	Spatiotemporal Decoupling for Efficient Vision-Based Occupancy Forecasting	CVPR 2025	-	-
`DIO`	DIO: Decomposable Implicit 4D Occupancy-Flow World Model	CVPR 2025	-	-
`T³Former`	Temporal Triplane Transformers as Occupancy World Models	arXiv 2025	-	-
`UniOcc`	UniOcc: A Unified Benchmark for Occupancy Forecasting and Prediction in Autonomous Driving	ICCV 2025
`I²World`	I²-World: Intra-Inter Tokenization for Efficient Dynamic 4D Scene Forecasting	ICCV 2025	-
`COME`	COME: Adding Scene-Centric Forecasting Control to Occupancy World Model	arXiv 2025	-

3️⃣ Autoregressive Simulators

Model	Paper	Venue

`SemCity`	SemCity: Semantic Scene Generation with Triplane Diffusion	CVPR 2024
`XCube`	XCube: Large-Scale 3D Generative Modeling using Sparse Voxel Hierarchies	CVPR 2024
`PDD`	Pyramid Diffusion for Fine 3D Large Scene Generation	ECCV 2024
`OccSora`	OccSora: 4D Occupancy Generation Models as World Simulators for Autonomous Driving	arXiv 2024
`DynamicCity`	DynamicCity: Large-Scale 4D Occupancy Generation from Dynamic Scenes	ICLR 2025
`DrivingSphere`	DrivingSphere: Building A High-Fidelity 4D World for Closed-Loop Simulation	CVPR 2025
`InfiniCube`	InfiniCube: Unbounded and Controllable Dynamic 3D Driving Scene Generation with World-Guided Video Models	ICCV 2025
`X-Scene`	X-Scene: Large-Scale Driving Scene Generation with High Fidelity and Flexible Controllability	arXiv 2025
`PrITTI`	PrITTI: Primitive-Based Generation of Controllable and Editable 3D Semantic Scenes	arXiv 2025

4. World Modeling from LiDAR Generation

1️⃣ Data Engines

Model	Paper	Venue	Website	GitHub

`DUSty`	Learning to Drop Points for LiDAR Scan Synthesis	IROS 2021
`LiDARGen`	Learning to Generate Realistic LiDAR Point Clouds	ECCV 2022	-
`DUSty v2`	Generative Range Imaging for Learning Scene Priors of 3D LiDAR Data	WACV 2023
`UltraLiDAR`	UltraLiDAR: Learning Compact Representations for LiDAR Completion and Generation	CVPR 2023		-
`Copilot4D`	Copilot4D: Learning Unsupervised World Models for Autonomous Driving via Discrete Diffusion	ICLR 2024		-
`R2DM`	LiDAR Data Synthesis with Denoising Diffusion Probabilistic Models	ICRA 2024
`ViDAR`	Visual Point Cloud Forecasting enables Scalable Autonomous Driving	CVPR 2024	-
`LiDiff`	Scaling Diffusion Models to Real-World 3D LiDAR Scene Completion	CVPR 2024	-
`LiDM`	Towards Realistic Scene Generation with LiDAR Diffusion Models	CVPR 2024	-
`RangeLDM`	RangeLDM: Fast Realistic LiDAR Point Cloud Generation	ECCV 2024	-
`Text2LiDAR`	Text2LiDAR: Text-Guided LiDAR Point Cloud Generation via Equirectangular Transformer	ECCV 2024	-
`LiDARGRIT`	Taming Transformers for Realistic LiDAR Point Cloud Generation	arXiv 2024	-
`BEVWorld`	BEVWorld: A Multimodal World Simulator for Autonomous Driving via Scene-Level BEV Latents	arXiv 2024	-
`SDS`	Simultaneous Diffusion Sampling for Conditional LiDAR Generation	arXiv 2024	-	-
`DiffSSC`	DiffSSC: Semantic LiDAR Scan Completion using Denoising Diffusion Probabilistic Models	IROS 2025	-	-
`HoloDrive`	HoloDrive: Holistic 2D-3D Multi-Modal Street Scene Generation for Autonomous Driving	arXiv 2024	-	-
`LOGen`	LOGen: Toward LiDAR Object Generation by Point Diffusion	arXiv 2024
`OLiDM`	OLiDM: Object-Aware LiDAR Diffusion Models for Autonomous Driving	AAAI 2025
`X-Drive`	X-Drive: Cross-Modality Consistent Multi-Sensor Data Synthesis for Driving Scenarios	ICLR 2025	-
`LidarDM`	LidarDM: Generative LiDAR Simulation in a Generated World	ICRA 2025
`LiDAR-EDIT`	LiDAR-EDIT: LiDAR Data Generation by Editing the Object Layouts in Real-World Scenes	ICRA 2025
`R2Flow`	Fast LiDAR Data Generation with Rectified Flows	ICRA 2025
`WeatherGen`	WeatherGen: A Unified Diverse Weather Generator for LiDAR Point Clouds via Spider Mamba Diffusion	CVPR 2025	-
`LiDPM`	LiDPM: Rethinking Point Diffusion for Lidar Scene Completion	IV 2025
`HERMES`	HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation	ICCV 2025
`SuperPC`	SuperPC: A Single Diffusion Model for Point Cloud Completion, Upsampling, Denoising, and Colorization	CVPR 2025		-
`3DiSS`	Towards Generating Realistic 3D Semantic Training Data for Autonomous Driving	arXiv 2025	-
`Distill-DPO`	Diffusion Distillation With Direct Preference Optimization For Efficient 3D LiDAR Scene Completion	arXiv 2025	-
`DriveX`	DriveX: Omni Scene Modeling for Learning Generalizable World Knowledge in Autonomous Driving	arXiv 2025	-	-
`OpenDWM`	OpenDWM: Open Driving World Models	arXiv 2025	-
`SPIRAL`	SPIRAL: Semantic-Aware Progressive LiDAR Scene Generation	arXiv 2025	-	-
`La La LiDAR`	La La LiDAR: Large-Scale Layout Generation from LiDAR Data	arXiv 2025	-	-
`Veila`	Veila: Panoramic LiDAR Generation from a Monocular RGB Image	arXiv 2025	-	-
`LiDARCrafter`	LiDARCrafter: Dynamic 4D World Modeling from LiDAR Sequences	arXiv 2025

2️⃣ Action Forecasters

Model	Paper	Venue	Website	GitHub

`Copilot4D`	Copilot4D: Learning Unsupervised World Models for Autonomous Driving via Discrete Diffusion	ICLR 2024		-
`ViDAR`	Visual Point Cloud Forecasting enables Scalable Autonomous Driving	CVPR 2024	-
`BEVWorld`	BEVWorld: A Multimodal World Simulator for Autonomous Driving via Scene-Level BEV Latents	arXiv 2024	-
`HERMES`	HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation	ICCV 2025
`DriveX`	DriveX: Omni Scene Modeling for Learning Generalizable World Knowledge in Autonomous Driving	arXiv 2025	-	-

3️⃣ Autoregressive Simulators

Model	Paper	Venue	Website	GitHub

`HoloDrive`	HoloDrive: Holistic 2D-3D Multi-Modal Street Scene Generation for Autonomous Driving	arXiv 2024	-	-
`LidarDM`	LidarDM: Generative LiDAR Simulation in a Generated World	ICRA 2025
`OpenDWM`	OpenDWM: Open Driving World Models	arXiv 2025	-
`LiDARCrafter`	LiDARCrafter: Dynamic 4D World Modeling from LiDAR Sequences	arXiv 2025

5. Applications

1️⃣ Autonomous Driving

Model	Paper	Venue	Website	GitHub

`OccSora`	OccSora: 4D Occupancy Generation Models as World Simulators for Autonomous Driving	arXiv 2024	-
`DFIT-OccWorld`	An Efficient Occupancy World Model via Decoupled Dynamic Flow and Image-Assisted Training	arXiv 2024	-	-
`LiDARCrafter`	LiDARCrafter: Dynamic 4D World Modeling from LiDAR Sequences	arXiv 2025
`UniSim`	UniSim: A Neural Closed-Loop Sensor Simulator	CVPR 2023		-
`Panacea`	Panacea: Panoramic and Controllable Video Generation for Autonomous Driving	CVPR 2024
`Delphi`	Unleashing Generalization of End-to-End Autonomous Driving with Controllable Long Video Generation	arXiv 2024
`DriveDreamer-2`	DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation	AAAI 2025
`Panacea+`	Panacea+: Panoramic and Controllable Video Generation for Autonomous Driving	arXiv 2024		-
`MiLA`	MiLA: Multi-View Intensive-Fidelity Long-Term Video Generation World Model for Autonomous Driving	arXiv 2025	-
`GAIA-2`	GAIA-2: A Controllable Multi-View Generative World Model for Autonomous Driving	arXiv 2025		-

2️⃣ Robotics

Model	Paper	Venue	Website	GitHub

`RoboDreamer`	RoboDreamer: Learning Compositional World Models for Robot Imagination	arXiv 2024
`BEHAVIOR`	BEHAVIOR Robot Suite: Streamlining Real-World Whole-Body Manipulation for Everyday Household Activities	CoRL 2025
`Habitat 2.0`	Habitat 2.0: Training Home Assistants to Rearrange their Habitat	arXiv 2021	-	-
`FMR`	Foundation Models in Robotics: Applications, Challenges, and the Future	IJRR 2024	-
`VLMPS`	Visual Language Maps for Robot Navigation	ICRA 2023

3️⃣ Video Games & XR

Model	Paper	Venue	Website	GitHub

`ILVE`	Interactive Latent Variable Evolution for the Generation of Minecraft Structures	ICFDG 2021	-	-
`ProcTHOR`	ProcTHOR: Large-Scale Embodied AI Using Procedural Generation	NeurIPS 2022
`WorldGPT`	WorldGPT: Empowering LLM as Multimodal World Model	ACM MM 2024	-
`Text2World`	Text2World: Benchmarking Large Language Models for Symbolic World Model Generation	arXiv 2025
`Hunyuan3D 1.0`	Hunyuan3D 1.0: A Unified Framework for Text-to-3D and Image-to-3D Generation	arXiv 2024
`Hunyuan3D 2.0`	Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation	arXiv 2025
`Hunyuan3D 2.1`	Hunyuan3D 2.1: From Images to High-Fidelity 3D Assets with Production-Ready PBR Material	arXiv 2025
`Hunyuan3D 2.5`	Hunyuan3D 2.5: Towards High-Fidelity 3D Assets Generation with Ultimate Details	arXiv 2025
`Hunyuan-GameCraft`	Hunyuan-GameCraft: High-dynamic Interactive Game Video Generation with Hybrid History Condition	arXiv 2025
`HunyuanWorld 1.0`	HunyuanWorld 1.0: Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels	arXiv 2025
`MGVQ`	MGVQ: Could VQ-VAE Beat VAE? A Generalizable Tokenizer with Multi-Group Quantization	arXiv 2025

4️⃣ Digital Twins

Model	Paper	Venue	Website	GitHub

`DynamicCity`	DynamicCity: Large-Scale 4D Occupancy Generation from Dynamic Scenes	ICLR 2025
`UrbanScene3D`	Capturing, Reconstructing, and Simulating: the UrbanScene3D Dataset	ECCV 2022
`GaussianCity`	GaussianCity: Generative Gaussian Splatting for Unbounded 3D City Generation	CVPR 2025
`UrbanWorld`	UrbanWorld: An Urban World Model for 3D City Generation	arXiv 2024
`SceneDiffuser++`	SceneDiffuser++: City-Scale Traffic Simulation via a Generative World Model	CVPR 2025	-	-

6. Other Resources

Workshops

CVPR 2025 Workshop & Challenge | OpenDriveLab Track: World Model.
World Model Bench @ CVPR'25 | WorldModelBench: The 1st Workshop on Benchmarking World Models.
CVPR 2024 Workshop & Challenge | OpenDriveLab Track #4: Predictive World Model.
