😎 Awesome 3D and 4D World Models
This survey reviews state-of-the-art 3D and 4D world models: systems that learn, predict, and simulate the geometry and dynamics of real environments from multi-modal signals.
We unify terminology, scope, and evaluations, and organize the space into three complementary paradigms by representation:
VideoGen (world modeling from video generation): Learn generative or predictive models from sequential video streams with geometric and temporal constraints. VideoGen focuses on long-horizon consistency, controllability, and scene-level generation, enabling agents to imagine or forecast plausible video rollouts.
OccGen (world modeling from occupancy generation): Model 3D/4D occupancy grids that encode geometry and semantics in voxel space. OccGen provides a physics-consistent scaffold for robust perception, forecasting, and simulation, bridging low-level sensor data and high-level reasoning.
LiDARGen (world modeling from LiDAR generation): Leverage point cloud sequences from LiDAR sensors to generate or predict geometry-grounded scenes. LiDARGen emphasizes high-fidelity 3D structure, robustness to environment changes, and applications in safety-critical domains such as autonomous driving.
For more details, kindly refer to our paper and project page. 🚀
If you find this work helpful for your research, please kindly consider citing our paper:
@article{survey_3d_4d_world_models,
  title = {3D and 4D World Modeling: A Survey},
  author = {Lingdong Kong and Wesley Yang and Jianbiao Mei and Youquan Liu and Ao Liang and Dekai Zhu and Dongyue Lu and Wei Yin and Xiaotao Hu and Mingkai Jia and Junyuan Deng and Kaiwen Zhang and Yang Wu and Tianyi Yan and Shenyuan Gao and Song Wang and Linfeng Li and Liang Pan and Yong Liu and Jianke Zhu and Wei Tsang Ooi and Steven C. H. Hoi and Ziwei Liu},
  journal = {arXiv preprint arXiv:2509.07996},
  year = {2025},
}
World modeling has become a cornerstone of modern AI, enabling agents to understand, represent, and predict dynamic environments. While prior research has focused primarily on 2D images and videos, the rapid emergence of native 3D and 4D representations (e.g., RGB-D, occupancy grids, LiDAR point clouds) calls for a dedicated study.
What Are Native 3D Representations?
Unlike 2D projections, native 3D/4D signals directly encode metric geometry, visibility, and motion in the physical coordinates where agents act. Examples include:
RGB-D imagery (2D images with depth channels)
Occupancy grids (voxelized maps of free vs. occupied space)
LiDAR point clouds (3D coordinates from active sensing)
Neural fields (e.g., NeRF, Gaussian Splatting)
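To make the relationship between these representations concrete, here is a minimal Python sketch that back-projects a tiny depth map into a point cloud with a pinhole camera model and then voxelizes the points into a sparse occupancy grid. The intrinsics (`fx`, `fy`, `cx`, `cy`), the voxel size, and both function names are illustrative assumptions and are not taken from any dataset or toolkit listed here.

```python
import math
from typing import List, Set, Tuple

Point = Tuple[float, float, float]
Voxel = Tuple[int, int, int]


def backproject_depth(depth: List[List[float]], fx: float, fy: float,
                      cx: float, cy: float) -> List[Point]:
    """Lift a depth map (metres, 0 = no return) to 3D camera-frame points."""
    points: List[Point] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0.0:  # skip pixels without a valid depth reading
                continue
            x = (u - cx) * z / fx  # pinhole model: horizontal offset
            y = (v - cy) * z / fy  # pinhole model: vertical offset
            points.append((x, y, z))
    return points


def voxelize(points: List[Point], voxel_size: float) -> Set[Voxel]:
    """Return the set of occupied voxel indices (a sparse occupancy grid)."""
    return {
        (math.floor(x / voxel_size),
         math.floor(y / voxel_size),
         math.floor(z / voxel_size))
        for (x, y, z) in points
    }


if __name__ == "__main__":
    # Hypothetical 2x2 depth map with one invalid pixel.
    cloud = backproject_depth([[2.0, 0.0], [2.0, 4.0]], 1.0, 1.0, 0.0, 0.0)
    print(cloud)
    print(sorted(voxelize(cloud, 2.0)))
```

The same `voxelize` step applies unchanged to LiDAR returns, which already arrive as metric 3D coordinates; only RGB-D needs the back-projection step first.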
What Are World Models in 3D and 4D?
A 3D/4D world model is an internal representation that allows an agent to imagine, forecast, and interact with its environment in 3D space.
Generative World Models: synthesize plausible 3D/4D worlds under conditions (e.g., text prompts, trajectories).
Predictive World Models: anticipate the future evolution of 3D/4D scenes given past observations and actions.
Together, these models provide the foundation for simulation, planning, and embodied intelligence in complex environments.
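The difference between the two families is mostly one of input/output contract, which the toy Python sketch below tries to illustrate. It is not a learned model and does not reproduce any method in this list: `generate_scene` samples a random occupancy grid under a single density condition, and `predict_future` rolls the latest observed grid forward under a constant ego action while assuming a static scene. Both function names, the density condition, and the static-scene assumption are hypothetical simplifications.

```python
import random
from typing import List, Set, Tuple

Voxel = Tuple[int, int, int]
Grid = Set[Voxel]  # sparse occupancy grid: the set of occupied voxel indices


def generate_scene(size: int, density: float, seed: int) -> Grid:
    """Toy generative model: sample a size^3 grid under a density condition."""
    rng = random.Random(seed)
    return {
        (x, y, z)
        for x in range(size)
        for y in range(size)
        for z in range(size)
        if rng.random() < density
    }


def predict_future(history: List[Grid], ego_motion: Voxel, steps: int) -> List[Grid]:
    """Toy predictive model: forecast future grids from past grids and an action.

    Assumes a static scene, so moving the ego by `ego_motion` per step shifts
    every occupied voxel by the opposite offset in the ego frame.
    """
    dx, dy, dz = ego_motion
    current = history[-1]  # only the latest observation is used in this toy
    rollout: List[Grid] = []
    for _ in range(steps):
        current = {(x - dx, y - dy, z - dz) for (x, y, z) in current}
        rollout.append(current)
    return rollout


if __name__ == "__main__":
    scene = generate_scene(size=4, density=0.2, seed=0)
    futures = predict_future([scene], ego_motion=(1, 0, 0), steps=3)
    print(len(scene), [len(g) for g in futures])
```

A generative model is conditioned only on external controls (here a density value standing in for text prompts or trajectories), whereas a predictive model is additionally conditioned on the observation history and the agent's actions.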
⏲️ In chronological order, from the earliest to the latest.
Dataset
Paper
Venue
Website
KITTI
Are We Ready for Autonomous Driving? The KITTI Vision Benchmark Suite
CVPR 2012
NYUv2
Indoor Segmentation and Support Inference from RGBD Images
ECCV 2012
CARLA
CARLA: An Open Urban Driving Simulator
CoRL 2017
SemanticKITTI
SemanticKITTI: A Dataset for Semantic Scene Understanding of LiDAR Sequences
ICCV 2019
nuScenes
nuScenes: A Multimodal Dataset for Autonomous Driving
CVPR 2020
Waymo Open
Scalability in Perception for Autonomous Driving: Waymo Open Dataset
CVPR 2020
STF
Seeing Through Fog Without Seeing Fog: Deep Multimodal Sensor Fusion in Unseen Adverse Weather
CVPR 2020
Virtual KITTI 2
Virtual KITTI 2
arXiv 2020
Argoverse 2
Argoverse 2: Next Generation Datasets for Self-Driving Perception and Forecasting
NeurIPS 2021
Lyft-Level5
One Thousand and One Hours: Self-Driving Motion Prediction Dataset
CoRL 2021
nuPlan
nuPlan: A Closed-Loop ML-Based Planning Benchmark for Autonomous Vehicles
CVPRW 2021
PandaSet
PandaSet: Advanced Sensor Suite Dataset for Autonomous Driving
ITSC 2022
OpenCOOD
OPV2V: An Open Benchmark Dataset and Fusion Pipeline for Perception with Vehicle-to-Vehicle Communication
ICRA 2022
KITTI-360
KITTI-360: A Novel Dataset and Benchmarks for Urban Scene Understanding in 2D and 3D
TPAMI 2022
CarlaSC
MotionSC: Data Set and Network for Real-Time Semantic Mapping in Dynamic Environments
RA-L 2022
Robo3D
Robo3D: Towards Robust and Reliable 3D Perception against Corruptions
ICCV 2023
OpenOccupancy
OpenOccupancy: A Large Scale Benchmark for Surrounding Semantic Occupancy Perception
ICCV 2023
Occ3D-nuScenes
Occ3D: A Large-Scale 3D Occupancy Prediction Benchmark for Autonomous Driving
NeurIPS 2023
OpenDV-YouTube
GenAD: Generalized Predictive Model for Autonomous Driving
CVPR 2024
SSCBench
SSCBench: A Large-Scale 3D Semantic Scene Completion Benchmark for Autonomous Driving
IROS 2024
NAVSIM
NAVSIM: Data-Driven Non-Reactive Autonomous Vehicle Simulation and Benchmarking
NeurIPS 2024
DrivingDojo
DrivingDojo Dataset: Advancing Interactive and Knowledge-Enriched Driving World Model
NeurIPS 2024
EUVS
Extrapolated Urban View Synthesis Benchmark
ICCV 2025
Pi3DET
Perspective-Invariant 3D Object Detection
ICCV 2025
2. World Modeling from Video Generation
⏲️ In chronological order, from the earliest to the latest.
Model
Paper
Venue
Website
GitHub
BEVControl
BEVControl: Accurately Controlling Street-View Elements with Multi-Perspective Consistency via BEV Sketch Layout
arXiv 2023
-
-
BEVGen
Street-View Image Generation from a Bird's-Eye View Layout
RA-L 2024
MagicDrive
MagicDrive: Street View Generation with Diverse 3D Geometry Control
ICLR 2024
Panacea
Panacea: Panoramic and Controllable Video Generation for Autonomous Driving
CVPR 2024
DrivingDiffusion
DrivingDiffusion: Layout-Guided Multi-View Driving Scene Video Generation with Latent Diffusion Model
ECCV 2024
WoVoGen
WoVoGen: World Volume-Aware Diffusion for Controllable Multi-Camera Driving Scene Generation
ECCV 2024
-
Delphi
Unleashing Generalization of End-to-End Autonomous Driving with Controllable Long Video Generation
arXiv 2024
SimGen
SimGen: Simulator-Conditioned Driving Scene Generation
NeurIPS 2024
BEVWorld
BEVWorld: A Multimodal World Simulator for Autonomous Driving via Scene-Level BEV Latents
arXiv 2024
-
-
Panacea+
Panacea+: Panoramic and Controllable Video Generation for Autonomous Driving
arXiv 2024
-
DiVE
DiVE: DiT-Based Video Generation with Enhanced Control
arXiv 2024
SyntheOcc
SyntheOcc: Synthesize Geometric-Controlled Street View Images through 3D Semantic MPIs
arXiv 2024
HoloDrive
HoloDrive: Holistic 2D-3D Multi-Modal Street Scene Generation for Autonomous Driving
arXiv 2024
-
-
CogDriving
Seeing Beyond Views: Multi-View Driving Scene Video Generation with Holistic Attention
arXiv 2024
-
UniMLVG
UniMLVG: Unified Framework for Multi-View Long Video Generation with Comprehensive Control Capabilities for Autonomous Driving
arXiv 2024
-
DrivePhysica
Physical Informed Driving World Model
arXiv 2024
-
DriveDreamer-2
DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation
AAAI 2025
SubjectDrive
SubjectDrive: Scaling Generative Data in Autonomous Driving via Subject Control
AAAI 2025
-
Glad
Glad: A Streaming Scene Generator for Autonomous Driving
ICLR 2025
-
DualDiff
DualDiff: Dual-Branch Diffusion Model for Autonomous Driving with Semantic Fusion
ICRA 2025
-
UniScene
UniScene: Unified Occupancy-Centric Driving Scene Generation
CVPR 2025
DriveScape
DriveScape: Towards High-Resolution Controllable Multi-View Driving Video Generation
CVPR 2025
-
PerLDiff
PerLDiff: Controllable Street View Synthesis Using Perspective-Layout Diffusion Models
ICCV 2025
MagicDrive-V2
MagicDrive-V2: High-Resolution Long Video Generation for Autonomous Driving with Adaptive Control
ICCV 2025
-
Cosmos-Transfer1
Cosmos-Transfer1: Conditional World Generation with Adaptive Multimodal Control
arXiv 2025
DualDiff+
DualDiff+: Dual-Branch Diffusion for High-Fidelity Video Generation with Reward Guidance
arXiv 2025
-
CoGen
CoGen: 3D Consistent Video Generation via Adaptive Conditioning for Autonomous Driving
arXiv 2025
-
NoiseController
NoiseController: Towards Consistent Multi-View Video Generation via Noise Decomposition and Collaboration
arXiv 2025
-
-
STAGE
STAGE: A Stream-Centric Generative World Model for Long-Horizon Driving-Scene Simulation
arXiv 2025
-
-
⏲️ In chronological order, from the earliest to the latest.
Model
Paper
Venue
Website
GitHub
GAIA-1
GAIA-1: A Generative World Model for Autonomous Driving
arXiv 2023
-
ADriver-I
ADriver-I: A General World Model for Autonomous Driving
arXiv 2023
-
-
Drive-WM
Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving
CVPR 2024
DriveDreamer
DriveDreamer: Towards Real-World-Driven World Models for Autonomous Driving
ECCV 2024
GenAD
GenAD: Generalized Predictive Model for Autonomous Driving
CVPR 2024
-
Vista
Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability
NeurIPS 2024
InfinityDrive
InfinityDrive: Breaking Time Limits in Driving World Models
arXiv 2024
-
DrivingGPT
DrivingGPT: Unifying Driving World Modeling and Planning with Multi-Modal Autoregressive Transformers
arXiv 2024
-
DrivingWorld
DrivingWorld: Constructing World Model for Autonomous Driving via Video GPT
arXiv 2024
GEM
GEM: A Generalizable Ego-Vision Multimodal World Model for Fine-Grained Ego-Motion, Object Dynamics, and Scene Composition Control
CVPR 2025
MaskGWM
MaskGWM: A Generalizable Driving World Model with Video Mask Reconstruction
CVPR 2025
-
Epona
Epona: Autoregressive Diffusion World Model for Autonomous Driving
ICCV 2025
VaViM & VaVAM
VaViM and VaVAM: Autonomous Driving through Video Generative Modeling
arXiv 2025
MiLA
MiLA: Multi-View Intensive-Fidelity Long-Term Video Generation World Model for Autonomous Driving
arXiv 2025
-
GAIA-2
GAIA-2: A Controllable Multi-View Generative World Model for Autonomous Driving
arXiv 2025
-
DriVerse
DriVerse: Navigation World Model for Driving Simulation via Multimodal Trajectory Prompting and Motion Alignment
arXiv 2025
-
-
PosePilot
PosePilot: Steering Camera Pose for Generative World Models with Self-Supervised Depth
arXiv 2025
-
-
ProphetDWM
ProphetDWM: A Driving World Model for Rolling Out Future Actions and Videos
arXiv 2025
-
-
LongDWM
LongDWM: Cross-Granularity Distillation for Building A Long-Term Driving World Model
arXiv 2025
⏲️ In chronological order, from the earliest to the latest.
Model
Paper
Venue
Website
GitHub
MagicDrive3D
MagicDrive3D: Controllable 3D Generation for Any-View Rendering in Street Scenes
arXiv 2024
DreamForge
DreamForge: Motion-Aware Autoregressive Video Generation for Multi-View Driving Scenes
arXiv 2024
Doe-1
Doe-1: Closed-Loop Autonomous Driving with Large World Model
arXiv 2024
DrivingSphere
DrivingSphere: Building A High-Fidelity 4D World for Closed-Loop Simulation
CVPR 2025
UMGen
Generating Multimodal Driving Scenes via Next-Scene Prediction
CVPR 2025
DriveArena
DriveArena: A Closed-Loop Generative Simulation Platform for Autonomous Driving
ICCV 2025
InfiniCube
InfiniCube: Unbounded and Controllable Dynamic 3D Driving Scene Generation with World-Guided Video Models
ICCV 2025
DiST-4D
DiST-4D: Disentangled Spatiotemporal Diffusion with Metric Depth for 4D Driving Scene Generation
ICCV 2025
UniFuture
Seeing the Future, Perceiving the Future: A Unified Driving World Model for Future Generation and Perception
arXiv 2025
Nexus
Decoupled Diffusion Sparks Adaptive Scene Generation
arXiv 2025
Challenger
Challenger: Affordable Adversarial Driving Video Generation
arXiv 2025
Cosmos-Drive
Cosmos-Drive-Dreams: Scalable Synthetic Driving Data Generation with World Foundation Models
arXiv 2025
⏲️ In chronological order, from the earliest to the latest.
Model
Paper
Venue
Website
GitHub
3DGS
3D Gaussian Splatting for Real-Time Radiance Field Rendering
TOG 2023
StreetGaussian
Street Gaussians: Modeling Dynamic Urban Scenes with Gaussian Splatting
ECCV 2024
4DGF
Dynamic 3D Gaussian Fields for Urban Areas
NeurIPS 2024
SCube
SCube: Instant Large-Scale Scene Reconstruction using VoxSplats
NeurIPS 2024
HUGS
HUGS: Holistic Urban 3D Scene Understanding via Gaussian Splatting
CVPR 2024
MagicDrive3D
MagicDrive3D: Controllable 3D Generation for Any-View Rendering in Street Scenes
arXiv 2024
S3Gaussian
S3Gaussian: Self-Supervised Street Gaussians for Autonomous Driving
arXiv 2024
VDG
VDG: Vision-Only Dynamic Gaussian for Driving Simulation
arXiv 2024
UniGaussian
UniGaussian: Driving Scene Reconstruction from Multiple Camera Models via Unified Gaussian Representations
arXiv 2024
-
-
Stag-1
Stag-1: Towards Realistic 4D Driving Simulation with Video Generation Model
arXiv 2024
DrivingRecon
DrivingRecon: Large 4D Gaussian Reconstruction Model For Autonomous Driving
arXiv 2024
-
OccScene
OccScene: Semantic Occupancy-Based Cross-Task Mutual Learning for 3D Scene Generation
arXiv 2024
-
-
SGD
SGD: Street View Synthesis with Gaussian Splatting and Diffusion Prior
WACV 2025
-
-
OmniRe
OmniRe: Omni Urban Scene Reconstruction
ICLR 2025
DriveDreamer4D
DriveDreamer4D: World Models Are Effective Data Machines for 4D Driving Scene Representation
CVPR 2025
DeSiRe-GS
DeSiRe-GS: 4D Street Gaussians for Static-Dynamic Decomposition and Surface Reconstruction for Urban Driving Scenes
CVPR 2025
-
SplatAD
SplatAD: Real-Time Lidar and Camera Rendering with 3D Gaussian Splatting for Autonomous Driving
CVPR 2025
ReconDreamer
ReconDreamer: Crafting World Models for Driving Scene Reconstruction via Online Restoration
CVPR 2025
FreeSim
FreeSim: Toward Free-Viewpoint Camera Simulation in Driving Scenes
CVPR 2025
-
StreetCrafter
StreetCrafter: Street View Synthesis with Controllable Video Diffusion Models
CVPR 2025
FlexDrive
FlexDrive: Toward Trajectory Flexibility in Driving Scene Reconstruction and Rendering
CVPR 2025
-
-
S-NeRF++
S-NeRF++: Autonomous Driving Simulation via Neural Reconstruction and Generation
TPAMI 2025
-
-
InfiniCube
InfiniCube: Unbounded and Controllable Dynamic 3D Driving Scene Generation with World-Guided Video Models
ICCV 2025
DiST-4D
DiST-4D: Disentangled Spatiotemporal Diffusion with Metric Depth for 4D Driving Scene Generation
ICCV 2025
DreamDrive
DreamDrive: Generative 4D Scene Modeling from Street View Images
arXiv 2025
-
Uni-Gaussians
Uni-Gaussians: Unifying Camera and Lidar Simulation with Gaussians for Dynamic Driving Scenarios
arXiv 2025
-
MuDG
MuDG: Taming Multi-Modal Diffusion with Gaussian Splatting for Urban Scene Reconstruction
arXiv 2025
UniFuture
Seeing the Future, Perceiving the Future: A Unified Driving World Model for Future Generation and Perception
arXiv 2025
SceneCrafter
Unraveling the Effects of Synthetic Data on End-to-End Autonomous Driving
arXiv 2025
-
ReconDreamer++
ReconDreamer++: Harmonizing Generative and Reconstructive Models for Driving Scene Representation
arXiv 2025
RealEngine
RealEngine: Simulating Autonomous Driving in Realistic Context
arXiv 2025
-
GeoDrive
GeoDrive: 3D Geometry-Informed Driving World Model with Precise Action Control
arXiv 2025
-
PseudoSimulation
Pseudo-Simulation for Autonomous Driving
arXiv 2025
-
Dreamland
Dreamland: Controllable World Creation with Simulator and Generative Models
arXiv 2025
-
3. World Modeling from Occupancy Generation
⏲️ In chronological order, from the earliest to the latest.
Model
Paper
Venue
Website
GitHub
SSD
Diffusion Probabilistic Models for Scene-Scale 3D Categorical Data
arXiv 2023
-
SemCity
SemCity: Semantic Scene Generation with Triplane Diffusion
CVPR 2024
WoVoGen
WoVoGen: World Volume-Aware Diffusion for Controllable Multi-Camera Driving Scene Generation
ECCV 2024
-
UrbanDiff
Urban Scene Diffusion through Semantic Occupancy Map
arXiv 2024
-
DrivingSphere
DrivingSphere: Building A High-Fidelity 4D World for Closed-Loop Simulation
CVPR 2025
UniScene
UniScene: Unified Occupancy-Centric Driving Scene Generation
CVPR 2025
OccScene
OccScene: Semantic Occupancy-Based Cross-Task Mutual Learning for 3D Scene Generation
arXiv 2024
-
-
InfiniCube
InfiniCube: Unbounded and Controllable Dynamic 3D Driving Scene Generation with World-Guided Video Models
ICCV 2025
Control-3D-Scene
Controllable 3D Outdoor Scene Generation via Scene Graphs
ICCV 2025
X-Scene
X-Scene: Large-Scale Driving Scene Generation with High Fidelity and Flexible Controllability
arXiv 2025
2️⃣ Occupancy Forecasters
⏲️ In chronological order, from the earliest to the latest.
Model
Paper
Venue
Website
GitHub
Emergent-Occ
Differentiable Raycasting for Self-supervised Occupancy Forecasting
ECCV 2022
-
FF4D
Point Cloud Forecasting as a Proxy for 4D Occupancy Forecasting
CVPR 2023
UniWorld
UniWorld: Autonomous Driving Pre-Training via World Models
arXiv 2023
-
-
UniScene
UniScene: Multi-Camera Unified Pre-Training via 3D Scene Reconstruction for Autonomous Driving
arXiv 2023
-
OccWorld
OccWorld: Learning A 3D Occupancy World Model for Autonomous Driving
ECCV 2024
Cam4DOcc
Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications
CVPR 2024
-
DriveWorld
DriveWorld: 4D Pre-Trained Scene Understanding via World Models for Autonomous Driving
CVPR 2024
-
-
OccSora
OccSora: 4D Occupancy Generation Models as World Simulators for Autonomous Driving
arXiv 2024
UnO
UnO: Unsupervised Occupancy Fields for Perception and Forecasting
CVPR 2024
-
LOPR
Self-Supervised Multi-Future Occupancy Forecasting for Autonomous Driving
arXiv 2024
-
-
FSF-Net
FSF-Net: Enhance 4D Occupancy Forecasting with Coarse BEV Scene Flow for Autonomous Driving
arXiv 2024
-
-
OccLLaMA
OccLLaMA: An Occupancy-Language-Action Generative World Model for Autonomous Driving
arXiv 2024
-
-
DOME
DOME: Taming Diffusion Model into High-Fidelity Controllable Occupancy World Model
arXiv 2024
GaussianAD
GaussianAD: Gaussian-Centric End-to-End Autonomous Driving
arXiv 2024
DFIT-OccWorld
An Efficient Occupancy World Model via Decoupled Dynamic Flow and Image-Assisted Training
arXiv 2024
-
-
Drive-OccWorld
Driving in the Occupancy World: Vision-Centric 4D Occupancy Forecasting and Planning via World Models for Autonomous Driving
AAAI 2025
PreWorld
Semi-Supervised Vision-Centric 3D Occupancy World Model for Autonomous Driving
ICLR 2025
-
OccProphet
OccProphet: Pushing Efficiency Frontier of Camera-Only 4D Occupancy Forecasting with Observer-Forecaster-Refiner Framework
ICLR 2025
-
RenderWorld
RenderWorld: World Model with Self-Supervised 3D Label
ICRA 2025
-
-
Occ-LLM
Occ-LLM: Enhancing Autonomous Driving with Occupancy-Based Large Language Models
ICRA 2025
-
-
EfficientOCF
Spatiotemporal Decoupling for Efficient Vision-Based Occupancy Forecasting
CVPR 2025
-
-
DIO
DIO: Decomposable Implicit 4D Occupancy-Flow World Model
CVPR 2025
-
-
T³Former
Temporal Triplane Transformers as Occupancy World Models
arXiv 2025
-
-
UniOcc
UniOcc: A Unified Benchmark for Occupancy Forecasting and Prediction in Autonomous Driving
ICCV 2025
I²-World
I²-World: Intra-Inter Tokenization for Efficient Dynamic 4D Scene Forecasting
ICCV 2025
-
COME
COME: Adding Scene-Centric Forecasting Control to Occupancy World Model
arXiv 2025
-
3️⃣ Autoregressive Simulators
⏲️ In chronological order, from the earliest to the latest.
Model
Paper
Venue
Website
GitHub
SemCity
SemCity: Semantic Scene Generation with Triplane Diffusion
CVPR 2024
XCube
XCube: Large-Scale 3D Generative Modeling using Sparse Voxel Hierarchies
CVPR 2024
PDD
Pyramid Diffusion for Fine 3D Large Scene Generation
ECCV 2024
OccSora
OccSora: 4D Occupancy Generation Models as World Simulators for Autonomous Driving
arXiv 2024
DynamicCity
DynamicCity: Large-Scale 4D Occupancy Generation from Dynamic Scenes
ICLR 2025
DrivingSphere
DrivingSphere: Building A High-Fidelity 4D World for Closed-Loop Simulation
CVPR 2025
InfiniCube
InfiniCube: Unbounded and Controllable Dynamic 3D Driving Scene Generation with World-Guided Video Models
ICCV 2025
X-Scene
X-Scene: Large-Scale Driving Scene Generation with High Fidelity and Flexible Controllability
arXiv 2025
PrITTI
PrITTI: Primitive-Based Generation of Controllable and Editable 3D Semantic Scenes
arXiv 2025
4. World Modeling from LiDAR Generation
⏲️ In chronological order, from the earliest to the latest.
Model
Paper
Venue
Website
GitHub
DUSty
Learning to Drop Points for LiDAR Scan Synthesis
IROS 2021
LiDARGen
Learning to Generate Realistic LiDAR Point Clouds
ECCV 2022
-
DUSty v2
Generative Range Imaging for Learning Scene Priors of 3D LiDAR Data
WACV 2023
UltraLiDAR
UltraLiDAR: Learning Compact Representations for LiDAR Completion and Generation
CVPR 2023
-
Copilot4D
Copilot4D: Learning Unsupervised World Models for Autonomous Driving via Discrete Diffusion
ICLR 2024
-
R2DM
LiDAR Data Synthesis with Denoising Diffusion Probabilistic Models
ICRA 2024
ViDAR
Visual Point Cloud Forecasting enables Scalable Autonomous Driving
CVPR 2024
-
LiDiff
Scaling Diffusion Models to Real-World 3D LiDAR Scene Completion
CVPR 2024
-
LiDM
Towards Realistic Scene Generation with LiDAR Diffusion Models
CVPR 2024
-
RangeLDM
RangeLDM: Fast Realistic LiDAR Point Cloud Generation
ECCV 2024
-
Text2LiDAR
Text2LiDAR: Text-Guided LiDAR Point Cloud Generation via Equirectangular Transformer
ECCV 2024
-
LiDARGRIT
Taming Transformers for Realistic LiDAR Point Cloud Generation
arXiv 2024
-
BEVWorld
BEVWorld: A Multimodal World Simulator for Autonomous Driving via Scene-Level BEV Latents
arXiv 2024
-
SDS
Simultaneous Diffusion Sampling for Conditional LiDAR Generation
arXiv 2024
-
-
DiffSSC
DiffSSC: Semantic LiDAR Scan Completion using Denoising Diffusion Probabilistic Models
IROS 2025
-
-
HoloDrive
HoloDrive: Holistic 2D-3D Multi-Modal Street Scene Generation for Autonomous Driving
arXiv 2024
-
-
LOGen
LOGen: Toward LiDAR Object Generation by Point Diffusion
arXiv 2024
OLiDM
OLiDM: Object-Aware LiDAR Diffusion Models for Autonomous Driving
AAAI 2025
X-Drive
X-Drive: Cross-Modality Consistent Multi-Sensor Data Synthesis for Driving Scenarios
ICLR 2025
-
LidarDM
LidarDM: Generative LiDAR Simulation in a Generated World
ICRA 2025
LiDAR-EDIT
LiDAR-EDIT: LiDAR Data Generation by Editing the Object Layouts in Real-World Scenes
ICRA 2025
R2Flow
Fast LiDAR Data Generation with Rectified Flows
ICRA 2025
WeatherGen
WeatherGen: A Unified Diverse Weather Generator for LiDAR Point Clouds via Spider Mamba Diffusion
CVPR 2025
-
LiDPM
LiDPM: Rethinking Point Diffusion for Lidar Scene Completion
IV 2025
HERMES
HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation
ICCV 2025
SuperPC
SuperPC: A Single Diffusion Model for Point Cloud Completion, Upsampling, Denoising, and Colorization
CVPR 2025
-
3DiSS
Towards Generating Realistic 3D Semantic Training Data for Autonomous Driving
arXiv 2025
-
Distill-DPO
Diffusion Distillation With Direct Preference Optimization For Efficient 3D LiDAR Scene Completion
arXiv 2025
-
DriveX
DriveX: Omni Scene Modeling for Learning Generalizable World Knowledge in Autonomous Driving
arXiv 2025
-
-
OpenDWM
OpenDWM: Open Driving World Models
arXiv 2025
-
SPIRAL
SPIRAL: Semantic-Aware Progressive LiDAR Scene Generation
arXiv 2025
-
-
La La LiDAR
La La LiDAR: Large-Scale Layout Generation from LiDAR Data
arXiv 2025
-
-
Veila
Veila: Panoramic LiDAR Generation from a Monocular RGB Image
arXiv 2025
-
-
LiDARCrafter
LiDARCrafter: Dynamic 4D World Modeling from LiDAR Sequences
arXiv 2025
⏲️ In chronological order, from the earliest to the latest.
Model
Paper
Venue
Website
GitHub
Copilot4D
Copilot4D: Learning Unsupervised World Models for Autonomous Driving via Discrete Diffusion
ICLR 2024
-
ViDAR
Visual Point Cloud Forecasting enables Scalable Autonomous Driving
CVPR 2024
-
BEVWorld
BEVWorld: A Multimodal World Simulator for Autonomous Driving via Scene-Level BEV Latents
arXiv 2024
-
HERMES
HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation
ICCV 2025
DriveX
DriveX: Omni Scene Modeling for Learning Generalizable World Knowledge in Autonomous Driving
arXiv 2025
-
-
3️⃣ Autoregressive Simulators
⏲️ In chronological order, from the earliest to the latest.
Model
Paper
Venue
Website
GitHub
HoloDrive
HoloDrive: Holistic 2D-3D Multi-Modal Street Scene Generation for Autonomous Driving
arXiv 2024
-
-
LidarDM
LidarDM: Generative LiDAR Simulation in a Generated World
ICRA 2025
OpenDWM
OpenDWM: Open Driving World Models
arXiv 2025
-
LiDARCrafter
LiDARCrafter: Dynamic 4D World Modeling from LiDAR Sequences
arXiv 2025
Model
Paper
Venue
Website
GitHub
OccSora
OccSora: 4D Occupancy Generation Models as World Simulators for Autonomous Driving
arXiv 2024
-
DFIT-OccWorld
An Efficient Occupancy World Model via Decoupled Dynamic Flow and Image-Assisted Training
arXiv 2024
-
-
LiDARCrafter
LiDARCrafter: Dynamic 4D World Modeling from LiDAR Sequences
arXiv 2025
UniSim
UniSim: A Neural Closed-Loop Sensor Simulator
CVPR 2023
-
Panacea
Panacea: Panoramic and Controllable Video Generation for Autonomous Driving
CVPR 2024
Delphi
Unleashing Generalization of End-to-End Autonomous Driving with Controllable Long Video Generation
arXiv 2024
DriveDreamer-2
DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation
AAAI 2025
Panacea+
Panacea+: Panoramic and Controllable Video Generation for Autonomous Driving
arXiv 2024
-
MiLA
MiLA: Multi-View Intensive-Fidelity Long-Term Video Generation World Model for Autonomous Driving
arXiv 2025
-
GAIA-2
GAIA-2: A Controllable Multi-View Generative World Model for Autonomous Driving
arXiv 2025
-
Model
Paper
Venue
Website
GitHub
RoboDreamer
RoboDreamer: Learning Compositional World Models for Robot Imagination
arXiv 2024
BEHAVIOR
BEHAVIOR Robot Suite: Streamlining Real-World Whole-Body Manipulation for Everyday Household Activities
CoRL 2025
Habitat 2.0
Habitat 2.0: Training Home Assistants to Rearrange their Habitat
arXiv 2021
-
-
FMR
Foundation Models in Robotics: Applications, Challenges, and the Future
IJRR 2024
-
VLMaps
Visual Language Maps for Robot Navigation
ICRA 2023
Model
Paper
Venue
Website
GitHub
ILVE
Interactive Latent Variable Evolution for the Generation of Minecraft Structures
ICFDG 2021
-
-
ProcTHOR
ProcTHOR: Large-Scale Embodied AI Using Procedural Generation
NeurIPS 2022
WorldGPT
WorldGPT: Empowering LLM as Multimodal World Model
ACM MM 2024
-
Text2World
Text2World: Benchmarking Large Language Models for Symbolic World Model Generation
arXiv 2025
Hunyuan3D 1.0
Hunyuan3D 1.0: A Unified Framework for Text-to-3D and Image-to-3D Generation
arXiv 2024
Hunyuan3D 2.0
Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation
arXiv 2025
Hunyuan3D 2.1
Hunyuan3D 2.1: From Images to High-Fidelity 3D Assets with Production-Ready PBR Material
arXiv 2025
Hunyuan3D 2.5
Hunyuan3D 2.5: Towards High-Fidelity 3D Assets Generation with Ultimate Details
arXiv 2025
Hunyuan-GameCraft
Hunyuan-GameCraft: High-dynamic Interactive Game Video Generation with Hybrid History Condition
arXiv 2025
HunyuanWorld 1.0
HunyuanWorld 1.0: Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels
arXiv 2025
MGVQ
MGVQ: Could VQ-VAE Beat VAE? A Generalizable Tokenizer with Multi-Group Quantization
arXiv 2025
Model
Paper
Venue
Website
GitHub
DynamicCity
DynamicCity: Large-Scale 4D Occupancy Generation from Dynamic Scenes
ICLR 2025
UrbanScene3D
Capturing, Reconstructing, and Simulating: the UrbanScene3D Dataset
ECCV 2022
GaussianCity
GaussianCity: Generative Gaussian Splatting for Unbounded 3D City Generation
CVPR 2025
UrbanWorld
UrbanWorld: An Urban World Model for 3D City Generation
arXiv 2024
SceneDiffuser++
SceneDiffuser++: City-Scale Traffic Simulation via a Generative World Model
CVPR 2025
-
-