DriveDreamer4D: World Models Are Effective Data Machines for 4D Driving Scene Representation

Project Page | Paper

News

[2024/10/17] Repository Initialization.

Abstract

Closed-loop simulation is essential for advancing end-to-end autonomous driving systems. Contemporary sensor simulation methods, such as NeRF and 3DGS, rely predominantly on conditions closely aligned with the training data distribution, which is largely confined to forward-driving scenarios. Consequently, these methods struggle to render complex maneuvers (e.g., lane changes, acceleration, and deceleration). Recent advances in autonomous-driving world models have demonstrated the potential to generate diverse driving videos. However, these approaches remain constrained to 2D video generation and inherently lack the spatiotemporal coherence required to capture the intricacies of dynamic driving environments. In this paper, we introduce DriveDreamer4D, which enhances 4D driving scene representation by leveraging world model priors. Specifically, we use the world model as a data machine to synthesize novel-trajectory videos from real-world driving data. Notably, we explicitly leverage structured conditions to control the spatiotemporal consistency of foreground and background elements, so that the generated data adheres closely to traffic constraints. To our knowledge, DriveDreamer4D is the first to use video generation models to improve 4D reconstruction in driving scenarios. Experimental results show that DriveDreamer4D significantly improves generation quality under novel trajectory views, achieving relative FID improvements of 24.5%, 39.0%, and 10.5% over PVG, S3Gaussian, and Deformable-GS, respectively. Moreover, DriveDreamer4D markedly enhances the spatiotemporal coherence of driving agents, as verified by a comprehensive user study and by relative increases of 19.7%, 12.7%, and 11.3% in the NTA-IoU metric.
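The abstract reports relative improvements in FID, where lower is better, and in NTA-IoU, where higher is better. As a minimal sketch, assuming "relative improvement" means the change divided by the baseline score, the helper below handles both directions. NTA-IoU is not defined in this README; `box_iou` is only an assumed building block (IoU between a detected 2D box and a reference box in the novel view), and how boxes are obtained and matched is deliberately left out. All names here are hypothetical and not part of the repository.

```python
from typing import Tuple

# Hypothetical box type: (x1, y1, x2, y2) in pixel coordinates.
Box = Tuple[float, float, float, float]


def relative_improvement(baseline: float, ours: float, lower_is_better: bool) -> float:
    """Relative improvement of `ours` over `baseline`, in percent.

    Assumption: improvement is measured relative to the baseline score.
    Use lower_is_better=True for FID and False for IoU-style metrics.
    """
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    delta = baseline - ours if lower_is_better else ours - baseline
    return 100.0 * delta / abs(baseline)


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned 2D boxes.

    Assumed building block for an IoU-based agent metric; box matching
    and averaging over frames are not shown.
    """
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

For example, with hypothetical FID values, a baseline of 80.0 lowered to 60.0 gives `relative_improvement(80.0, 60.0, lower_is_better=True) == 25.0`.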

DriveDreamer4D Framework

Scenario Selection

We select eight scenarios from the Waymo validation set: 005, 018, 027, 065, 081, 096, 121, and 164.

Rendering Results on Lane-Change Novel Trajectories

027_pvg_change.mp4

018_combine.mp4

164_df.mp4

Comparison of novel-trajectory renderings in lane-change scenarios. The left column shows PVG, S³Gaussian, and Deformable-GS, while the right column shows DriveDreamer4D-PVG, DriveDreamer4D-S³Gaussian, and DriveDreamer4D-Deformable-GS.
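The lane-change renderings rely on an ego trajectory that departs from the recorded one. This README does not describe how such trajectories are constructed, so the following is a hypothetical illustration only, assuming 2D ego poses (x, y, yaw) in a ground-plane frame and a lateral offset that ramps in along a smoothstep curve. The function names and the smoothstep profile are assumptions, not the repository's API.

```python
import math
from typing import List, Tuple

# Hypothetical pose type: (x, y, yaw) with yaw in radians, counter-clockwise from +x.
Pose2D = Tuple[float, float, float]


def smoothstep(t: float) -> float:
    """Cubic ease-in/ease-out, clamped to [0, 1]."""
    t = min(max(t, 0.0), 1.0)
    return t * t * (3.0 - 2.0 * t)


def lane_change_trajectory(poses: List[Pose2D], offset: float,
                           start: int, end: int) -> List[Pose2D]:
    """Shift each pose sideways by up to `offset` meters (left of heading if > 0).

    The offset is 0 before frame `start`, ramps in smoothly until frame `end`,
    and stays at `offset` afterwards. Yaw is kept unchanged (a simplification).
    """
    span = max(end - start, 1)
    out: List[Pose2D] = []
    for i, (x, y, yaw) in enumerate(poses):
        d = offset * smoothstep((i - start) / span)
        # Unit normal pointing to the left of the heading direction.
        nx, ny = -math.sin(yaw), math.cos(yaw)
        out.append((x + d * nx, y + d * ny, yaw))
    return out
```

Keeping yaw fixed is a simplification; a more faithful version would rotate the heading to follow the lateral transition.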

Rendering Results on Speed-Change Novel Trajectories

065_pvg.mp4

121_s3.mp4

096_df.mp4

Comparison of novel-trajectory renderings in speed-change scenarios. The left column shows PVG, S³Gaussian, and Deformable-GS, while the right column shows DriveDreamer4D-PVG, DriveDreamer4D-S³Gaussian, and DriveDreamer4D-Deformable-GS.
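A speed-change trajectory can similarly be derived by retiming the recorded path. As a hypothetical sketch, assuming timestamped 2D positions and a constant speed factor (below 1 for deceleration, above 1 for acceleration), each output frame samples the original path at a scaled time using linear interpolation. The function name and the clamping behavior are assumptions, not the method described in the paper.

```python
from bisect import bisect_right
from typing import List, Tuple

# Hypothetical position type: (x, y) in meters.
Point = Tuple[float, float]


def retime_trajectory(times: List[float], points: List[Point],
                      speed_factor: float) -> List[Point]:
    """Resample a recorded path as if driven `speed_factor` times as fast.

    For each original timestamp t, the path is queried at
    t0 + speed_factor * (t - t0) by linear interpolation, clamped
    to the last recorded position. `times` must be strictly increasing.
    """
    if len(times) != len(points) or len(times) < 2:
        raise ValueError("need at least two timestamped points")
    t_start, t_end = times[0], times[-1]
    out: List[Point] = []
    for t in times:
        tq = min(t_start + speed_factor * (t - t_start), t_end)
        # Index of the segment [times[j], times[j + 1]] containing tq.
        j = min(max(bisect_right(times, tq) - 1, 0), len(times) - 2)
        t0, t1 = times[j], times[j + 1]
        w = (tq - t0) / (t1 - t0)
        (x0, y0), (x1, y1) = points[j], points[j + 1]
        out.append((x0 + w * (x1 - x0), y0 + w * (y1 - y0)))
    return out
```

With `speed_factor > 1` the query time runs past the end of the recording, so this sketch clamps to the last position and the ego vehicle effectively stops there.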

Bibtex

If this work is helpful for your research, please consider citing the following BibTeX entry.

@article{zhao2024drive,
    title={DriveDreamer4D: World Models Are Effective Data Machines for 4D Driving Scene Representation},
    author={Guosheng Zhao and Chaojun Ni and Xiaofeng Wang and Zheng Zhu and Xueyang Zhang and Yida Wang and Guan Huang and Xinze Chen and Boyuan Wang and Youyi Zhang and Wenjun Mei and Xingang Wang},
    journal={arXiv preprint arXiv:2410.13571},
    year={2024},
}

GigaAI-research/DriveDreamer4D