Chuning Zhu1, Raymond Yu1, Siyuan Feng2, Benjamin Burchfiel2, Paarth Shah2, Abhishek Gupta1
1University of Washington 2Toyota Research Institute
This repository provides a PyTorch implementation of Unified World Model (UWM). UWM combines action diffusion and video diffusion to enable scalable pretraining on large, heterogeneous robotics datasets.
- `configs`: Configuration files for pretraining and finetuning experiments.
- `datasets`: Dataset wrappers for DROID, Robomimic, and LIBERO. We standardize all datasets using compressed Zarr buffers.
- `environments`: Interface wrappers for Robomimic and LIBERO environments.
- `experiments`: Training and evaluation scripts.
- `models`: Model definitions for UWM and baselines.
- `scripts`: Bash scripts for running DROID experiments.
Install the package via
pip install -e .
Note: if you encounter issues using tensorflow-datasets with DROID, consider installing tensorflow-datasets from source.
To run a Robomimic single-task experiment,
- Install the Robomimic dataset.
- Update `hdf5_path` and `buffer_path` in the config (e.g., `configs/dataset/robomimic_can_ph.yaml`).
- Run:
python experiments/uwm/train_robomimic.py --config-name train_uwm_robomimic.yaml dataset=robomimic_can_ph exp_id=singletask
This command will generate a compressed Zarr buffer at the `buffer_path` specified in the config file.
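If you want to sanity-check the generated buffer, you can open it with the `zarr` library. This is a minimal sketch, assuming the buffer is a standard Zarr group; the path is a placeholder, and the exact array keys depend on the dataset wrapper in `datasets`:

```python
import zarr

# Open the compressed buffer created by the training run (read-only).
# Replace the path with the buffer_path from your dataset config.
buffer = zarr.open("path/to/robomimic_can_ph.zarr", mode="r")

# List the top-level arrays with their shapes and dtypes.
for name, array in buffer.arrays():
    print(name, array.shape, array.dtype)
```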
The LIBERO experiments share most infrastructure with the Robomimic experiments.
To pretrain a UWM on LIBERO-90,
- Install the LIBERO dataset.
- Update `hdf5_path` and `buffer_path` in `configs/dataset/libero_90.yaml`.
- Run:
python experiments/uwm/train_robomimic.py --config-name train_uwm_robomimic.yaml dataset=libero_90 exp_id=pretrain
To finetune a pretrained UWM on a downstream LIBERO task (e.g., Book-Caddy),
- Update `hdf5_path` and `buffer_path` in `configs/dataset/libero_book_caddy.yaml`.
- Run:
python experiments/uwm/train_robomimic.py --config-name finetune_uwm_robomimic.yaml dataset=libero_book_caddy exp_id=finetune pretrain_checkpoint_path="logdir/uwm/libero_90/pretrain/0/models.pt"
We release the pretrained LIBERO-90 checkpoint here. You can download and finetune directly from this checkpoint.
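Before finetuning, you can sanity-check the downloaded checkpoint with a plain `torch.load`. This is a minimal sketch, assuming `models.pt` is a standard torch-serialized file (the `pretrain_checkpoint_path` argument above suggests it is); the printed keys depend on how the model was saved:

```python
import torch

# Load the pretrained checkpoint on CPU to inspect it before finetuning.
# weights_only=False may be needed on newer PyTorch versions, where it
# defaults to True; only do this for checkpoints you trust.
checkpoint = torch.load(
    "logdir/uwm/libero_90/pretrain/0/models.pt",
    map_location="cpu",
    weights_only=False,
)

# Assumption: the file holds a dict-like mapping of names to weights/modules.
if isinstance(checkpoint, dict):
    print(list(checkpoint.keys()))
```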
We provide shell scripts for DROID pretraining / cotraining / finetuning experiments in the scripts directory. Each script runs a dataset conversion pipeline to create a Zarr buffer for the corresponding DROID TFDS dataset and then launches training.
To launch a DROID pretraining experiment,
- Install the DROID dataset.
- Update `DATA_DIR` and `BUFFER_PATH` in `scripts/launch_droid_pretrain.sh`.
- Run:
source scripts/launch_droid_pretrain.sh
To launch a video cotraining experiment,
- Install the DROID dataset.
- Update `DATA_DIR`, `ROBOT_BUFFER_PATH`, and `VIDEO_BUFFER_PATH` in `scripts/launch_droid_cotrain.sh`.
- Run:
source scripts/launch_droid_cotrain.sh
To finetune a pretrained model on a downstream task,
- Collect demonstrations using the DROID interface.
- Convert them into a TFDS dataset (via this pipeline; see the verification sketch below).
- Modify and run:
source scripts/launch_droid_finetune.sh
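Before modifying the script, you can verify that step 2 produced a dataset TFDS can read. A minimal sketch; the dataset name `my_droid_task` and the data directory are placeholders for your converted dataset:

```python
import tensorflow_datasets as tfds

# Placeholders: substitute the name and data_dir of your converted dataset.
builder = tfds.builder("my_droid_task", data_dir="/path/to/tfds")
dataset = builder.as_dataset(split="train")

# Inspect the top-level structure of the first episode.
for episode in dataset.take(1):
    print(episode.keys())
```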
We release the pretrained and cotrained DROID UWM checkpoints here. You can download and finetune directly from these checkpoints.
If you find this code useful, please cite:
@inproceedings{zhu2025uwm,
author = {Zhu, Chuning and Yu, Raymond and Feng, Siyuan and Burchfiel, Benjamin and Shah, Paarth and Gupta, Abhishek},
title = {Unified World Models: Coupling Video and Action Diffusion for Pretraining on Large Robotic Datasets},
booktitle = {Proceedings of Robotics: Science and Systems (RSS)},
year = {2025},
}