
Unified World Models: Coupling Video and Action Diffusion for Pretraining on Large Robotic Datasets

Chuning Zhu1, Raymond Yu1, Siyuan Feng2, Benjamin Burchfiel2, Paarth Shah2, Abhishek Gupta1

1University of Washington 2Toyota Research Institute

This repository provides a PyTorch implementation of Unified World Models (UWM). UWM couples action diffusion and video diffusion to enable scalable pretraining on large, heterogeneous robotics datasets.

Code structure

  • configs: Configuration files for pretraining and finetuning experiments.
  • datasets: Dataset wrappers for DROID, Robomimic, and LIBERO. We standardize all datasets using compressed Zarr buffers.
  • environments: Interface wrappers for Robomimic and LIBERO environments.
  • experiments: Training and evaluation scripts.
  • models: Model definitions for UWM and baselines.
  • scripts: Bash scripts for running DROID experiments.

Setup

Install the package via

pip install -e .

Note: if you encounter issues using tensorflow-datasets with DROID, consider installing tensorflow-datasets from source.
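As a quick sanity check that the install can read DROID, you can open the RLDS directory directly with tensorflow-datasets. This is a minimal sketch; the dataset path is a placeholder for your local DROID download:

import tensorflow_datasets as tfds

# Placeholder path: point this at the versioned DROID directory in your download.
builder = tfds.builder_from_directory("/path/to/droid/1.0.0")
ds = builder.as_dataset(split="train")
episode = next(iter(ds.take(1)))
print(episode.keys())  # RLDS episodes typically expose a "steps" field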

Robomimic Experiments

To run a Robomimic single-task experiment,

  1. Install the Robomimic dataset.
  2. Update hdf5_path and buffer_path in the config (e.g., configs/dataset/robomimic_can_ph.yaml).
  3. Run:
python experiments/uwm/train_robomimic.py --config-name train_uwm_robomimic.yaml dataset=robomimic_can_ph exp_id=singletask

This command will generate a compressed Zarr buffer at the buffer_path specified in the config file.
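To sanity-check the result, you can open the buffer with the zarr package. A minimal sketch; the group and array names depend on the conversion, so print the tree rather than assuming specific keys:

import zarr

# Open read-only; replace the path with the buffer_path from your config.
buffer = zarr.open("/path/to/buffer.zarr", mode="r")
print(buffer.tree())  # lists groups, array shapes, and dtypes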

LIBERO Experiments

The LIBERO experiments share most infrastructure with the Robomimic experiments.

Pretraining

To pretrain a UWM on LIBERO-90,

  1. Install the LIBERO dataset.
  2. Update hdf5_path and buffer_path in configs/dataset/libero_90.yaml.
  3. Run:
python experiments/uwm/train_robomimic.py --config-name train_uwm_robomimic.yaml dataset=libero_90 exp_id=pretrain

Finetuning

To finetune a pretrained UWM on a downstream LIBERO task (e.g., Book-Caddy),

  1. Update hdf5_path and buffer_path in configs/dataset/libero_book_caddy.yaml.
  2. Run:
python experiments/uwm/train_robomimic.py --config-name finetune_uwm_robomimic.yaml dataset=libero_book_caddy exp_id=finetune pretrain_checkpoint_path="logdir/uwm/libero_90/pretrain/0/models.pt"
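It can help to confirm the checkpoint loads cleanly before launching a long run. A minimal sketch, assuming models.pt is a dict of saved components (the exact layout depends on the training script):

import torch

# Load on CPU so this works on a machine without GPUs.
ckpt = torch.load("logdir/uwm/libero_90/pretrain/0/models.pt", map_location="cpu")
if isinstance(ckpt, dict):
    print(list(ckpt.keys()))  # inspect what was saved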

We release the pretrained LIBERO-90 checkpoint here. You can download and directly finetune from this checkpoint.

DROID Experiments

We provide shell scripts for the DROID pretraining, cotraining, and finetuning experiments in the scripts directory. Each script first runs a dataset conversion pipeline that creates a Zarr buffer from the corresponding DROID TFDS dataset, then launches training.
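For intuition, the conversion step boils down to iterating RLDS episodes and writing the flattened streams into a Zarr buffer. The sketch below is illustrative only, not the repository's converter: the paths, the "action" feature key, and the buffer layout are all assumptions:

import numpy as np
import tensorflow_datasets as tfds
import zarr

# Placeholder paths; the scripts read these from DATA_DIR and BUFFER_PATH.
builder = tfds.builder_from_directory("/path/to/droid/1.0.0")
ds = builder.as_dataset(split="train")

actions = []
for episode in ds.take(10):  # small subset, for illustration only
    for step in episode["steps"]:
        actions.append(step["action"].numpy())  # assumed feature key

# A real converter would also store observations alongside actions.
root = zarr.open("/path/to/buffer.zarr", mode="w")
root.create_dataset("action", data=np.stack(actions))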

Pretraining

To launch a DROID pretraining experiment,

  1. Install the DROID dataset.
  2. Update DATA_DIR and BUFFER_PATH in scripts/launch_droid_pretrain.sh.
  3. Run:
source scripts/launch_droid_pretrain.sh

Cotraining

To launch a video cotraining experiment,

  1. Install the DROID dataset.
  2. Update DATA_DIR, ROBOT_BUFFER_PATH, and VIDEO_BUFFER_PATH in scripts/launch_droid_cotrain.sh.
  3. Run:
source scripts/launch_droid_cotrain.sh

Finetuning

To finetune a pretrained model on a downstream task,

  1. Collect demonstrations using the DROID interface.
  2. Convert them into a TFDS dataset (via this pipeline).
  3. Modify and run:
source scripts/launch_droid_finetune.sh

We release the pretrained and cotrained DROID UWM checkpoints here. You can download and directly finetune from these checkpoints.

BibTeX

If you find this code useful, please cite:

@inproceedings{zhu2025uwm,
    author    = {Zhu, Chuning and Yu, Raymond and Feng, Siyuan and Burchfiel, Benjamin and Shah, Paarth and Gupta, Abhishek},
    title     = {Unified World Models: Coupling Video and Action Diffusion for Pretraining on Large Robotic Datasets},
    booktitle = {Proceedings of Robotics: Science and Systems (RSS)},
    year      = {2025},
}