Official PyTorch Implementation for Readout Guidance, CVPR 2024

Primary language: Jupyter Notebook. License: Apache License 2.0 (Apache-2.0).

🔮 Readout Guidance: Learning Control from Diffusion Features

Grace Luo, Trevor Darrell, Oliver Wang, Dan B Goldman, Aleksander Holynski

This repository contains the PyTorch implementation of Readout Guidance: Learning Control from Diffusion Features.

This is not an officially supported Google product.

[Project Page][arXiv]

Releases

  • 🚀 2024/04/26: Additional code for pose estimation with readout heads in the readout_pose directory.
  • 🚀 2024/01/31: Initial codebase release with demos for drag-based manipulation and spatial control, as well as readout head training code. Includes weights for SDXL and SDv1-5 readout heads for appearance, correspondence, depth, edge, and pose.

Setup

This code was tested with Python 3.8. To install the necessary packages, please run:

conda env create -f environment.yml
conda activate readout

Readout Heads

All model weights can be found on our HuggingFace page. To download the weights automatically, run:

./download_weights.sh

Readout Head Type               SDv1-5     SDXL
Pose Head                       download   download
Depth Head                      download   download
Edge Head                       download   download
Correspondence Feature Head     download   download
Appearance Similarity Head      download   download

Demos

Note that the generation process is non-deterministic, even without Readout Guidance, so re-running the same cell or script with the exact same settings can yield different, and sometimes better, results.

  • demo_drag.ipynb: This demo walks through drag-based manipulation on either real images or generated images, where the user can also annotate the desired drags.
  • demo_spatial.ipynb: This demo walks through spatial control with the pose head on pose inputs derived from MSCOCO images.

Generation Scripts

You can also automatically generate many samples using the following scripts.

conda activate readout

# Run drag-based manipulation on samples in data/drag/real
python3 script_drag.py configs/drag_real.yaml

# Run spatial control on samples in data/spatial/pose
python3 script_spatial.py configs/spatial.yaml
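
If you want to launch both scripts from a single entry point, a small Python driver is one option. The sketch below is not part of this repository: the file name run_all.py and the helpers build_command and run_jobs are hypothetical, and only the script and config paths come from the commands above. It assumes you run it from the repository root with the readout environment active. By default it only prints the commands; pass --run to execute them.

```python
# run_all.py (hypothetical helper, not shipped with the repository)
import subprocess
import sys
from typing import List, Sequence, Tuple

# (script, config) pairs copied from the commands in this section.
JOBS: List[Tuple[str, str]] = [
    ("script_drag.py", "configs/drag_real.yaml"),
    ("script_spatial.py", "configs/spatial.yaml"),
]


def build_command(script: str, config: str, python: str = sys.executable) -> List[str]:
    """Return the argv list equivalent to `python3 <script> <config>`."""
    return [python, script, config]


def run_jobs(jobs: Sequence[Tuple[str, str]], dry_run: bool = True) -> List[List[str]]:
    """Print each command and, unless dry_run is set, run it in order.

    Execution stops at the first script that exits with a non-zero status.
    """
    commands = [build_command(script, config) for script, config in jobs]
    for cmd in commands:
        print("Command:", " ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Dry run unless --run is given, so an accidental launch does nothing costly.
    run_jobs(JOBS, dry_run="--run" not in sys.argv)
```

Running `python3 run_all.py` lists the two commands, and `python3 run_all.py --run` executes them one after the other. Because generation is non-deterministic, repeated runs can produce different samples.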

Training Code

To train your own readout heads, please check out readout_training/README.md.

Citing

@inproceedings{luo2024readoutguidance,
    title={Readout Guidance: Learning Control from Diffusion Features},
    author={Grace Luo and Trevor Darrell and Oliver Wang and Dan B Goldman and Aleksander Holynski},
    booktitle={CVPR},
    year={2024}
}