We present Continuous 3D Words, a way to encode fine-grained attributes such as illumination, non-rigid shape changes, and camera parameters as special tokens for text-to-image generation. Our model is built upon Stable Diffusion 2.1 and Low-Rank Adaptation (LoRA).
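To make the core idea concrete, here is a minimal conceptual sketch (not the official implementation): a small MLP maps a continuous attribute value, e.g. a normalized illumination angle, to a token embedding that can stand in for a placeholder token in the text-encoder sequence. The class name and layer sizes are illustrative choices, not taken from this repository; 1024 is the SD 2.1 text-embedding width.

```python
# Conceptual sketch only: an MLP that turns a scalar attribute into a
# "continuous 3D word" (a token embedding). Names and sizes are illustrative.
import torch
import torch.nn as nn

class Continuous3DWordMLP(nn.Module):
    def __init__(self, embed_dim: int = 1024):  # SD 2.1 text embeddings are 1024-d
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(1, 256),
            nn.ReLU(),
            nn.Linear(256, embed_dim),
        )

    def forward(self, attribute_value: torch.Tensor) -> torch.Tensor:
        # attribute_value: shape (batch, 1), e.g. a normalized light angle in [0, 1]
        return self.net(attribute_value)

mlp = Continuous3DWordMLP()
token_embedding = mlp(torch.tensor([[0.3]]))  # one continuous attribute value
print(token_embedding.shape)  # torch.Size([1, 1024])
```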
```bash
# The model is tested with diffusers 0.16.1
pip install -r requirements.txt
# then you can use the notebook for demo.
```
Please download the checkpoints from here, then create a `ckpts/`
directory and place the checkpoints inside it.
Note that each task requires two checkpoints: `*sd.safetensors`
is the LoRA checkpoint applied to Stable Diffusion, whereas `*mlp.pt`
is the MLP checkpoint for Continuous 3D Words.
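As a rough orientation, the two checkpoints might be loaded along these lines. This is a hedged sketch, assuming diffusers 0.16.1 and the `patch_pipe` helper from cloneofsimo's LoRA code bundled in `lora_diffusion/`; the `illumination_*` filenames are hypothetical placeholders, and the demo notebook is the authoritative reference.

```python
# Hedged sketch: load both per-task checkpoints (filenames are placeholders).
import torch
from diffusers import StableDiffusionPipeline
from lora_diffusion import patch_pipe  # LoRA utilities adapted from cloneofsimo

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")

# *sd.safetensors: LoRA weights patched into the Stable Diffusion pipeline.
patch_pipe(pipe, "ckpts/illumination_sd.safetensors")

# *mlp.pt: the MLP that maps continuous attribute values to token embeddings.
mlp_state = torch.load("ckpts/illumination_mlp.pt", map_location="cuda")
```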
- [Feb 14, 2024] Demos for illumination and non-rigid running have been added 🔥. The training scripts will be added soon 🚧.
All code (unless otherwise specified) complies with the Adobe Research License.
Code in `lora_diffusion/`
is adapted from the LoRA implementation by cloneofsimo, which can be found here. Please comply with their LICENSE accordingly.
If you find this work helpful in your research or applications, please cite it using the following BibTeX:
```bibtex
@article{cheng2023C3D,
  title={Learning Continuous 3D Words for Text-to-Image Generation},
  author={Cheng, Ta-Ying and Gadelha, Matheus and Groueix, Thibault and Fisher, Matthew and Mech, Radomir and Markham, Andrew and Trigoni, Niki},
  journal={arXiv preprint},
  year={2024}
}
```