[Paper]   [Project Page]   [🤗 Comic Generation Demo ]
For results sharing and discussion: https://discord.gg/hSJZ35fV
For codebase and deployment-related discussion: https://discord.gg/2HFUHT9p
Official implementation of StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation.
StoryDiffusion can create a magic story by generating consistent images and videos. Our work mainly has two parts:
- Consistent self-attention for character-consistent image generation over long-range sequences. It is hot-pluggable and compatible with all SD1.5 and SDXL-based image diffusion models. The current implementation requires at least 3 text prompts for the consistent self-attention module. We recommend 5-6 or more text prompts for better layout arrangement.
- Motion predictor for long-range video generation, which predicts motion between condition images in a compressed image semantic space, allowing it to predict larger motions.
"In a quaint old house, a man discovers a newspaper article about a hidden treasure in the nearby forest. Driven by dreams of fortune, he drives alone along a deserted road leading into the dark woods. As night falls, a sudden encounter with a wild tiger heightens the peril. Startled and fearful, he flees deeper into the forest, where he stumbles upon an ancient, secluded treasure house. Inside, surrounded by untold riches and no sign of the modern world, his laughter echoes, a sound of pure joy and triumph over his earlier fears."
- At home, reading a newspaper - #At home, the newspaper reveals there's a hidden treasure in the nearby forest.
- On the road, approaching the forest - #He drives to the forest, fueled by dreams of discovering the hidden treasure.
- Encounter in the forest at night - #A tiger suddenly appears on the path, its eyes glowing in the dark forest at night.
- Frightened in the forest at night - #Startled and very frightened, he opens his mouth in shock, standing frozen in the forest at night.
- Running through the forest at night - #In a panic, he runs very fast through the dark forest, trying to escape the looming shadows and eerie sounds.
- Discovery of the house in the forest at night - #As he stumbles through the darkness, he suddenly discovers the treasure house hidden amongst the trees!
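As a rough illustration of how such a prompt list could be prepared, the sketch below splits each line into a short description and a caption and enforces the 3-prompt minimum mentioned above. It assumes that the text after `#` is a caption, as the listing suggests; the helper names `split_line` and `parse_story` are hypothetical and are not part of the StoryDiffusion codebase.

```python
from typing import List, Tuple

MIN_PROMPTS = 3  # minimum stated for the consistent self-attention module


def split_line(line: str) -> Tuple[str, str]:
    """Split 'description - #caption' into (description, caption)."""
    description, _, caption = line.partition("#")
    # Strip surrounding spaces and the '-' bullet/separator used in the listing.
    return description.strip(" -"), caption.strip()


def parse_story(lines: List[str]) -> List[Tuple[str, str]]:
    """Parse non-empty lines and enforce the minimum prompt count."""
    pairs = [split_line(line) for line in lines if line.strip()]
    if len(pairs) < MIN_PROMPTS:
        raise ValueError(
            f"need at least {MIN_PROMPTS} prompts, got {len(pairs)}"
        )
    return pairs


story = [
    "- At home, reading a newspaper - #At home, the newspaper reveals there's a hidden treasure in the nearby forest.",
    "- On the road, approaching the forest - #He drives to the forest, fueled by dreams of discovering the hidden treasure.",
    "- Encounter in the forest at night - #A tiger suddenly appears on the path, its eyes glowing in the dark forest at night.",
]

for description, caption in parse_story(story):
    print(f"{description!r} -> {caption!r}")
```

A line without `#` yields an empty caption, so plain prompts pass through unchanged.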
- Python >= 3.8 (Anaconda or Miniconda is recommended)
- PyTorch >= 2.0.0
conda create --name storydiffusion python=3.10
conda activate storydiffusion
pip install -U pip
# Install requirements
pip install -r requirements.txt
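After installation, it may help to confirm that the environment meets the minimum versions listed above (Python >= 3.8, PyTorch >= 2.0.0). The following is a minimal sketch, not part of the repository; `version_tuple`, `meets_minimum`, and `check_environment` are hypothetical helper names. Versions are compared numerically, because a plain string comparison would rank "3.10" below "3.8".

```python
import sys
from itertools import takewhile
from typing import List, Tuple


def version_tuple(version: str) -> Tuple[int, ...]:
    """Turn '2.1.0+cu118' into (2, 1, 0); pad short versions with zeros."""
    core = version.split("+", 1)[0]  # drop local build tags such as '+cu118'
    parts: List[int] = []
    for piece in core.split("."):
        digits = "".join(takewhile(str.isdigit, piece))
        if not digits:
            break
        parts.append(int(digits))
    parts = parts[:3] + [0] * (3 - len(parts))
    return tuple(parts)


def meets_minimum(version: str, minimum: str) -> bool:
    """Return True if `version` is at least `minimum`."""
    return version_tuple(version) >= version_tuple(minimum)


def check_environment() -> List[str]:
    """Return a list of problems; an empty list means the minimums are met."""
    problems = []
    python_version = "%d.%d.%d" % sys.version_info[:3]
    if not meets_minimum(python_version, "3.8"):
        problems.append(f"Python {python_version} is older than 3.8")
    try:
        import torch
    except ImportError:
        problems.append("PyTorch is not installed")
    else:
        if not meets_minimum(torch.__version__, "2.0.0"):
            problems.append(f"PyTorch {torch.__version__} is older than 2.0.0")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

Running the script inside the activated `storydiffusion` environment prints a warning for each requirement that is not met and prints nothing otherwise.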
Currently, we provide two ways for you to generate comics.
Open Comic_Generation.ipynb and run the code.
Run the following command:
python gradio_app_sdxl_specific_id.py
We also provide a low-GPU-memory version. It was tested on a machine with 24 GB of GPU memory (Tesla A10) and 30 GB of RAM, and is expected to work well with more than 20 GB of GPU memory.
python gradio_app_sdxl_specific_id_low_vram.py
@article{Zhou2024storydiffusion,
title={StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation},
author={Zhou, Yupeng and Zhou, Daquan and Cheng, Ming-Ming and Feng, Jiashi and Hou, Qibin},
year={2024}
}