This series began with a perplexing body-flipped video ... see more in this article.
*Demo: "a person swimming in ocean, high quality, 4K resolution."*
- Added support for Image-to-Video (I2V) generation in FIFO-Diffusion.
- Improved visual consistency in long video generation by:
  - Seeding the initial latent frame with the image embedding.
  - Using Weighted Q-caches in Spatio-Temporal Attention.
  - Extending the latent uniformly before the diagonal denoising (see the sketch below).
- More background on the 3D U-Net and Spatio-Temporal Attention is in my blog.
Check my article for more details.
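To make the uniform latent extension concrete, here is a minimal sketch of the idea, not the repo's actual code: instead of filling the FIFO queue with independent random noise, the first clean latent is repeated across the queue and noised to each position's diagonal noise level. The function name `init_queue_uniform` and the diffusers-style `scheduler.add_noise` call are assumptions for illustration.

```python
import torch

def init_queue_uniform(z0, timesteps, scheduler):
    """Hypothetical sketch: extend one clean latent z0 uniformly across
    the FIFO queue, noising each slot to the noise level of its diagonal
    position instead of starting every slot from pure random noise."""
    slots = []
    for t in timesteps:  # one diffusion timestep per queue position
        noise = torch.randn_like(z0)
        # diffusers-style forward noising q(z_t | z_0)
        slots.append(scheduler.add_noise(z0, noise, t.unsqueeze(0)))
    return torch.stack(slots, dim=0)  # [queue_len, 1, C, H, W]
```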
| FIFO-Diffusion | FIFO+ Uniform Latents |
|---|---|
| ![]() | ![]() |
| "a bicycle accelerating to gain speed, high quality, 4K resolution." | |
| ![]() | ![]() |
| "a car stuck in traffic during rush hour, high quality, 4K resolution." | |
```bash
conda create --name fifoplus python=3.10.14
conda activate fifoplus
pip install -r requirements.txt
```
| Model | Resolution | Checkpoint | Config |
|---|---|---|---|
| VideoCrafter2 (Text2Video) | 320x512 | Hugging Face | Link |
| VideoCrafter1 (Image2Video) | 320x512 | Hugging Face | Link |
Directory structure:

```
. FIFO-Diffusion_public
├── configs
│   ├── inference_i2v_512_v1.0.yaml
│   └── inference_t2v_512_v2.0.yaml
├── videocrafter_models
│   ├── base_512_v2
│   │   └── model.ckpt
│   └── Image2Video_512
│       └── model.ckpt
```
For `t2v` and `t2v_seed`, the prompt txt file should look like:

```
{prompt1}
{prompt2}
...
```
Example:

```
a person swimming in ocean, high quality, 4K resolution.
a person giving a presentation to a room full of colleagues, high quality, 4K resolution.
```
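A prompt file can also be written programmatically; a tiny sketch (the path `prompts/my_prompts.txt` is just an example):

```python
# Write one prompt per line, matching the t2v format above.
prompts = [
    "a person swimming in ocean, high quality, 4K resolution.",
    "a person giving a presentation to a room full of colleagues, high quality, 4K resolution.",
]
with open("prompts/my_prompts.txt", "w") as f:
    f.write("\n".join(prompts) + "\n")
```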
For `i2v`, each line should contain the image path and the prompt, separated by a semicolon:

```
{image_path_1};{prompt1}
{image_path_2};{prompt2}
...
```
Example:

```
/data/vbench2/a large wave crashes over a rocky cliff.jpg;a large wave crashes over a rocky cliff, high quality, 4K resolution.
/data/vbench2/A teddy bear is climbing over a wooden fence.jpg;A teddy bear is climbing over a wooden fence, high quality, 4K resolution.
```
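A quick way to sanity-check an i2v prompt file is to split each line on the first semicolon only, since image paths may contain spaces. `check_i2v_file` below is a hypothetical helper, not part of the repo:

```python
import os

def check_i2v_file(path):
    """Hypothetical helper: validate {image_path};{prompt} lines."""
    with open(path) as f:
        for lineno, line in enumerate(f, 1):
            if not line.strip():
                continue  # skip blank lines
            image_path, _, prompt = line.rstrip("\n").partition(";")
            assert prompt, f"line {lineno}: missing ';' separator"
            assert os.path.exists(image_path), f"line {lineno}: {image_path} not found"
```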
- Main options:
  - `i2v`: Image-to-Video generation
  - `t2v`: Text-to-Video generation
  - `t2v_seed`: seeds the initial latent frame with the image embedding
- Sub options:
  - `TTqcache_attn1`: enables Weighted Q-caches in Spatio-Temporal Attention
  - `unilatent`: extends the latent uniformly before the diagonal denoising
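The `--mode` string concatenates a main option with optional sub options, e.g. `t2v_TTqcache_attn1_unilatent`. A hypothetical reading of such a string (the real parsing in `videocrafter_main.py` may differ):

```python
def parse_mode(mode: str) -> dict:
    """Hypothetical sketch of how a --mode string maps to feature flags."""
    return {
        "i2v": mode.startswith("i2v"),
        "seed_first_latent": "seed" in mode,      # t2v_seed
        "weighted_q_cache": "TTqcache" in mode,   # TTqcache_attn1
        "uniform_latent": "unilatent" in mode,    # unilatent
    }

print(parse_mode("t2v_TTqcache_attn1_unilatent"))
# {'i2v': False, 'seed_first_latent': False, 'weighted_q_cache': True, 'uniform_latent': True}
```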
This will create a folder named `{experiment}` under the main directory, along with a `{experiment}.gif` (or `.mp4`):
```
. FIFO-Diffusion_public
├── results
│   └── videocraft_v2_fifo
│       ├── latents        # stores the clean latents from the base model
│       └── random_noise
│           └── {prompt}
│               └── {experiment}
```
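If you pass `--save_frames`, you can also re-assemble the saved frames yourself. A minimal sketch with imageio, assuming the frames are PNGs under the experiment folder (`{prompt}` and `{experiment}` are placeholders; the exact layout may differ):

```python
import glob
import imageio.v2 as imageio

# Collect saved frames in order and stitch them into an animated GIF.
frame_paths = sorted(glob.glob(
    "results/videocraft_v2_fifo/random_noise/{prompt}/{experiment}/*.png"))
frames = [imageio.imread(p) for p in frame_paths]
imageio.mimsave("fifo_output.gif", frames)
```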
```bash
python3 videocrafter_main.py \
    --config configs/inference_t2v_512_v2.0.yaml \
    --ckpt_path videocrafter_models/base_512_v2/model.ckpt \
    --prompt_file prompts/vbench_t2v_subject_consistency_debug.csv \
    --mode t2v_TTqcache_attn1_unilatent \
    --save_frames \
    --experiment t2v_TTqcache_attn1_unilatent
```
```bash
python3 videocrafter_main.py \
    --config configs/inference_i2v_512_v1.0.yaml \
    --ckpt_path videocrafter_models/Image2Video_512/model.ckpt \
    --prompt_file prompts/vbench_t2v_cohe_fromi2v.csv \
    --mode t2v_seed_TTqcache \
    --save_frames \
    --experiment t2v_seed_TTqcache
```
This repo is a fork of FIFO-Diffusion and uses VideoCrafter as the base model. The ideas are also inspired by ConsiStory: Training-Free Consistent Text-to-Image Generation and Cross-Image Attention for Zero-Shot Appearance Transfer; be sure to check out and cite the original publications. I am open to any discussion of this work!