FIFO-Diffusion_public

Official implementation of FIFO-Diffusion: Generating Infinite Videos from Text without Training

Improving Visual Consistency for Long Video Generation (FIFO-Diffusion + VideoCrafter)

This series began with a perplexing body-flipped video ... see more in this article.

a person swimming in ocean, high quality, 4K resolution.

Features

Seeding the initial latent frame

Check my article for more details.

FIFO-Diffusion | FIFO+ Initial Seeding | FIFO+ Initial Seeding (Autoregressive)
"a bicycle accelerating to gain speed, high quality, 4K resolution."
"a bicycle slowing down to stop, high quality, 4K resolution."

Weighted Q-caches

Check my article for more details.

FIFO-Diffusion | FIFO+ Q-caches
"a person swimming in ocean, high quality, 4K resolution."
"a boat sailing smoothly on a calm lake, high quality, 4K resolution."
"a bicycle leaning against a tree, high quality, 4K resolution."

Extending the Latent Uniformly

Check my article for more details.

FIFO-Diffusion | FIFO+ Uniform Latents
"a bicycle accelerating to gain speed, high quality, 4K resolution."
"a car stuck in traffic during rush hour, high quality, 4K resolution."

Installation

conda create --name fifoplus python=3.10.14
conda activate fifoplus
pip install -r requirements.txt

Downloading the Checkpoints

Model | Resolution | Checkpoint | Config
VideoCrafter2 (Text2Video) | 320x512 | Hugging Face | Link
VideoCrafter1 (Image2Video) | 320x512 | Hugging Face | Link

Directory structure:

. FIFO-Diffusion_public
    ├──configs
    │     ├── inference_i2v_512_v1.0.yaml
    │     └── inference_t2v_512_v2.0.yaml
    ├──videocrafter_models
    │     ├── base_512_v2
    │     │        └── model.ckpt
    │     └── Image2Video_512
    ..             └── model.ckpt
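
Before running inference, it can help to confirm that the configs and checkpoints sit where the tree above expects them. The following is a minimal sketch, not part of the repository: the helper name `missing_files` is hypothetical, and the file list is copied from the tree above.

```python
from pathlib import Path

# Relative paths taken from the directory tree above.
EXPECTED_FILES = [
    "configs/inference_i2v_512_v1.0.yaml",
    "configs/inference_t2v_512_v2.0.yaml",
    "videocrafter_models/base_512_v2/model.ckpt",
    "videocrafter_models/Image2Video_512/model.ckpt",
]


def missing_files(repo_root="."):
    """Return the expected files that are not present under repo_root."""
    root = Path(repo_root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_files(".")
    if missing:
        print("Missing files:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All configs and checkpoints are in place.")
```

Run it from the FIFO-Diffusion_public directory; an empty result means the layout matches the tree.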

Usage

Prompts files

For t2v and t2v_seed, the txt file should contain one prompt per line:

{prompt1}
{prompt2}
...

Example:

a person swimming in ocean, high quality, 4K resolution.
a person giving a presentation to a room full of colleagues, high quality, 4K resolution.

For i2v, each line pairs an image path with its prompt, separated by a semicolon:

{image_path_1};{prompt1}
{image_path_2};{prompt2}
...

Example:

/data/vbench2/a large wave crashes over a rocky cliff.jpg;a large wave crashes over a rocky cliff, high quality, 4K resolution.
/data/vbench2/A teddy bear is climbing over a wooden fence.jpg;A teddy bear is climbing over a wooden fence, high quality, 4K resolution.
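
The two prompt-file formats above can be read with a few lines of Python. This is a sketch for checking a prompt file before a run, not the repository's own loader: the function names are hypothetical, and it assumes image paths never contain a semicolon, so each line is split on the first `;` only.

```python
def parse_t2v_prompts(text):
    """t2v / t2v_seed format: one prompt per non-empty line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def parse_i2v_prompts(text):
    """i2v format: each non-empty line is '{image_path};{prompt}'."""
    pairs = []
    for number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        # Split on the first ';' only (assumption: the path has no ';').
        image_path, sep, prompt = line.partition(";")
        if not sep or not image_path.strip() or not prompt.strip():
            raise ValueError(f"line {number}: expected '{{image_path}};{{prompt}}'")
        pairs.append((image_path.strip(), prompt.strip()))
    return pairs


if __name__ == "__main__":
    example = (
        "/data/vbench2/a large wave crashes over a rocky cliff.jpg;"
        "a large wave crashes over a rocky cliff, high quality, 4K resolution.\n"
    )
    for image_path, prompt in parse_i2v_prompts(example):
        print(image_path, "->", prompt)
```

A line without a semicolon raises a `ValueError` that names the offending line, which makes malformed i2v files easy to spot.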

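Argument --mode {main_option}_{sub_option}

The main option and any sub options are joined with underscores, e.g. t2v_TTqcache_attn1_unilatent.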

  • Main options:
    • i2v
    • t2v
    • t2v_seed: Seeding the initial latent frame
  • Sub options:
    • TTqcache_attn1: Enable Q-caches
    • unilatent: Extending the Latent Uniformly
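
To illustrate how a mode string decomposes into the options listed above, here is a small sketch. It is an assumption about the naming scheme, not the parsing code used by videocrafter_main.py: `parse_mode` is a hypothetical helper that matches the main option as a prefix and the sub options as substrings.

```python
# Longest name first so that "t2v_seed" is not mistaken for "t2v".
MAIN_OPTIONS = ("t2v_seed", "t2v", "i2v")
SUB_OPTIONS = ("TTqcache_attn1", "unilatent")


def parse_mode(mode):
    """Split a --mode string into (main_option, enabled_sub_options)."""
    for main in MAIN_OPTIONS:
        if mode == main or mode.startswith(main + "_"):
            rest = mode[len(main):]
            # Substring matching is an assumption made for this sketch.
            enabled = [sub for sub in SUB_OPTIONS if sub in rest]
            return main, enabled
    raise ValueError(f"unknown main option in mode {mode!r}")


if __name__ == "__main__":
    print(parse_mode("t2v_TTqcache_attn1_unilatent"))
    print(parse_mode("t2v_seed"))
```

The repository may match names more loosely than this sketch does; for instance, one of the inference commands in this README passes the shorter `TTqcache`, which the sketch would not count as a sub option.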

Argument --experiment {experiment}

This creates a folder named {experiment}, along with a {experiment}.gif (or .mp4), inside the results directory:

. FIFO-Diffusion_public
    ├──results
    ..  └── videocraft_v2_fifo
              ├── latents # stores the clean latents from the base model
              └── random_noise
                    └── {prompt}
                            └──{experiment}
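
For scripts that collect outputs afterwards, the layout above can be expressed with `pathlib`. This is a sketch under assumptions: the helper names are hypothetical, and the repository may sanitize or shorten the {prompt} folder name, which is not handled here.

```python
from pathlib import Path


def experiment_dir(prompt, experiment, results_root="results/videocraft_v2_fifo"):
    """Build {results_root}/random_noise/{prompt}/{experiment} as in the tree above."""
    return Path(results_root) / "random_noise" / prompt / experiment


def latents_dir(results_root="results/videocraft_v2_fifo"):
    """Folder that stores the clean latents from the base model."""
    return Path(results_root) / "latents"


if __name__ == "__main__":
    print(experiment_dir("a person swimming in ocean", "t2v_unilatent"))
    print(latents_dir())
```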

Inference command for main option t2v

python3 videocrafter_main.py \
--config configs/inference_t2v_512_v2.0.yaml \
--ckpt_path videocrafter_models/base_512_v2/model.ckpt \
--prompt_file prompts/vbench_t2v_subject_consistency_debug.csv \
--mode t2v_TTqcache_attn1_unilatent \
--save_frames \
--experiment t2v_TTqcache_attn1_unilatent

Inference command for main options i2v and t2v_seed

python3 videocrafter_main.py \
--config configs/inference_i2v_512_v1.0.yaml \
--ckpt_path videocrafter_models/Image2Video_512/model.ckpt \
--prompt_file prompts/vbench_t2v_cohe_fromi2v.csv \
--mode t2v_seed_TTqcache \
--save_frames \
--experiment t2v_seed_TTqcache
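
When comparing several modes, it may be convenient to assemble these commands from Python. The sketch below only builds the argument list from the flags shown in the commands above; `build_command` is a hypothetical helper, and reusing the mode string as the experiment name simply mirrors the examples.

```python
import shlex


def build_command(config, ckpt_path, prompt_file, mode, experiment, save_frames=True):
    """Assemble the videocrafter_main.py argument list from the flags shown above."""
    cmd = [
        "python3", "videocrafter_main.py",
        "--config", config,
        "--ckpt_path", ckpt_path,
        "--prompt_file", prompt_file,
        "--mode", mode,
    ]
    if save_frames:
        cmd.append("--save_frames")
    cmd += ["--experiment", experiment]
    return cmd


if __name__ == "__main__":
    for mode in ("t2v_unilatent", "t2v_TTqcache_attn1_unilatent"):
        cmd = build_command(
            config="configs/inference_t2v_512_v2.0.yaml",
            ckpt_path="videocrafter_models/base_512_v2/model.ckpt",
            prompt_file="prompts/vbench_t2v_subject_consistency_debug.csv",
            mode=mode,
            experiment=mode,
        )
        # Print the command; pass cmd to subprocess.run(cmd, check=True) to launch it.
        print(shlex.join(cmd))
```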

Acknowledgements

This repo is a fork of FIFO-Diffusion and uses VideoCrafter as the base model. The ideas are also inspired by ConsiStory: Training-Free Consistent Text-to-Image Generation and Cross-Image Attention for Zero-Shot Appearance Transfer. Be sure to check out and cite the original publications. I am open to any discussion of this work!