
PeRFlow: Piecewise Rectified Flow as Universal Plug-and-Play Accelerator (NeurIPS 2024)

Primary language: Jupyter Notebook. License: BSD 3-Clause "New" or "Revised" License (BSD-3-Clause).

PeRFlow: Piecewise Rectified Flow as Universal Plug-and-Play Accelerator

Hanshu Yan¹, Xingchao Liu², Jiachun Pan³, Jun Hao Liew¹, Qiang Liu², Jiashi Feng¹
¹ByteDance   ²UT Austin   ³NUS

📚 Tech Report |  🎨 Project Page |  🤗 Models

Contributions are welcome!

🔥 News

  • [2024/04/25] We released the PeRFlow accelerated SDXL. Find the model here 🤗: PeRFlow-SDXL-DreamShaper and PeRFlow-SDXL-base. Training scripts are also included at ./scripts.
  • [2024/03/11] A demo of PeRFlow-T2I (including the refiner) is available at Replicate Space. We thank individual contributor Chenxi.
  • [2024/03/08] Text-to-3D via combining PeRFlow-T2I with TripoSR. Try the online Gradio demo 🤗 here.
  • [2024/03/05] PeRFlow+Wonder3D gives one-step multiview generation! See here.
  • [2024/03/05] Training scripts are released. Run with bash scripts/train.sh
  • [2024/02/29] We released the PeRFlow accelerated version of Stable Diffusion v2.1.
  • [2024/02/19] We released the PeRFlow acceleration module for Stable Diffusion v1.5, supporting various SD-v1.5 pipelines. Find inference scripts at scripts.

Introduction

Rectified Flow is a promising approach for accelerating pre-trained diffusion models. However, the generation quality of prior fast flow-based models on Stable Diffusion (such as InstaFlow) is unsatisfactory. In this work, we make several improvements to the original reflow pipeline that significantly boost the performance of flow-based fast SD. Our new model, termed piecewise rectified flow (PeRFlow), learns a piecewise linear probability flow that can efficiently generate high-quality images in just 4 steps. Moreover, we found that the difference of model weights, ${\Delta}W = W_{\text{PeRFlow}} - W_{\text{SD}}$, can be used as a plug-and-play accelerator module on a wide range of SD-based models.

Specifically, PeRFlow has several features:

  • Fast Generation: PeRFlow can generate high-fidelity images in just 4 steps. The images generated by PeRFlow are more diverse than those of other fast-sampling models (such as LCM). Moreover, as PeRFlow is a continuous probability flow, it supports 8, 16, or even more sampling steps to monotonically increase the generation quality.
  • Efficient Training: Fine-tuning PeRFlow based on SD 1.5 converges in just 4,000 training iterations (with a batch size of 1024). In comparison, the previous fast flow-based text-to-image model, InstaFlow, requires 25,000 training iterations of fine-tuning with the same batch size. Besides, PeRFlow does not require heavy data generation for reflow.
  • Compatible with SD Workflows: PeRFlow works with various stylized LoRAs and generation/editing pipelines of the pretrained SD model. As a plug-and-play module, $\Delta W$ can be directly combined with other conditional generation pipelines, such as ControlNet, IP-Adapter, and multi-view generation.
  • Classifier-Free Guidance: PeRFlow is fully compatible with classifier-free guidance and supports negative prompts, which are important for pushing the generation quality to an even higher level. Empirically, the suitable CFG scale is similar to that of the original diffusion model.

Applications

Fast image generation via PeRFlow-T2I

Generate high-quality images (512x512) with only 4 steps!

Image enhancement via PeRFlow-Refiner

By plugging PeRFlow ${\Delta}W$ into the ControlNet-Tile pipeline, we obtain PeRFlow-Refiner to upsample/refine images. We can use PeRFlow-T2I and PeRFlow-Refiner together to generate astonishing x1024 images with lightweight SD-v1.5 backbones. We use 4-step PeRFlow-T2I to generate x512 images first, then upsample them to x1024 with 4-step PeRFlow-Refiner.

One can also use PeRFlow-Refiner on its own to enhance the texture and details of low-resolution, blurry images. Here are two examples: on the left, from x64 to x1024, and on the right, from x256 to x1024.

Efficient multiview generation via PeRFlow-Wonder3D

One-step image-to-multiview generation is enabled by plugging PeRFlow $\Delta W$ into the pre-trained Wonder3D. We can use PeRFlow-T2I and PeRFlow-Wonder3D together to generate multiview normal maps and textures from text prompts instantly. Shown here are "a dog with glasses and cap", "a bird", and "a vintage car".

Accelerate other SD pipelines via PeRFlow

Plug PeRFlow ${\Delta}W$ into the ControlNets of SD-v1.5.

Plug PeRFlow ${\Delta}W$ into IP-Adapter.

Editing with PeRFlow+Prompt-to-Prompt

Please refer to the project page for more results, including the comparison to LCM.

Demo Code

Install the runtime dependencies with bash env/install.sh. Training and evaluation scripts are provided in scripts.

PeRFlow acceleration yields the delta_weights ${\Delta}W$ corresponding to the pretrained diffusion models. The complete weights of UNet for inference are computed by $W = W_{\text{SD}} + {\Delta}W$, where $W_{\text{SD}}$ are the weights of base models, such as the vanilla or customized (DreamShaper, RealisticVision, etc.) SD models. We provide the delta_weights for SD-v1.5 at PeRFlow🤗. You can download the delta-weights and fuse them into your own SD pipelines.

import torch, torchvision
from diffusers import StableDiffusionPipeline, UNet2DConditionModel
from src.utils_perflow import merge_delta_weights_into_unet
from src.scheduler_perflow import PeRFlowScheduler

# Download the PeRFlow delta weights for SD-v1.5 as a state dict.
delta_weights = UNet2DConditionModel.from_pretrained("hansyan/perflow-sd15-delta-weights", torch_dtype=torch.float16, variant="v0-1").state_dict()
# Load any SD-v1.5-based pipeline (DreamShaper here) and fuse the delta weights into its UNet.
pipe = StableDiffusionPipeline.from_pretrained("Lykon/dreamshaper-8", torch_dtype=torch.float16)
pipe = merge_delta_weights_into_unet(pipe, delta_weights)
# Replace the default scheduler with the PeRFlow scheduler.
pipe.scheduler = PeRFlowScheduler.from_config(pipe.scheduler.config, prediction_type="diff_eps", num_time_windows=4)
pipe.to("cuda", torch.float16)
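
The arithmetic behind $W = W_{\text{SD}} + {\Delta}W$ can be sketched on plain dictionaries. This is only an illustration of the formula, not the repository's `merge_delta_weights_into_unet`, whose internals are not shown here. The helper names `extract_delta` and `merge_delta` and the parameter names are hypothetical, and plain floats stand in for weight tensors (the same code works with anything that supports `+` and `-`).

```python
def extract_delta(accelerated, base):
    """Delta W = W_PeRFlow - W_SD, computed parameter by parameter."""
    return {name: accelerated[name] - base[name] for name in accelerated}


def merge_delta(base, delta):
    """W = W_base + Delta W; parameters without a delta entry are left unchanged."""
    return {name: weight + delta.get(name, 0) for name, weight in base.items()}


# Toy "state dicts": floats stand in for weight tensors (hypothetical names).
w_sd = {"conv.weight": 0.5, "attn.weight": -1.0}
w_perflow = {"conv.weight": 0.75, "attn.weight": -1.5}

delta_w = extract_delta(w_perflow, w_sd)

# Apply the same delta to a different (customized) base model.
w_custom = {"conv.weight": 2.0, "attn.weight": 0.25}
w_custom_fast = merge_delta(w_custom, delta_w)
print(delta_w, w_custom_fast)
```

Merging the delta back into the model it was extracted from recovers the accelerated weights; applying it to a customized base is what makes the module plug-and-play.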

For a quick start, we also provide complete accelerated weights (already merged with PeRFlow ${\Delta}W$) of several popular diffusion models, including SD-v1.5 and SDXL. Load the model, change the scheduler, then enjoy fast generation.

import torch, torchvision
from diffusers import StableDiffusionXLPipeline
from src.scheduler_perflow import PeRFlowScheduler

# Load the merged PeRFlow-SDXL checkpoint and switch to the PeRFlow scheduler.
pipe = StableDiffusionXLPipeline.from_pretrained("hansyan/perflow-sdxl-dreamshaper", torch_dtype=torch.float16, use_safetensors=True, variant="v0-fix")
pipe.scheduler = PeRFlowScheduler.from_config(pipe.scheduler.config, prediction_type="ddim_eps", num_time_windows=4)
pipe.to("cuda", torch.float16)

# Each entry is [prompt, negative prompt].
prompts_list = [
    ["photorealistic, uhd, high resolution, high quality, highly detailed; masterpiece, A closeup face photo of girl, wearing a rain coat, in the street, heavy rain, bokeh,",
        "distorted, blur, low-quality, haze, out of focus",],
    ["photorealistic, uhd, high resolution, high quality, highly detailed; masterpiece, A beautiful cat bask in the sun",
        "distorted, blur, low-quality, haze, out of focus",],
]

for i, prompts in enumerate(prompts_list):
    torch.manual_seed(42)  # seed PyTorch's global RNG so the samples are reproducible
    prompt, neg_prompt = prompts[0], prompts[1]
    samples = pipe(
        prompt              = [prompt] * 2,
        negative_prompt     = [neg_prompt] * 2,
        height              = 1024,
        width               = 1024,
        num_inference_steps = 6,
        guidance_scale      = 2.0,
        output_type         = 'pt',
    ).images
    # Save the two samples side by side as one image grid.
    torchvision.utils.save_image(torchvision.utils.make_grid(samples, nrow = 2), f'tmp_{i}.png')

The merged SD-v1.5 model is used in the same way:

import torch, torchvision
from diffusers.pipelines.stable_diffusion import StableDiffusionPipeline
from src.scheduler_perflow import PeRFlowScheduler
pipe = StableDiffusionPipeline.from_pretrained("hansyan/perflow-sd15-dreamshaper", torch_dtype=torch.float16)
pipe.scheduler = PeRFlowScheduler.from_config(pipe.scheduler.config, prediction_type="diff_eps", num_time_windows=4)
pipe.to("cuda", torch.float16)

prompts_list = ["A man with brown skin, a beard, and dark eyes", "A colorful bird standing on the tree, open beak",]
for i, prompt in enumerate(prompts_list):
    generator = torch.Generator("cuda").manual_seed(1024)
    prompt = "RAW photo, 8k uhd, dslr, high quality, film grain, highly detailed, masterpiece; " + prompt
    neg_prompt = "distorted, blur, smooth, low-quality, warm, haze, over-saturated, high-contrast, out of focus, dark"
    samples = pipe(
        prompt              = [prompt], 
        negative_prompt     = [neg_prompt],
        height              = 512,
        width               = 512,
        num_inference_steps = 8, 
        guidance_scale      = 7.5,
        generator           = generator,
        output_type         = 'pt',
    ).images
    torchvision.utils.save_image(torchvision.utils.make_grid(samples, nrow=4), f"tmp_{i}.png")

Scripts for text-to-image and ControlNet (depth/edge/pose/tile) are included in scripts. You can try efficient image enhancement via the ControlNet-Tile models. We also provide a fast text-to-multiview Gradio interface in app/Wonder3D, based on Wonder3D. Install diffusers 0.19.3 and run python Wonder3D/sd15_t2mv_gradio.py.

Method: Accelerating Diffusion Models with Piecewise Rectified Flows

Rectified Flow proposes to construct flow-based generative models via linear interpolation; the trajectories of the learned flow can then be straightened with a special operation called reflow. However, the reflow procedure requires generating a synthetic dataset by simulating the entire pre-trained probability flow. This consumes a huge amount of storage and time and induces large numerical errors in the samples, making it unfavorable for training large-scale foundation models. To address this limitation, we propose piecewise rectified flow. By dividing the pre-trained probability flow into multiple time windows and straightening the intermediate probability flow inside each window with reflow, we obtain a piecewise linear probability flow that can be sampled within very few steps. This divide-and-conquer strategy avoids the cumbersome simulation of the whole ODE trajectory, thereby allowing us to perform the piecewise reflow operation online during training.
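
As a rough illustration of per-window reflow, the sketch below builds one training sample for a single time window. It is a toy under stated assumptions, not the repository's training code: the teacher ODE $dz/dt = z$, the four equal windows, and the names `teacher_solve` and `make_window_sample` are all hypothetical stand-ins, and the actual network parameterization and loss are not specified in this README. The point it shows is that only a short segment of the teacher flow is simulated, and the regression target is the constant velocity of the straight line across that segment.

```python
import random


def teacher_solve(z, t_start, t_end, num_steps=200):
    """Euler-integrate the toy teacher ODE dz/dt = z from t_start to t_end."""
    dt = (t_end - t_start) / num_steps
    for _ in range(num_steps):
        z = z + dt * z
    return z


def make_window_sample(z_start, t_start, t_end, rng):
    """Build (t, z_t, target velocity) for one window [t_end, t_start]."""
    # Simulate the teacher only inside this window, not over the whole trajectory.
    z_end = teacher_solve(z_start, t_start, t_end)
    # Pick a random time in the window and interpolate linearly between the ends.
    t = rng.uniform(t_end, t_start)
    frac = (t_start - t) / (t_start - t_end)
    z_t = (1.0 - frac) * z_start + frac * z_end
    # Constant velocity of the straight segment joining the two window ends.
    target_velocity = (z_end - z_start) / (t_end - t_start)
    return t, z_t, target_velocity


rng = random.Random(0)
windows = [(1.0, 0.75), (0.75, 0.5), (0.5, 0.25), (0.25, 0.0)]
t_start, t_end = rng.choice(windows)
z_start = rng.gauss(0.0, 1.0)  # stand-in for a noised data sample at t_start
t, z_t, v = make_window_sample(z_start, t_start, t_end, rng)
print(t_start, t_end, t, z_t, v)
```

In a real setup a network would be trained to predict `target_velocity` from `(z_t, t)`; because each sample needs only one short teacher solve, such pairs can be produced on the fly.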

As shown in the figure, the pre-trained probability flow (which can be transformed from a pre-trained diffusion model) maps the random noise distribution $\pi_1$ to the data distribution $\pi_0$. Sampling from this curved flow with ODE solvers requires many steps. Instead, PeRFlow divides the sampling trajectories into multiple segments (two in this example) and straightens each segment with the reflow operation. A well-trained PeRFlow can generate high-quality images in very few steps because of its piecewise linear nature.
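
Why a piecewise linear flow needs so few steps can be seen in a one-dimensional toy. The sketch assumes a teacher ODE $dz/dt = z$ integrated from $t=1$ to $t=0$ and an idealized straightened flow whose per-window velocity is computed in closed form; in PeRFlow a network would predict that velocity instead. The function names are hypothetical.

```python
import math


def teacher_flow(z, dt):
    """Exact solution of the toy teacher ODE dz/dt = z after a time change dt."""
    return z * math.exp(dt)


def euler_on_teacher(z, num_steps):
    """Plain Euler integration of the curved teacher flow from t=1 to t=0."""
    dt = -1.0 / num_steps
    for _ in range(num_steps):
        z = z + dt * z
    return z


def sample_piecewise_linear(z, num_windows):
    """One Euler step per window on the idealized straightened flow."""
    dt = -1.0 / num_windows
    for _ in range(num_windows):
        # Velocity of the straight segment joining this window's start point
        # to the point the teacher flow reaches at the end of the window.
        velocity = (teacher_flow(z, dt) - z) / dt
        z = z + dt * velocity
    return z


z1 = 1.0
exact = teacher_flow(z1, -1.0)
coarse = euler_on_teacher(z1, 4)
piecewise = sample_piecewise_linear(z1, 4)
print(f"exact={exact:.4f} euler-4={coarse:.4f} piecewise-4={piecewise:.4f}")
```

With four steps, Euler on the curved flow is visibly off, while one step per window on the straightened flow reproduces the teacher's endpoint, since a single Euler step is exact on a straight segment.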

Quantitative Results: We train a PeRFlow model on LAION-aesthetic-v2 data to accelerate SD-v1.5. We compute the FID with respect to three reference sets: (1) a subset of 30K images from LAION, (2) a set of 30K images generated by SD-v1.5 with the JourneyDB prompts, and (3) the validation set of MS-COCO2014. For each reference set, we generate 30K images with each model using the corresponding text prompts. The results are presented in the following table. PeRFlow achieves a lower FID than LCM in all three comparisons.

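| FID     | LAION5B-30k, 4-step | LAION5B-30k, 8-step | SD-v1.5, 4-step | SD-v1.5, 8-step | COCO2014-30k, 4-step | COCO2014-30k, 8-step |
|---------|---------------------|---------------------|-----------------|-----------------|----------------------|----------------------|
| PeRFlow | 9.74                | 8.62                | 9.46            | 5.05            | 11.31                | 14.16                |
| LCM     | 15.38               | 19.21               | 15.63           | 21.19           | 23.49                | 29.63                |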

Citation

@article{yan_perflow_2024,
  title={PeRFlow: Piecewise Rectified Flow as Universal Plug-and-Play Accelerator},
  author={Yan, Hanshu and Liu, Xingchao and Pan, Jiachun and Liew, Jun Hao and Liu, Qiang and Feng, Jiashi},
  year={2024},
  url={http://arxiv.org/abs/2405.07510}
}

Related Materials

We provide several related links here:

Acknowledgements

Our training and evaluation scripts are implemented based on the Diffusers and Accelerate libraries. We use several high-quality finetuned versions of Stable Diffusion for model evaluation, including DreamShaper, RealisticVision, LandscapeRealistic, ArchitectureExterior, DisneyCartoon.

Xingchao Liu wishes to express his genuine gratitude to Nat Friedman and the Andromeda cluster for providing free GPU grants during this research.