Hanshu Yan1, Xingchao Liu2, Jiachun Pan3, Jun Hao Liew1, Qiang Liu2, Jiashi Feng1
1ByteDance 2UT Austin 3NUS
📚 Tech Report | 🎨 Project Page | 🤗 Models
Contributions are welcome!
- [2024/04/25] We released the PeRFlow-accelerated SDXL. Find the models here 🤗: PeRFlow-SDXL-DreamShaper and PeRFlow-SDXL-base. Training scripts are also included in `./scripts`.
- [2024/03/11] A demo of PeRFlow-T2I (including the refiner) is available at Replicate Space. We thank individual contributor Chenxi.
- [2024/03/08] Text-to-3D via combining PeRFlow-T2I with TripoSR. Try the online Gradio demo 🤗 here.
- [2024/03/05] PeRFlow+Wonder3D gives one-step multiview generation! See here.
- [2024/03/05] Training scripts are released. Run with `bash scripts/train.sh`.
- [2024/02/29] We released the PeRFlow accelerated version of Stable Diffusion v2.1.
- [2024/02/19] We released the PeRFlow acceleration module for Stable Diffusion v1.5, supporting various SD-v1.5 pipelines. Find inference scripts in `scripts`.
Rectified Flow is a promising way to accelerate pre-trained diffusion models. However, the generation quality of prior fast flow-based models on Stable Diffusion (such as InstaFlow) is unsatisfactory.
In this work, we make several improvements to the original reflow pipeline that significantly boost the performance of flow-based fast SD.
Our new model, termed piecewise rectified flow (PeRFlow), learns a piecewise linear probability flow that can efficiently generate high-quality images in just 4 steps.
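Moreover, we found that the difference of model weights, $\Delta W$, between the PeRFlow-accelerated model and the original SD model can be used as a plug-and-play accelerator module for other SD-based models.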
Specifically, PeRFlow has several features:
- Fast Generation: PeRFlow can generate high-fidelity images in just 4 steps. The images generated by PeRFlow are more diverse than those of other fast-sampling models (such as LCM). Moreover, because PeRFlow is a continuous probability flow, it also supports 8, 16, or even more sampling steps, which monotonically increases the generation quality.
- Efficient Training: Fine-tuning PeRFlow based on SD 1.5 converges in just 4,000 training iterations (with a batch size of 1024). In comparison, the previous fast flow-based text-to-image model, InstaFlow, requires 25,000 training iterations with the same batch size for fine-tuning. Besides, PeRFlow does not require heavy data generation for reflow.
- Compatible with SD Workflows: PeRFlow works with various stylized LoRAs and generation/editing pipelines of the pretrained SD model. As a plug-and-play module, $\Delta W$ can be directly combined with other conditional generation pipelines, such as ControlNet, IP-Adapter, and multi-view generation.
- Classifier-Free Guidance: PeRFlow is fully compatible with classifier-free guidance and supports negative prompts, which are important for pushing the generation quality even higher. Empirically, the suitable CFG scale is similar to that of the original diffusion model.
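For reference, classifier-free guidance combines an unconditional (or negative-prompt) prediction with a prompt-conditioned one. The sketch below shows the standard combination rule on plain Python lists. The function name `cfg_combine` and the toy numbers are illustrative assumptions, not part of the PeRFlow code base.

```python
def cfg_combine(pred_uncond, pred_cond, guidance_scale):
    """Standard classifier-free guidance: uncond + scale * (cond - uncond).

    `pred_uncond` is the prediction for the empty or negative prompt and
    `pred_cond` is the prediction for the positive prompt (toy lists here).
    """
    return [u + guidance_scale * (c - u) for u, c in zip(pred_uncond, pred_cond)]


# Toy predictions standing in for model outputs (hypothetical values).
uncond = [0.0, 1.0, -2.0]
cond = [1.0, 1.0, 0.0]

# A scale of 1 reproduces the conditional prediction unchanged.
no_guidance = cfg_combine(uncond, cond, 1.0)
# Larger scales push the result further away from the unconditional/negative branch.
guided = cfg_combine(uncond, cond, 7.5)
print(no_guidance, guided)
```

When a negative prompt is supplied, its prediction takes the place of the unconditional branch, which is why negative prompts only have an effect when the guidance scale is above 1.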
Generate high-quality images (512x512) with only 4 steps!
By plugging PeRFlow
PeRFlow-Refiner can also be used separately to enhance the texture and details of low-resolution, blurry images. Here are two examples: on the left, from x64 to x1024; on the right, from x256 to x1024.
One-step image-to-multiview is enabled by plugging PeRFlow
Plug PeRFlow
Plug PeRFlow
Editing with PeRFlow+Prompt-to-Prompt
Please refer to the project page for more results, including the comparison to LCM.
Install the runtime dependencies with `bash env/install.sh`. Training and evaluation scripts are provided in `scripts`.
PeRFlow acceleration yields the delta_weights $\Delta W$, which can be merged into the UNet of an SD-v1.5-based pipeline:

    import torch, torchvision
    from diffusers import StableDiffusionPipeline, UNet2DConditionModel
    from src.utils_perflow import merge_delta_weights_into_unet
    from src.scheduler_perflow import PeRFlowScheduler

    # Load the PeRFlow delta weights for SD-v1.5.
    delta_weights = UNet2DConditionModel.from_pretrained(
        "hansyan/perflow-sd15-delta-weights", torch_dtype=torch.float16, variant="v0-1",
    ).state_dict()

    # Load an SD-v1.5-based pipeline and merge the delta weights into its UNet.
    pipe = StableDiffusionPipeline.from_pretrained("Lykon/dreamshaper-8", torch_dtype=torch.float16)
    pipe = merge_delta_weights_into_unet(pipe, delta_weights)

    # Replace the default scheduler with the PeRFlow scheduler.
    pipe.scheduler = PeRFlowScheduler.from_config(pipe.scheduler.config, prediction_type="diff_eps", num_time_windows=4)
    pipe.to("cuda", torch.float16)

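Conceptually, merging adds each tensor in `delta_weights` to the tensor of the same name in the target UNet. The following is a minimal sketch of that idea on plain Python dictionaries, assuming the delta is an element-wise weight difference. `merge_delta` and the toy parameter names are hypothetical, and the actual `merge_delta_weights_into_unet` in `src/utils_perflow.py` may differ in details such as dtype handling.

```python
def merge_delta(base_weights, delta_weights):
    """Return base + delta for every parameter named in the delta dict.

    Assumption: the delta is an element-wise difference between the
    accelerated and the original weights, so merging is plain addition.
    """
    merged = dict(base_weights)
    for name, delta in delta_weights.items():
        if name not in merged:
            raise KeyError(f"parameter {name!r} missing from base weights")
        merged[name] = [w + d for w, d in zip(merged[name], delta)]
    return merged


# Toy "state dicts" with flattened parameters (hypothetical names and values).
sd_weights = {"conv.weight": [1.0, 2.0], "conv.bias": [0.5]}
perflow_weights = {"conv.weight": [1.5, 1.0], "conv.bias": [0.25]}

# The delta is extracted once from the original and the accelerated model ...
delta = {k: [p - s for p, s in zip(perflow_weights[k], sd_weights[k])] for k in sd_weights}

# ... and can then be added to a different fine-tuned model with the same architecture.
custom_weights = {"conv.weight": [2.0, 4.0], "conv.bias": [1.0]}
accelerated_custom = merge_delta(custom_weights, delta)
print(accelerated_custom)
```

Because only parameter names and shapes need to match, the same delta can be reused across fine-tuned checkpoints that share the base architecture.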
For a quick try, we also provide complete accelerated weights (already merged with the PeRFlow delta_weights), which can be loaded directly:

    import torch, torchvision
    from diffusers import StableDiffusionXLPipeline
    from src.scheduler_perflow import PeRFlowScheduler

    def setup_seed(seed):
        # Assumed helper: it is not defined in the original snippet.
        # Seeds the PyTorch CPU and CUDA RNGs for reproducibility.
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)

    # PeRFlow-accelerated SDXL (DreamShaper) with the weights already merged.
    pipe = StableDiffusionXLPipeline.from_pretrained("hansyan/perflow-sdxl-dreamshaper", torch_dtype=torch.float16, use_safetensors=True, variant="v0-fix")
    pipe.scheduler = PeRFlowScheduler.from_config(pipe.scheduler.config, prediction_type="ddim_eps", num_time_windows=4)
    pipe.to("cuda", torch.float16)

    # Each entry is a [prompt, negative prompt] pair.
    prompts_list = [
        ["photorealistic, uhd, high resolution, high quality, highly detailed; masterpiece, A closeup face photo of girl, wearing a rain coat, in the street, heavy rain, bokeh,",
         "distorted, blur, low-quality, haze, out of focus",],
        ["photorealistic, uhd, high resolution, high quality, highly detailed; masterpiece, A beautiful cat bask in the sun",
         "distorted, blur, low-quality, haze, out of focus",],
    ]

    for i, prompts in enumerate(prompts_list):
        setup_seed(42)
        prompt, neg_prompt = prompts[0], prompts[1]
        # Generate two 1024x1024 samples per prompt and save them as one grid.
        samples = pipe(
            prompt=[prompt] * 2,
            negative_prompt=[neg_prompt] * 2,
            height=1024,
            width=1024,
            num_inference_steps=6,
            guidance_scale=2.0,
            output_type='pt',
        ).images
        torchvision.utils.save_image(torchvision.utils.make_grid(samples, nrow=2), f'tmp_{i}.png')


    import torch, torchvision
    from diffusers.pipelines.stable_diffusion import StableDiffusionPipeline
    from src.scheduler_perflow import PeRFlowScheduler

    # PeRFlow-accelerated SD-v1.5 (DreamShaper) with the weights already merged.
    pipe = StableDiffusionPipeline.from_pretrained("hansyan/perflow-sd15-dreamshaper", torch_dtype=torch.float16)
    pipe.scheduler = PeRFlowScheduler.from_config(pipe.scheduler.config, prediction_type="diff_eps", num_time_windows=4)
    pipe.to("cuda", torch.float16)

    prompts_list = ["A man with brown skin, a beard, and dark eyes", "A colorful bird standing on the tree, open beak",]

    for i, prompt in enumerate(prompts_list):
        # Use a fixed seed so that every prompt is reproducible.
        generator = torch.Generator("cuda").manual_seed(1024)
        prompt = "RAW photo, 8k uhd, dslr, high quality, film grain, highly detailed, masterpiece; " + prompt
        neg_prompt = "distorted, blur, smooth, low-quality, warm, haze, over-saturated, high-contrast, out of focus, dark"
        samples = pipe(
            prompt=[prompt],
            negative_prompt=[neg_prompt],
            height=512,
            width=512,
            num_inference_steps=8,
            guidance_scale=7.5,
            generator=generator,
            output_type='pt',
        ).images
        torchvision.utils.save_image(torchvision.utils.make_grid(samples, nrow=4), f"tmp_{i}.png")

Scripts for text-to-image and ControlNet (depth/edge/pose/tile) are included in `scripts`. You can try efficient image enhancement via ControlNet-tile models. We also provide a fast text-to-multiview Gradio interface in `app/Wonder3D`, based on Wonder3D. Install diffusers 0.19.3 and run `python Wonder3D/sd15_t2mv_gradio.py`.
Rectified Flow proposes to construct flow-based generative models via linear interpolation; the trajectories of the learned flow can then be straightened with a special operation called reflow. However, the reflow procedure requires generating a synthetic dataset by simulating the entire pre-trained probability flow. This consumes a huge amount of storage and time and introduces large numerical errors into the samples, making it unfavorable for training large-scale foundation models. To address this limitation, we propose piecewise rectified flow. By dividing the pre-trained probability flow into multiple time windows and straightening the intermediate probability flow inside each window with reflow, we obtain a piecewise linear probability flow that can be sampled within very few steps. This divide-and-conquer strategy avoids the cumbersome simulation of the whole ODE trajectory, thereby allowing us to perform the piecewise reflow operation online during training.
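To see why a piecewise linear flow needs only a few steps, consider the toy sketch below. Inside each time window the velocity is constant, so a single Euler step per window lands exactly on that window's endpoint. The waypoints, the helper names, and the scalar state are assumptions made for illustration; this is not the training or sampling code of PeRFlow.

```python
def window_velocity(t, boundaries, waypoints):
    """Constant velocity of the piecewise linear path in the window containing t."""
    for k in range(len(boundaries) - 1):
        t_start, t_end = boundaries[k], boundaries[k + 1]
        # Time runs from 1.0 (noise) down to 0.0 (data), so t_start > t_end.
        if t_end < t <= t_start:
            return (waypoints[k + 1] - waypoints[k]) / (t_end - t_start)
    raise ValueError(f"t={t} is outside the flow's time range")


def euler_sample(x, boundaries, waypoints, steps_per_window=1):
    """Integrate dx/dt = v(t) with Euler steps, window by window."""
    for k in range(len(boundaries) - 1):
        t_start, t_end = boundaries[k], boundaries[k + 1]
        dt = (t_end - t_start) / steps_per_window
        for i in range(steps_per_window):
            t = t_start + i * dt
            x = x + window_velocity(t, boundaries, waypoints) * dt
    return x


boundaries = [1.0, 0.75, 0.5, 0.25, 0.0]  # four time windows
waypoints = [4.0, 1.0, 2.0, 0.5, 0.0]     # toy trajectory values at the boundaries

# One step per window (4 steps in total) already reaches the final waypoint.
four_step = euler_sample(waypoints[0], boundaries, waypoints, steps_per_window=1)
# Two steps per window (8 steps in total) give the same result on this exactly linear toy path.
eight_step = euler_sample(waypoints[0], boundaries, waypoints, steps_per_window=2)
print(four_step, eight_step)
```

In a trained model the flow inside each window is only approximately straight, which is consistent with extra sampling steps still improving quality.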
As shown in the figure, the pre-trained probability flow (which can be transformed from a pre-trained diffusion model) maps the random noise distribution to the data distribution.
Quantitative Results: We train a PeRFlow model on LAION-aesthetic-v2 data to accelerate SD-v1.5. We compare FID with respect to three datasets: (1) a subset of 30K images from LAION, (2) a set of 30K images generated by SD-v1.5 with the JourneyDB prompts, and (3) the validation set of MS-COCO2014. For each dataset, we generate 30K images with the different models using the corresponding text prompts. The results are presented in the following table. PeRFlow achieves lower FID than LCM in all three comparisons, at both 4 and 8 steps.
| FID | LAION5B-30k, 4-step | LAION5B-30k, 8-step | SD-v1.5, 4-step | SD-v1.5, 8-step | COCO2014-30k, 4-step | COCO2014-30k, 8-step |
|---|---|---|---|---|---|---|
| PeRFlow | 9.74 | 8.62 | 9.46 | 5.05 | 11.31 | 14.16 |
| LCM | 15.38 | 19.21 | 15.63 | 21.19 | 23.49 | 29.63 |
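As a quick sanity check of the table, the snippet below recomputes the per-column gap between LCM and PeRFlow from the reported numbers. The column labels and variable names are illustrative, and the grouping of columns into dataset and step count is read off the table header.

```python
# FID values copied from the table; the column order is
# (LAION5B-30k, SD-v1.5, COCO2014-30k) x (4-step, 8-step).
columns = [
    "LAION5B-30k/4", "LAION5B-30k/8",
    "SD-v1.5/4", "SD-v1.5/8",
    "COCO2014-30k/4", "COCO2014-30k/8",
]
fid = {
    "PeRFlow": [9.74, 8.62, 9.46, 5.05, 11.31, 14.16],
    "LCM": [15.38, 19.21, 15.63, 21.19, 23.49, 29.63],
}

# A positive gap means PeRFlow has the lower (better) FID in that column.
gaps = {c: round(l - p, 2) for c, p, l in zip(columns, fid["PeRFlow"], fid["LCM"])}
perflow_wins_everywhere = all(g > 0 for g in gaps.values())

for column, gap in gaps.items():
    print(f"{column}: LCM - PeRFlow = {gap:.2f}")
print("PeRFlow lower in every column:", perflow_wins_everywhere)
```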
@article{yan_perflow_2024,
title={PeRFlow: Piecewise Rectified Flow as Universal Plug-and-Play Accelerator},
author={Yan, Hanshu and Liu, Xingchao and Pan, Jiachun and Liew, Jun Hao and Liu, Qiang and Feng, Jiashi},
year={2024},
url={http://arxiv.org/abs/2405.07510}
}
We provide several related links here:
- The official Rectified Flow GitHub repo (https://github.com/gnobitab/RectifiedFlow)
- The official InstaFlow GitHub repo (https://github.com/gnobitab/InstaFlow)
Our training and evaluation scripts are implemented based on the Diffusers and Accelerate libraries. We use several high-quality finetuned versions of Stable Diffusion for model evaluation, including DreamShaper, RealisticVision, LandscapeRealistic, ArchitectureExterior, DisneyCartoon.
Xingchao Liu wishes to express his genuine gratitude to Nat Friedman and the Andromeda cluster for providing free GPU grants during this research.