
Official PyTorch and Diffusers Implementation of "LinFusion: 1 GPU, 1 Minute, 16K Image"


LinFusion


LinFusion: 1 GPU, 1 Minute, 16K Image
Songhua Liu, Weihao Yu, Zhenxiong Tan, and Xinchao Wang
Learning and Vision Lab, National University of Singapore

🔥News

[2024/09/08] We release the code for 16K image generation here!

[2024/09/05] Gradio demo for SD-v1.5 is released! Text-to-image, image-to-image, and IP-Adapter are currently supported.

Supported Models

  1. Yuanshi/LinFusion-1-5: For Stable Diffusion 1.5 and its variants.

Quick Start

  • If you have not already, install PyTorch and diffusers.

  • Clone this repo to your project directory:

    git clone https://github.com/Huage001/LinFusion.git
  • You only need two lines!

    from diffusers import AutoPipelineForText2Image
    import torch
    
    + from src.linfusion import LinFusion
    
    sd_repo = "Lykon/dreamshaper-8"
    
    pipeline = AutoPipelineForText2Image.from_pretrained(
        sd_repo, torch_dtype=torch.float16, variant="fp16"
    ).to(torch.device("cuda"))
    
    + linfusion = LinFusion.construct_for(pipeline)
    
    image = pipeline(
        "An astronaut floating in space. Beautiful view of the stars and the universe in the background.",
        generator=torch.manual_seed(123)
    ).images[0]

    LinFusion.construct_for(pipeline) returns a LinFusion model that matches the pipeline's structure, and this model automatically mounts itself onto the pipeline's forward pass.

  • examples/basic_usage.ipynb shows a basic text-to-image example.
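Under the hood, replacing the UNet's quadratic self-attention with a linear-complexity attention module is what makes very high resolutions tractable. The toy sketch below is not LinFusion's actual module (the `elu(x) + 1` feature map is an illustrative assumption, a common choice in linear-attention work); it only shows why linear attention avoids materializing the N×N attention matrix:

```python
import torch
import torch.nn.functional as F

def naive_linear_attention(q, k, v):
    """O(N^2) reference: explicitly builds the full n x n similarity matrix."""
    sim = (F.elu(q) + 1) @ (F.elu(k) + 1).T           # (n, n) -- infeasible at 16K
    return (sim @ v) / sim.sum(dim=-1, keepdim=True)

def linear_attention(q, k, v):
    """O(N) form: associate the matmuls the other way, never forming (n, n)."""
    q, k = F.elu(q) + 1, F.elu(k) + 1                 # positive feature map (assumed)
    kv = k.T @ v                                      # (d, e) summary, linear in n
    z = q @ k.sum(dim=0)                              # per-query normalizer, shape (n,)
    return (q @ kv) / z.unsqueeze(-1)

n, d, e = 64, 16, 16
q, k, v = (torch.randn(n, d, dtype=torch.float64) for _ in range(3))
out = linear_attention(q, k, v)
ref = naive_linear_attention(q, k, v)
```

Both routes compute the same result; only the memory and compute scaling differ, which is why the linear form can run at resolutions where softmax attention runs out of memory.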

Ultrahigh-Resolution Generation

  • From the perspective of efficiency, our method supports high-resolution generation such as 16K images. Nevertheless, directly applying diffusion models trained on low resolutions to higher-resolution generation can result in content distortion and duplication. To tackle this challenge, we apply techniques from SDEdit. The basic idea is to first generate a low-resolution result, based on which we gradually upscale the image. Please refer to examples/ultra_text2image_w_sdedit.ipynb for an example. Note that 16K generation currently requires an 80GB GPU. We will try to relax this constraint by implementing tiling strategies.
  • We are working on integrating LinFusion with more advanced approaches that are dedicated to high-resolution extension!
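The SDEdit-based procedure above boils down to: generate once at a base resolution, then repeatedly upscale the latest result and re-noise/denoise it with an image-to-image pass. A minimal sketch of the resolution schedule follows; the factor-of-2 progression and 1024 base are illustrative assumptions, so see the notebook for the actual settings:

```python
def upscale_schedule(base=1024, target=16384, factor=2):
    """Resolutions visited when progressively upscaling from base to target."""
    stages = [base]
    while stages[-1] < target:
        stages.append(min(stages[-1] * factor, target))
    return stages

# Each stage after the first would be an image-to-image (SDEdit) pass:
# upscale the previous result to the stage resolution, add noise at a
# moderate strength, and denoise it with the LinFusion-mounted pipeline.
```

With these assumed defaults the schedule visits 1024, 2048, 4096, 8192, and 16384, so only the final pass runs at full resolution.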

ToDo

  • Stable Diffusion 1.5 support.
  • Stable Diffusion 2.1 support.
  • Stable Diffusion XL support.
  • Release training code for LinFusion.
  • Release evaluation code for LinFusion.

Citation

If you find this repo helpful, please consider citing:

@article{liu2024linfusion,
  title         = {LinFusion: 1 GPU, 1 Minute, 16K Image},
  author        = {Liu, Songhua and Yu, Weihao and Tan, Zhenxiong and Wang, Xinchao},
  year          = {2024},
  eprint        = {2409.02097},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV}
}