[NeurIPS 2023 Spotlight] Real-World Image Variation by Aligning Diffusion Inversion Chain

Primary language: Python. License: Apache License 2.0 (Apache-2.0).

RIVAL

[NeurIPS 2023 Spotlight] Official implementation of the paper "Real-World Image Variation by Aligning Diffusion Inversion Chain". [PDF] [arXiv] [Project Page]

Project Milestones

  • [20231028] Code release for the image variations and text-to-image
  • [20231030] Code release for ControlNet inference, image editing
  • [20231031] Code release for other applications (such as inpainting) and the user manual
  • [202311xx] Code release for SDXL, and other possible applications

Applications and User Manual

We provide examples for five applications: variations, T2I, editing, inpainting, and ControlNet.

Environment setup:

Please open an issue or PR if you run into problems setting up the environment.

conda create -n rival python=3.9.16
conda activate rival
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
conda install xformers -c xformers
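
After installation, a quick sanity check can confirm that the key packages are importable. The snippet below is a minimal sketch and not part of the repository; the helper name check_environment is hypothetical, and the package list is an assumption based on the install commands above.

```python
import importlib.util
import sys


def check_environment(packages=("torch", "torchvision", "xformers")):
    """Return a dict mapping each package name to whether it is importable."""
    status = {name: importlib.util.find_spec(name) is not None for name in packages}
    status["python"] = "%d.%d.%d" % sys.version_info[:3]
    return status


if __name__ == "__main__":
    report = check_environment()
    for name, value in report.items():
        print(f"{name}: {value}")
    if report.get("torch"):
        import torch  # imported lazily so the check above never fails on its own

        print("CUDA available:", torch.cuda.is_available())
```

If xformers is reported as missing, the scripts may still run without it, but this has not been verified here.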

Hyperparameter usage

Every application has a config file for inference. The key parameters are briefly explained below; the # comments are annotations for the reader only, since standard JSON does not allow comments.

{
    "self_attn":
    {
        "atten_frames": 2,
        "t_align": 600 # [0-1000], smaller means closer to the original image (semantically).
    },
    "inference":
    {
        "invert_step": 50,
        "ddim_step": 50,
        "cfg": 7,
        "is_null_prompt": true, # whether to use the empty prompt "" in inversion.
        "t_early": 600 # [0-1000], smaller means closer to the original image (low-level color distribution).
    }
}
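
If you keep annotations like the ones above in your own config drafts, they have to be stripped before parsing. The following is a minimal sketch under that assumption, not the loader used by the repository; load_annotated_config is a hypothetical helper, and the range check simply mirrors the [0-1000] ranges documented above.

```python
import json
import re


def load_annotated_config(text):
    """Parse JSON text that may carry trailing '# ...' annotations."""
    # Drop everything from '#' to the end of each line. This assumes that
    # no string value in the config contains a '#' character.
    cleaned = re.sub(r"#.*$", "", text, flags=re.MULTILINE)
    config = json.loads(cleaned)
    # Both thresholds are documented as lying in [0, 1000].
    for section, key in (("self_attn", "t_align"), ("inference", "t_early")):
        value = config[section][key]
        if not 0 <= value <= 1000:
            raise ValueError(f"{section}.{key}={value} is outside [0, 1000]")
    return config


EXAMPLE = """
{
    "self_attn": {"atten_frames": 2, "t_align": 600},  # semantic closeness
    "inference": {
        "invert_step": 50,
        "ddim_step": 50,
        "cfg": 7,
        "is_null_prompt": true,  # empty prompt "" during inversion
        "t_early": 600
    }
}
"""

config = load_annotated_config(EXAMPLE)
print(config["self_attn"]["t_align"], config["inference"]["is_null_prompt"])
# prints: 600 True
```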

Command-line arguments of the test scripts (e.g., rival/test_variation_sdv1.py):

  • --inf_config: Inference config file. default="configs/rival_variation.json"
  • --img_config: Data config file. default="assets/images/configs_variation.json"
  • --inner_round: Number of images to generate per reference. default=1
  • --exp_folder: Output folder. default="out/variation_exps"
  • --pretrained_model_path: SD model path. default="runwayml/stable-diffusion-v1-5"
  • --is_half: Whether to use fp16. default=False
  • --is_editing: If set to True, the inverted latent is not permuted. default=False
  • --editing_early_steps: For timesteps t > this value, self-attention runs as in normal inference. default=1000
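
For reference, the options listed above can be mirrored in a small argparse sketch. This is an illustration only, not the parser from the test scripts: the argument types and the str2bool helper are assumptions, and the real scripts may parse boolean flags differently.

```python
import argparse


def str2bool(value):
    """Interpret common textual spellings of a boolean flag value."""
    return str(value).lower() in ("1", "true", "yes")


def build_parser():
    """Build a parser with the documented option names and defaults."""
    parser = argparse.ArgumentParser(description="Sketch of the documented options")
    parser.add_argument("--inf_config", default="configs/rival_variation.json")
    parser.add_argument("--img_config", default="assets/images/configs_variation.json")
    parser.add_argument("--inner_round", type=int, default=1)
    parser.add_argument("--exp_folder", default="out/variation_exps")
    parser.add_argument("--pretrained_model_path", default="runwayml/stable-diffusion-v1-5")
    parser.add_argument("--is_half", type=str2bool, default=False)
    parser.add_argument("--is_editing", type=str2bool, default=False)
    parser.add_argument("--editing_early_steps", type=int, default=1000)
    return parser


# Example: four variations per reference image, in half precision.
args = build_parser().parse_args(["--inner_round", "4", "--is_half", "True"])
print(args.inner_round, args.is_half, args.exp_folder)
# prints: 4 True out/variation_exps
```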

Image Variations

Given a reference image, RIVAL generates images with the same semantic content and style, without any optimization.

bash scripts/rival_variation_test.sh

Editing-based applications

Image Editing

Modify editing_early_steps in this script to control the editing strength.

bash scripts/rival_editing_test.sh
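
To get a feel for editing_early_steps, the sketch below counts how many denoising steps fall on each side of the threshold, following the documented rule that timesteps t > editing_early_steps use normal self-attention. It assumes 50 evenly spaced timesteps over a 1000-step training schedule (980, 960, ..., 0); the actual scheduler spacing may differ, and both function names are hypothetical.

```python
def ddim_timesteps(num_steps=50, num_train_steps=1000):
    """Evenly spaced timesteps in descending order, e.g. 980, 960, ..., 0."""
    stride = num_train_steps // num_steps
    return [i * stride for i in reversed(range(num_steps))]


def split_by_threshold(timesteps, editing_early_steps=1000):
    """Split timesteps into (normal, aligned) using the rule t > threshold."""
    normal = [t for t in timesteps if t > editing_early_steps]
    aligned = [t for t in timesteps if t <= editing_early_steps]
    return normal, aligned


steps = ddim_timesteps()
for threshold in (1000, 600, 200):
    normal, aligned = split_by_threshold(steps, threshold)
    print(threshold, len(normal), len(aligned))
# prints:
# 1000 0 50
# 600 19 31
# 200 39 11
```

With the default of 1000, no timestep exceeds the threshold, so no step falls back to normal self-attention under this assumption; lowering the value moves more of the early, high-noise steps to normal inference.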

Customized Concept Editing

With RIVAL, we can customize both object concepts and style concepts that are hard to describe.

bash scripts/rival_dreambooth_test.sh

Example-Based Inpainting

Please note that the application scope of inpainting is limited: as shown in the paper, the example can only come from the image itself.

bash scripts/rival_inpainting_test.sh

Generation-based applications

Text-Driven Image Generation

bash scripts/rival_t2i_test.sh

Generation with ControlNet

The config example is given in assets/images/configs_controlnet.json. You may enable more modalities by editing the Python script.

bash scripts/rival_controlnet_test.sh

Motivation and Method

BibTeX

@article{zhang2023realworld,
  title={Real-World Image Variation by Aligning Diffusion Inversion Chain}, 
  author={Yuechen Zhang and Jinbo Xing and Eric Lo and Jiaya Jia},
  journal={arXiv preprint arXiv:2305.18729},
  year={2023},
}