In-Context LoRA (IC-LoRA)

Welcome to the official repository of In-Context LoRA for Diffusion Transformers (arXiv Paper).

With IC-LoRA, you can fine-tune text-to-image models to generate image sets with customizable intrinsic relationships. You can also condition the generation on another image set, enabling task-agnostic adaptation to a wide range of applications.

(Teaser image: a four-image set generated by IC-LoRA from the prompt below.)

Prompt: “This set of four images illustrates a young artist's creative process in a bright and inspiring studio; [IMAGE1] she stands before a large canvas, brush in hand, adding vibrant colors to a partially completed painting, [IMAGE2] she sits at a cluttered wooden table, sketching ideas in a notebook with various art supplies scattered around, [IMAGE3] she takes a moment to step back and observe her work, and [IMAGE4] she experiments with different textures by mixing paints directly on the palette, her focused expression showcasing her dedication to her craft.”
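To illustrate how a trained IC-LoRA can be applied, here is a minimal inference sketch using Hugging Face diffusers' FluxPipeline. This is not code from this repo: the LoRA filename, the 2048x1024 canvas, and the sampler settings are placeholder assumptions, so substitute your own checkpoint and layout.

# Hedged sketch: load FLUX, attach an IC-LoRA, and generate one wide
# image whose panels form the set. Paths and sizes are illustrative.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)
pipe.load_lora_weights("lora/ic-lora-example.safetensors")  # placeholder path
pipe.to("cuda")

# One prompt describes the whole set; [IMAGE1]..[IMAGE4] mark the panels.
prompt = (
    "This set of four images illustrates a young artist's creative process "
    "in a bright and inspiring studio; [IMAGE1] she stands before a large "
    "canvas, brush in hand, ... [IMAGE4] she experiments with different "
    "textures by mixing paints directly on the palette."
)

# A wide canvas so the panels fit side by side; 2048x1024 is an assumed
# layout, not a repo default.
image = pipe(
    prompt,
    height=1024,
    width=2048,
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("ic_lora_set.png")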

For more detailed information and examples, please read our arXiv Paper or visit our Project Page.

Getting Started

You can train IC-LoRA models directly with the open-source AI-Toolkit. We provide sample training data and a matching configuration file in this repo (a sketch of how such training samples can be assembled follows the list below):

  • Configuration File: movie-shots.yml (place it in the config/ directory of AI-Toolkit)
  • Sample Training Data: movie-shots.zip (extract it to data/movie-shots)
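The key property of IC-LoRA training data is that each sample is a single image concatenating the set's panels, paired with one joint caption that uses per-panel markers. The sketch below shows one way to assemble such a sample; the side-by-side layout, the placeholder filenames, and the same-named .txt caption convention (common in AI-Toolkit datasets) are assumptions, so check them against the extracted movie-shots data.

# Hedged sketch: stitch a set's panels into one training image and
# write its joint caption next to it. Filenames are placeholders.
from pathlib import Path
from PIL import Image

def make_sample(panel_paths, caption, out_stem):
    panels = [Image.open(p).convert("RGB") for p in panel_paths]
    # Resize panels to a shared height, then paste them side by side.
    height = min(im.height for im in panels)
    panels = [im.resize((im.width * height // im.height, height)) for im in panels]
    canvas = Image.new("RGB", (sum(im.width for im in panels), height))
    x = 0
    for im in panels:
        canvas.paste(im, (x, 0))
        x += im.width
    canvas.save(f"{out_stem}.jpg", quality=95)
    # Pair the image with a same-named .txt caption file.
    Path(f"{out_stem}.txt").write_text(caption)

make_sample(
    ["shot1.jpg", "shot2.jpg", "shot3.jpg"],  # placeholder panel files
    "[MOVIE-SHOTS] A tense chase unfolds across three scenes, "
    "[SCENE-1] ... [SCENE-2] ... [SCENE-3] ...",
    "data/movie-shots/sample_000",
)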

After installing the necessary dependencies and setting up AI-Toolkit, you can start training by running:

python run.py config/movie-shots.yml

Training runs on a single GPU with at least 24 GB of memory (adjust the resolution parameter in movie-shots.yml to match your GPU's memory budget) and should complete within a few hours.

Prompt for Multi-Scene Image Captioning

As a reference, we provide an example prompt used to generate captions for multi-scene images:

Create a short description of this three-scene image featuring movie shots, beginning with the prefix [MOVIE-SHOTS] for the entire caption, followed by an overall summary of the image. Each scene detail should flow within the same sentence, with specific markers [SCENE-1], [SCENE-2], [SCENE-3], indicating the start of each scene’s description. Name the role(s) with random name(s) if necessary, and wrap the name(s) with "<" and ">". Ensure the entire description is cohesive, flows as one sentence, and remains within 512 words.
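The repository does not specify which vision-language model produced the captions. Purely as an illustration, here is how the prompt above could be sent to an OpenAI-compatible chat endpoint; the "gpt-4o" model name and the input image path are assumptions.

# Illustrative only: send the captioning prompt plus a stitched image
# to an OpenAI-compatible vision model. Model name is an assumption.
import base64
from openai import OpenAI

CAPTION_PROMPT = (
    "Create a short description of this three-scene image featuring movie "
    "shots, beginning with the prefix [MOVIE-SHOTS] ..."  # full prompt above
)

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def caption_image(path):
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": CAPTION_PROMPT},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content

print(caption_image("data/movie-shots/sample_000.jpg"))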

Model Zoo

We will continue to release In-Context LoRA models. Please stay tuned.

License

This repository uses FLUX as the base model. Users must comply with FLUX's license when using this code. Please refer to FLUX's License for more details.

DISCLAIMER: Please be aware that the training data provided in this repository may contain copyrighted material. The open-source data is intended for reference and educational purposes only. If you plan to use this data for commercial purposes, you are responsible for obtaining the necessary permissions and ensuring compliance with all applicable copyright laws and regulations.

Citation

If you find this work useful in your research, please consider citing:

@article{lhhuang2024iclora,
  title={In-Context LoRA for Diffusion Transformers},
  author={Huang, Lianghua and Wang, Wei and Wu, Zhi-Fan and Shi, Yupeng and Dou, Huanzhang and Liang, Chen and Feng, Yutong and Liu, Yu and Zhou, Jingren},
  journal={arXiv preprint arXiv:2410.23775},
  year={2024}
}