This is an unofficial implementation of Design Booster by mkshing, using 🧨 diffusers. In the paper, Design Booster is trained on a large dataset such as LAION, but I was wondering whether it would also work on a small dataset, as in DreamBooth, and here it is!
I am happy to get any feedback!
Diffusion models are able to generate photorealistic images in arbitrary scenes. However, when applying diffusion models to image translation, there is a trade-off between maintaining spatial structure and high-quality content. Moreover, existing methods mainly rely on test-time optimization or fine-tuning the model for each input image, which is extremely time-consuming for practical applications. To address these issues, we propose a new approach for flexible image translation by learning a layout-aware image condition together with a text condition. Specifically, our method co-encodes images and text into a new domain during the training phase. In the inference stage, we can choose images, text, or both as the conditions for each time step, which gives users more flexible control over layout and content. Experimental comparisons with state-of-the-art methods demonstrate that our model performs best in both style image translation and semantic image translation while taking the shortest time.
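For intuition only, here is a minimal sketch of the per-time-step condition switching described above, written against a diffusers-style UNet/scheduler interface. The names unet, scheduler, text_cond, and image_cond, the concatenation of the two conditions, and the switching schedule are all assumptions for illustration, not the repository's actual sampler.

import torch

def sample_with_switchable_conditions(unet, scheduler, latents, text_cond, image_cond, image_steps_ratio=0.5):
    # text_cond / image_cond: precomputed condition embeddings of shape (bsz, seq_len, dim) (hypothetical).
    timesteps = scheduler.timesteps
    for i, t in enumerate(timesteps):
        # One possible schedule: condition on image + text early to fix the layout,
        # then on text only so the prompt dominates the content.
        if i < len(timesteps) * image_steps_ratio:
            cond = torch.cat([text_cond, image_cond], dim=1)
        else:
            cond = text_cond
        noise_pred = unet(latents, t, encoder_hidden_states=cond).sample
        latents = scheduler.step(noise_pred, t, latents).prev_sample
    return latents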
Additionally, you need x-transformers by lucidrains.
$ pip install -r requirements.txt
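If x-transformers is not already listed in requirements.txt, it can be installed separately from PyPI:

$ pip install x-transformers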
accelerate launch train_designbooster.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--revision="fp16" \
--mixed_precision="fp16" \
--resolution=512 \
--train_batch_size=1 \
--use_8bit_adam \
--instance_data_dir=$INSTANCE_DATA_DIR \
--class_data_dir=$CLASS_DATA_DIR \
--output_dir="/content/designbooster" \
--with_prior_preservation --prior_loss_weight=1.0 \
--instance_prompt="a portrait of sks" \
--class_prompt="a portrait" \
--learning_rate=1e-6 \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--num_class_images=$NUM_CLASS_IMAGES \
--max_train_steps=4000
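$MODEL_NAME, $INSTANCE_DATA_DIR, $CLASS_DATA_DIR, and $NUM_CLASS_IMAGES above are placeholders; as an example, they could be set like this before launching (the paths and image count are illustrative):

export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export INSTANCE_DATA_DIR="/path/to/instance/images"
export CLASS_DATA_DIR="/path/to/class/images"
export NUM_CLASS_IMAGES=200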
Please see txt2img.py
import torch
from transformers import AutoTokenizer
from modeling_designbooster import DesignBoosterConfig, DesignBoosterModel

model_id = "CompVis/stable-diffusion-v1-4"

# Start from the Stable Diffusion text-encoder config and add the image-prompt settings.
config = DesignBoosterConfig.from_pretrained(model_id, subfolder="text_encoder")
config.num_image_prompt_tokens = 20
config.num_channels = 4

# Load the text-encoder weights into the DesignBooster encoder.
model = DesignBoosterModel.from_pretrained(model_id, subfolder="text_encoder", config=config)
tokenizer = AutoTokenizer.from_pretrained(model_id, subfolder="tokenizer")

# Random dummy pixel_values: batch of 2, 4 channels, 64x64 (the Stable Diffusion latent shape for 512px images).
pixel_values = torch.randn((2, 4, 64, 64))
input_ids = tokenizer(
["hello world", "hello Japan"],
truncation=True,
padding="max_length",
max_length=tokenizer.model_max_length,
return_tensors="pt",
).input_ids
# Forward pass: text tokens and image inputs are co-encoded into one condition sequence.
out = model(input_ids=input_ids, pixel_values=pixel_values)[0]
print(out.shape)
# (bsz, 77 + num_image_prompt_tokens, 768)
# Save and reload the encoder.
model.save_pretrained('encoder')
model = DesignBoosterModel.from_pretrained('encoder')