latent-diffusion-inpainting

This repository is based on CompVis/latent-diffusion, with modifications for classifier conditioning and architecture improvements.

The original codebase is large, complex, and lacks documentation on how to fine-tune the original autoencoder and diffusion model.

As a result, it is extremely difficult to fine-tune the existing pre-trained models to produce good results.

Issues in the original repository

How to finetune inpainting? #151

how to train Inpainting model using our own datasets? #280

Details about training inpainting model #9

how to train inpainting model with my own datasets #265

Training inpainting model on new datasets #298

Reproduction problem while training inpainting model #159

Hardware requirements

Without pre-training, it would take 8 V100 GPUs to produce satisfactory results.

With fine-tuning, a single RTX 3090 is enough for transfer learning to medical images (in my case).

This repository makes the fine-tuning setup and inference easy by fixing some of the bugs in the original repo.

Major Changes

  1. Load and fine-tune the autoencoder (very important for transfer learning)
  2. Load and fine-tune the latent diffusion model
  3. Combine the trained autoencoder with the latent diffusion model
  4. Inference examples for both models
  5. Simplified data and mask loading
  6. Fixed some bugs when training the inpainting model

The original inpainting model is meant to remove objects from an image: [figure]

However, we can turn the model into one that creates objects!

Result

Original image: [figure]

One polyp: [figure]

Two polyps: [figure]

Requirements

If you already have the ldm environment, you can skip this step.

A suitable conda environment named ldm can be created and activated with:


conda env create -f ldm/environment.yaml
conda activate ldm

Data Loader

From my experiments on medical images, it is better to produce a square mask instead of using the polygon mask directly.

If you want to change this, feel free to modify /ldm/ldm/data/PIL_data.py to adjust the data loading format.

All the dataloaders used in training are in that .py file, and they have been simplified.
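
For illustration, here is a minimal sketch of the square-mask idea: take an arbitrary segmentation mask and replace it with its bounding rectangle. The function name draw_rectangle_over_mask comes from PIL_data.py, but the implementation below is an assumption, not a copy of the repository's code.

```python
# Illustrative sketch only: replace an arbitrary mask with its bounding rectangle.
# The real logic lives in draw_rectangle_over_mask in ldm/ldm/data/PIL_data.py;
# this version is an assumption, not a copy of it.
import numpy as np
from PIL import Image


def rectangle_over_mask(mask: Image.Image) -> Image.Image:
    """Return a mask whose foreground is the bounding rectangle of the input mask."""
    arr = np.array(mask.convert("L")) > 0          # binarize the mask
    ys, xs = np.nonzero(arr)
    if len(xs) == 0:                               # empty mask: nothing to cover
        return mask
    y0, y1, x0, x1 = ys.min(), ys.max(), xs.min(), xs.max()
    rect = np.zeros(arr.shape, dtype=np.uint8)
    rect[y0:y1 + 1, x0:x1 + 1] = 255               # fill the bounding box
    return Image.fromarray(rect)
```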

Usage

For most normal images, you DO NOT need to fine-tune the autoencoder.

You will want to fine-tune the autoencoder ONLY when your data is very different from the pre-training dataset, for example endoscopic images.

If that is not your case, you can skip parts 1 and 2.

1. Fine-tune the autoencoder

Since the autoencoder used by the pre-trained inpainting model is vq-f4-noattn, we have to stick with it.

First, prepare the images and masks in the same format as in the kvasir-seg folder (we DO NOT need any masks to fine-tune the autoencoder).

Second, modify the data path in config.yaml (it should be in ldm/models/first_stage_models/vq-f4-noattn/config.yaml).

Then, run the following command

CUDA_VISIBLE_DEVICES=0 python main.py --base ldm/models/first_stage_models/vq-f4-noattn/config.yaml --resume ldm/models/first_stage_models/vq-f4-noattn/model.ckpt --stage 0 -t --gpus 0,

The model is trained with 50% original images and 50% randomly masked images.

[figure]
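
For illustration, this is a rough sketch (an assumption, not the repository's code) of what such a 50/50 scheme can look like inside a dataset: half the time the autoencoder sees the unmodified image, half the time a version with a random rectangle blanked out.

```python
# Illustrative sketch (not the repository's exact code) of the 50/50 scheme:
# half the samples are unmodified images, half have a random rectangle blanked out.
import random
import numpy as np


def autoencoder_training_sample(image: np.ndarray) -> np.ndarray:
    """image: HxWx3 float array in [-1, 1], as used by the LDM data pipeline."""
    if random.random() < 0.5:
        return image                                     # keep the original image
    masked = image.copy()
    h, w = image.shape[:2]
    mh, mw = random.randint(h // 8, h // 2), random.randint(w // 8, w // 2)
    top, left = random.randint(0, h - mh), random.randint(0, w - mw)
    masked[top:top + mh, left:left + mw, :] = 0.0        # blank a random rectangle
    return masked
```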

2. Combine the autoencoder with the diffusion model

Please refer to combine.ipynb.
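
Conceptually, assuming the standard CompVis checkpoint layout (the latent diffusion checkpoint stores the autoencoder weights under the first_stage_model. prefix in its state_dict), the combination boils down to copying the fine-tuned autoencoder weights into those entries. The sketch below is only an outline of that idea; combine.ipynb is the authoritative version, and the paths are placeholders.

```python
# Rough sketch of the combination step, assuming the standard CompVis
# checkpoint layout; combine.ipynb is the authoritative version.
# Both paths below are placeholders.
import torch

ae_ckpt = torch.load("path/to/finetuned_autoencoder.ckpt", map_location="cpu")
ldm_ckpt = torch.load("ldm/models/ldm/inpainting_big/last.ckpt", map_location="cpu")

ae_sd, ldm_sd = ae_ckpt["state_dict"], ldm_ckpt["state_dict"]
for k, v in ae_sd.items():
    target = f"first_stage_model.{k}"        # autoencoder weights live under this prefix
    if target in ldm_sd:
        ldm_sd[target] = v                   # overwrite with the fine-tuned weights

torch.save(ldm_ckpt, "ldm/models/ldm/inpainting_big/combined.ckpt")
```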

3. Fine-tune the latent diffusion model

Note that the mask here is a square mask; you can disable the draw_rectangle_over_mask function in /ldm/ldm/data/PIL_data.py to use the original mask instead.

[figure]

First, download the pre-trained weights and prepare the images in the same format as in the kvasir-seg folder.

Download the pre-trained weights

wget -O models/ldm/inpainting_big/last.ckpt https://heibox.uni-heidelberg.de/f/4d9ac7ea40c64582b7c9/?dl=1

Second, modify the data path in config.yaml (it should be in ldm/models/ldm/inpainting_big/config.yaml).

Then, run the following command

CUDA_VISIBLE_DEVICES=0 python main.py --base ldm/models/ldm/inpainting_big/config.yaml --resume ldm/models/ldm/inpainting_big/last.ckpt --stage 1 -t --gpus 0,

4. Load and run inference

Please refer to the inference notebooks.
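
For orientation, the following is a condensed sketch of inpainting inference in the style of the original repository's scripts/inpaint.py; the notebooks in this repo are the authoritative reference, and the checkpoint and image paths below are placeholders.

```python
# Condensed sketch of inpainting inference, adapted from the original
# CompVis scripts/inpaint.py; paths below are placeholders.
import numpy as np
import torch
from omegaconf import OmegaConf
from PIL import Image

from ldm.models.diffusion.ddim import DDIMSampler
from ldm.util import instantiate_from_config


def make_batch(image_path, mask_path, device):
    image = np.array(Image.open(image_path).convert("RGB")).astype(np.float32) / 127.5 - 1.0
    image = torch.from_numpy(image).permute(2, 0, 1)[None]                   # 1x3xHxW in [-1, 1]
    mask = np.array(Image.open(mask_path).convert("L")).astype(np.float32) / 255.0
    mask = torch.from_numpy((mask > 0.5).astype(np.float32)[None, None])     # 1x1xHxW, 1 = hole
    return {"image": image.to(device),
            "mask": mask.to(device),
            "masked_image": ((1 - mask) * image).to(device)}


device = torch.device("cuda")
config = OmegaConf.load("ldm/models/ldm/inpainting_big/config.yaml")
model = instantiate_from_config(config.model)
model.load_state_dict(torch.load("ldm/models/ldm/inpainting_big/combined.ckpt",
                                 map_location="cpu")["state_dict"], strict=False)
model = model.to(device).eval()
sampler = DDIMSampler(model)

with torch.no_grad(), model.ema_scope():
    batch = make_batch("examples/image.png", "examples/mask.png", device)
    c = model.cond_stage_model.encode(batch["masked_image"])        # encode the masked image
    cc = torch.nn.functional.interpolate(batch["mask"], size=c.shape[-2:])
    c = torch.cat((c, cc), dim=1)                                   # concatenate the resized mask
    shape = (c.shape[1] - 1,) + c.shape[2:]
    samples, _ = sampler.sample(S=50, conditioning=c, batch_size=c.shape[0],
                                shape=shape, verbose=False)
    x = model.decode_first_stage(samples)

    image = torch.clamp((batch["image"] + 1.0) / 2.0, min=0.0, max=1.0)
    predicted = torch.clamp((x + 1.0) / 2.0, min=0.0, max=1.0)
    inpainted = (1 - batch["mask"]) * image + batch["mask"] * predicted      # keep known pixels
```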