/Mediffusion

Primary LanguageJupyter Notebook

Mediffusion

Diffusion models have significantly impacted the realm of image generation. In a bid to reduce the technical complexity, we aim to lower the entry barrier for the medical community. To achieve this, we have introduced mediffusion, a user-friendly diffusion package that can be effortlessly tailored to address medical problems with less than 20 lines of code. We have utilized various codebases, including guided diffusion and LDM, enhancing their robustness for medical use cases. We plan to update this package regularly. Embracing the spirit of open science, we invite you to consider sharing a demo notebook of your work should you choose to utilize this package.

Happy Coding!

Setup and Installation

Step 1: Create a Conda Environment

If you haven't installed Conda yet, you can download it from here. After installing, create a new Conda environment by running:

conda create --name mediffusion python=3.10

Activate the environment:

conda activate mediffusion

Step 2: Install PyTorch

Install PyTorch specifically for CUDA 11.8 by running:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

Step 3: Install The Package

You can install the latest version from github using:

pip install mediffusion

This will install all the necessary packages.

Training

1. Hyperparameters

Before starting the training, it is recommended that you set up some global constants and environment variables:

os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"
os.environ['WANDB_API_KEY'] = "WANDB-API-KEY"

TOTAL_IMAGE_SEEN = 40e6
BATCH_SIZE = 36
NUM_DEVICES = 2 # number of devices in CUDA_VISIBLE_DEVICES
TRAIN_ITERATIONS = int(TOTAL_IMAGE_SEEN / (BATCH_SIZE * NUM_DEVICES))

2. Preparing Data

To prepare the data, you need to create a dataset where each element is a dictionary. The dictionary should have the key "img" and may also contain additional keys like "cls" and "concat" depending on the type of condition. One way to do this is by using MONAI. Below is a sample code snippet:

import monai as mn

train_data_dicts = [
    {"img": "./image1.dcm", "cls": 2},
    {"img": "./image2.dcm", "cls": 0}
]

valid_data_dicts = [
    {"img": "./image9.dcm", "cls": 1}
]

transforms = mn.transforms.Compose([
    mn.transforms.LoadImageD(keys="img"),
    mn.transforms.SelectItemsD(keys=["img","cls"]),
    mn.transforms.ScaleIntensityD(keys=["img"], minv=-1, maxv=1),
    mn.transforms.ToTensorD(keys=["img","cls"], dtype=torch.float, track_meta=False),
])

train_ds = mn.data.Dataset(data=train_data_dicts, transform=transforms) 
valid_ds = mn.data.Dataset(data=valid_data_dicts, transform=transforms)
train_sampler = torch.utils.data.RandomSampler(train_ds, replacement=True, num_samples=TOTAL_IMAGE_SEEN)

At the end of this step, you should have train_ds, val_ds and train_sampler.

3. Configuring Model

Configuration Fields Explanation

Below is a table that provides descriptions for each element in the configuration file:

Section Field Description
diffusion timesteps The number of timesteps in the diffusion process
schedule_name The name of the schedule (e.g., "cosine")
enforce_zero_terminal_snr Whether to enforce zero terminal SNR (True/False)
schedule_params Parameters related to the diffusion schedule
-- beta_start Starting value for beta in the schedule
-- beta_end Ending value for beta in the schedule
-- cosine_s Parameter for cosine schedule
timestep_respacing Can be a list of respacings. For example, with 200 steps, [10,20] means in the first 100, get 10 samples and in the next 100, get 20 samples.
mean_type Type of mean model (e.g., "VELOCITY")
var_type Type of variance model (e.g., "LEARNED_RANGE")
loss_type The type of loss to use (e.g., "MSE")
optimizer lr Learning rate
type The type of optimizer to use
validation classifier_cond_scale Classifier guidance scale for validation logging.
protocol Inference protocol for logging validation results
log_original Whether to log the original validation data (True/False)
log_concat Whether to log the concatenated images (True/False)
log_cls_indices Whether to log the entire cls vector (default value of -1), or specefic indices from the cls vector (user should provide a list of desired cls indices)
model input_size The input size of the model. Can be an integer for square and cube images or a list of integers for specific axes, like [64, 64, 32]
dims Number of dimensions, 2 or 3 for 2D and 3D images
attention_resolutions List of resolutions for attention layers
channel_mult List of multipliers for each layer's channels
dropout Dropout rate
in_channels Number of input channels (image channels + concat channels)
out_channels Number of output channels (image channels or image channels * 2 if learning the variance)
model_channels Number of convolution channels in the model
num_head_channels Number of attention head channels
num_heads Number of attention heads
num_heads_upsample Number of attention head after upsampling
num_res_blocks List of the number of residual blocks for each layer
resblock_updown Whether to use residual blocks for down/up sampling (True/False)
use_checkpoint Whether to use checkpointing (True/False)
use_new_attention_order Whether to use the new attention ordering (True/False)
use_scale_shift_norm Whether to use scale-shift normalization (True/False)
scale_skip_connection Whether to scaleskip connections (True/False)
num_classes Number of classes for conditioning
concat_channels Number of concatenatong channels for conditioning (for super-resolution or inpainting)
guidance_drop_prob Drop probability for the classifier free guidance scale training

For sample configurations, please checkout the sample_configs directory.

Note: If a field is left out of the config file, the default value is infered based on this file: mediffusion/default_config/default.yaml.

Instantiating Model

You can instantiate the model using the configuration file and dataset as follows:

from mediffusion import DiffusionModule

model = DiffusionModule(
    "./config.yaml",
    train_ds=train_ds,
    val_ds=valid_ds,
    dl_workers=2,
    train_sampler=train_sampler,
    batch_size=32,               # train batch size
    val_batch_size=16            # validation batch size (recommended size is half of batch_size)
)

4. Setting Up Trainer

You can set up the trainer using the Trainer class:

from mediffusion import Trainer

trainer = Trainer(
    max_steps=TRAIN_ITERATIONS,
    val_check_interval=5000,
    root_directory="./outputs", # where to save the weights and logs
    precision="16-mixed",       # mixed precision training
    devices=-1,                 # use all the devices in CUDA_VISIBLE_DEVICES
    nodes=1,
    wandb_project="Your_Project_Name",
    logger_instance="Your_Logger_Instance",
)

5. Training the Model

Finally, to train your model, you simply call:

trainer.fit(model)

Prediction

1. Loading the Model

First, import the DiffusionModule class and load the pre-trained model checkpoint. The model is then moved to the CUDA device and set to inference mode. Additionally, you may choose to enable half-precision for better performance:

from mediffusion import DiffusionModule

model = DiffusionModule("./config.yaml")
model.load_ckpt("./outputs/pl/last.ckpt", ema=True)
model.cuda().half()
model.eval()

2. Preparing Input

Prepare the noise and model keyword arguments. Here, "cls" specifies the class condition and is set to 0:

import torch

noise = torch.randn(1, 1, 256, 256)
model_kwargs = {"cls": torch.tensor([0]).cuda().half()}

Note: You can use other keys like concat and/or cls_embed. To find out more, look at the tutorials directory.

3. Making Predictions

To make a prediction, use the predict method from the DiffusionModule class:

img = model.predict(
    noise, 
    model_kwargs=model_kwargs, 
    classifier_cond_scale=4, 
    inference_protocol="DDIM100"
)
  • noise: The input noise tensor
  • model_kwargs: A dictionary containing additional model configurations (e.g., class conditions)
  • classifier_cond_scale: The scale used for the classifier free guidance condition during inference
  • inference_protocol: The inference protocol to be used (e.g., "DDIM100")

The img is the generated output based on the model's inference (C:H:W(:D)). To save the image, you need to transpose it first, due to the different axis conventions.

Note: The model currently supports the following solvers: DDPM,DDIM,IDDIM(for inverse diffusion), and PLMS. As an example, "PLMS100" means using the PLMS solver for 100 steps.

Tutorials

For more hands-on tutorials on how to effectively use this package, please check the tutorials folder in the GitHub repository. These tutorials provide step-by-step instructions, Colab notebooks, and explanations to help you get started with the software.

File Name Description Notebook Link
01_2d_ddpm Getting started with training a simple 2D class-conditioned DDPM. 📓
02_2d_inpainting Image inpainting with 2D diffusion model (repaint method) 📓

TO-DO

The following features and improvements are currently on our development roadmap:

  • Cross-attention
  • DPM-Solver
  • VAE for LDM

We are actively working on these features and they will be available in future releases.

Issues and Contributions

Issues

If you encounter any issues while using this package, we encourage you to open an issue in the GitHub repository. Your feedback helps us to improve the software and resolve any bugs or limitations.

Contributions

Contributions to the codebase are always welcome. If you have a feature request, bugfix, or any other contribution, feel free to submit a pull request.

Development Opportunities

If you're interested in actively participating in the development of this package, please send us a Direct Message (DM). We're always open to collaboration and would be delighted to have you on board.

Citation

If you find this work useful, please consider citing the parent project:

@article{KHOSRAVI2023107832,
    title = {Few-shot biomedical image segmentation using diffusion models: Beyond image generation},
    journal = {Computer Methods and Programs in Biomedicine},
    volume = {242},
    pages = {107832},
    year = {2023},
    issn = {0169-2607},
    doi = {https://doi.org/10.1016/j.cmpb.2023.107832},
    url = {https://www.sciencedirect.com/science/article/pii/S0169260723004984},
    author = {Bardia Khosravi and Pouria Rouzrokh and John P. Mickley and Shahriar Faghani and Kellen Mulford and Linjun Yang and A. Noelle Larson and Benjamin M. Howe and Bradley J. Erickson and Michael J. Taunton and Cody C. Wyles},
}