In this repository, we compare the performance of different backbones on the latent diffusion task for image generation.
- UNet architecture proposed in the original paper
- Wide-ResNet based backbone network
- EfficientNetV3 based backbone network
- VisionTransformer based backbone network
- Image size: We train on 256x256 images, but we use a pretrained network to compress each image into a 1x1x512 embedding. The diffusion is carried out in this latent space.
- Schedule: We use a linear noise decay schedule.
- Dataset: We train the network on (a) the MNIST dataset and (b) a subset of the WikiArt dataset.
- Cite the original Paper
- Cite datasets
- Cite reference code implementations
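
To make the schedule and latent-space setup above concrete, here is a minimal sketch of a linear noise schedule and the forward noising step applied to a single 512-dimensional latent vector. It is not taken from this repository's code. It assumes that "linear" means the per-step noise variances (betas) are linearly spaced, and the values `num_steps=1000`, `beta_start=1e-4`, and `beta_end=0.02` are assumed defaults chosen for illustration. The function names `linear_beta_schedule`, `cumulative_alphas`, and `q_sample` are hypothetical.

```python
import math
import random


def linear_beta_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Return linearly spaced noise variances (assumed values, not from the repo)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """Running product of (1 - beta_t): the fraction of signal variance kept at step t."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def q_sample(z0, t, alpha_bar, rng):
    """Noise a clean latent z0 directly to timestep t (closed-form forward process)."""
    signal_scale = math.sqrt(alpha_bar[t])
    noise_scale = math.sqrt(1.0 - alpha_bar[t])
    noise = [rng.gauss(0.0, 1.0) for _ in z0]
    zt = [signal_scale * z + noise_scale * e for z, e in zip(z0, noise)]
    return zt, noise


betas = linear_beta_schedule()
alpha_bar = cumulative_alphas(betas)

rng = random.Random(0)
# Stand-in for a flattened 1x1x512 embedding from the pretrained encoder.
z0 = [rng.gauss(0.0, 1.0) for _ in range(512)]
zt, eps = q_sample(z0, 500, alpha_bar, rng)
```

In a training loop, the backbone would typically receive `zt` and the timestep and be trained to predict `eps`. Because the latent is a flat 512-dimensional vector here, each backbone would need to accept vector rather than image-shaped input, which is a design consideration rather than something this sketch resolves.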