Deep-Generative-Models-Homework

This repository features implementations of various deep generative models, including VAEs, GANs, NFs, DDPMs, and EBMs, using datasets like MNIST and CIFAR-10.

Primary language: Jupyter Notebook. License: Creative Commons Zero v1.0 Universal (CC0-1.0).

Deep Generative Models Course - Homework Solutions

Introduction

This repository contains my solutions to six Deep Generative Models assignments, each exploring a different topic in generative modeling and neural networks. The notebooks implement or evaluate CLIP, LSTMs, VAEs, GANs, flow-based models, DDPMs, and EBMs on datasets including Food 101, MNIST, CIFAR-10, CelebA, historical stock prices, and a custom captcha dataset.

Table of Contents

  1. Homework 1: OpenAI CLIP Model (Food 101)
  2. Homework 2: Autoregressive Generative Models and Variational Autoencoders
  3. Homework 3: GANs and Flow-Based Models
  4. Homework 4: Denoising Diffusion and Energy-Based Models

Homework 1: OpenAI CLIP Model

In this notebook, I evaluate the OpenAI CLIP model on the Food 101 dataset. The focus is on understanding how CLIP associates text and images in a shared embedding space.
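As a rough illustration of what a shared embedding space allows, CLIP-style zero-shot classification scores an image against each candidate caption by cosine similarity and applies a softmax over the scaled scores. The sketch below is plain Python with made-up three-dimensional embeddings; the vectors, the label names, and the `scale` value are assumptions for illustration, not outputs of the real model or values from the notebook.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def zero_shot_probs(image_emb, text_embs, scale=100.0):
    """Softmax over scaled image-text similarities (scale is an assumed temperature)."""
    logits = [scale * cosine_similarity(image_emb, t) for t in text_embs]
    max_logit = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - max_logit) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical embeddings: one image and three candidate class captions.
image_emb = [0.9, 0.1, 0.2]
text_embs = {
    "pizza": [1.0, 0.0, 0.1],
    "sushi": [0.0, 1.0, 0.0],
    "ramen": [0.1, 0.2, 1.0],
}

labels = list(text_embs)
probs = zero_shot_probs(image_emb, [text_embs[label] for label in labels])
prediction = labels[probs.index(max(probs))]
print(prediction, probs)
```

With a real model, the two embedding lists would come from the image and text encoders; the scoring step stays the same.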

Visualization of embeddings for fine-grained labels (represented with smaller data points and an alpha value of 0.4) alongside the original labels (depicted with larger data points). Each class is assigned a distinct color for differentiation, and a legend is included for clarity.

Homework 2: Autoregressive Generative Models and Variational Autoencoders

I implemented an LSTM-based autoregressive generative model to predict stock prices, using historical stock data. The model captures complex temporal dependencies by factorizing the joint probability of the time series using the chain rule of probability.
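Written out, the chain-rule factorization for a price series $x_1, \dots, x_T$ takes the form below. The notation is chosen here for illustration: $\theta$ denotes the model parameters and $h_{t-1}$ the LSTM hidden state that summarizes the history up to step $t-1$.

$$
p_\theta(x_1, \dots, x_T) = \prod_{t=1}^{T} p_\theta(x_t \mid x_1, \dots, x_{t-1}) = \prod_{t=1}^{T} p_\theta(x_t \mid h_{t-1})
$$

Each factor is a one-step-ahead prediction, so training reduces to predicting the next value from the values seen so far.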

Time series visualization of stock market data, with actual prices shown in blue and predicted prices in red, highlighting the model's forecasting performance.

Stock price trends showing the 10, 20, and 50-day moving averages alongside adjusted closing prices, highlighting overall price trends.
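For reference, a trailing simple moving average can be computed as in the sketch below. It is plain Python and assumes an unweighted trailing window; the notebook may well use a library routine instead, and the prices shown are made-up values.

```python
def moving_average(prices, window):
    """Trailing simple moving average; None until a full window is available."""
    if window <= 0:
        raise ValueError("window must be positive")
    averages = []
    running_sum = 0.0
    for i, price in enumerate(prices):
        running_sum += price
        if i >= window:
            # Drop the value that just left the window.
            running_sum -= prices[i - window]
        averages.append(running_sum / window if i >= window - 1 else None)
    return averages

# Hypothetical adjusted closing prices.
prices = [10.0, 11.0, 12.0, 13.0, 14.0]
ma3 = moving_average(prices, 3)
print(ma3)  # [None, None, 11.0, 12.0, 13.0]
```

The 10, 20, and 50-day curves correspond to calling the same function with `window` set to 10, 20, and 50.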

In this part, I implemented a Variational Autoencoder (VAE) to generate new handwritten digits using the MNIST dataset. The VAE model learns a latent space representation and generates new digits by sampling from the learned distribution.
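Two pieces usually sit at the core of a VAE with a Gaussian latent space: the reparameterization trick for sampling and the closed-form KL term of the loss. The sketch below shows both in plain Python; it assumes a diagonal Gaussian posterior and a standard normal prior, and the `mu` and `log_var` values are placeholders rather than encoder outputs from the notebook.

```python
import math
import random

def reparameterize(mu, log_var, rng=random):
    """Sample z = mu + sigma * eps with eps ~ N(0, I)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) in closed form."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))

# Placeholder encoder outputs for a 2-dimensional latent space.
mu = [0.5, -1.0]
log_var = [0.0, -2.0]

z = reparameterize(mu, log_var, random.Random(0))
kl = kl_to_standard_normal(mu, log_var)
print(z, kl)
```

New digits are then generated by sampling `z` from the prior and passing it through the decoder.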

Average images for each digit (0-9) generated by the Variational Autoencoder, illustrating the typical features of handwritten digits.

Homework 3: GANs and Flow-Based Models

This section includes implementations of a basic GAN, a Conditional GAN, and a Wasserstein GAN. The goal was to generate high-quality images through adversarial learning while improving training stability.
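The stability benefit of the Wasserstein formulation comes from replacing the classifier-style discriminator with a critic that outputs unbounded scores. A minimal sketch of the resulting losses follows, in plain Python. Weight clipping with `c=0.01` is the constraint from the original WGAN formulation and is an assumption here; the notebook may use a different constraint such as a gradient penalty.

```python
def critic_loss(real_scores, fake_scores):
    """The critic minimizes E[D(fake)] - E[D(real)]."""
    mean_real = sum(real_scores) / len(real_scores)
    mean_fake = sum(fake_scores) / len(fake_scores)
    return mean_fake - mean_real

def generator_loss(fake_scores):
    """The generator minimizes -E[D(fake)], pushing critic scores on fakes up."""
    return -sum(fake_scores) / len(fake_scores)

def clip_weights(weights, c=0.01):
    """Clamp critic weights to [-c, c] (assumed constraint, see above)."""
    return [max(-c, min(c, w)) for w in weights]

# Made-up critic scores for a batch of two real and two generated images.
real_scores = [1.0, 3.0]
fake_scores = [0.0, 1.0]

print(critic_loss(real_scores, fake_scores))  # -1.5
print(generator_loss(fake_scores))            # -0.5
print(clip_weights([0.5, -0.5, 0.005]))       # [0.01, -0.01, 0.005]
```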

Generated images of handwritten digits (0-9) from the Basic GAN model trained on the MNIST dataset, demonstrating the model's ability to capture the diversity of digit styles.

Conditional GAN-generated images of handwritten digits, showcasing the model's capability to produce digits based on specified labels from the MNIST dataset.

Generated images from the Wasserstein GAN model trained on the CIFAR-10 dataset, illustrating improved visual quality and diversity compared to traditional GANs.

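I applied normalizing flows to enhance a VAE for more realistic image generation on the CelebA dataset. A normalizing flow passes a sample from a simple base distribution through a sequence of invertible transformations with tractable Jacobian determinants, so the density of the transformed sample can still be computed exactly while the latent distribution becomes more expressive.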
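To make the change-of-variables idea concrete, here is a single planar flow layer in plain Python. The choice of a planar flow is an assumption for illustration, since the flow type used in the notebook is not stated here, and the parameters `u`, `w`, and `b` are arbitrary values rather than learned ones.

```python
import math

def planar_flow(z, u, w, b):
    """Apply f(z) = z + u * tanh(w . z + b); return (f(z), log|det Jacobian|)."""
    activation = math.tanh(sum(wi * zi for wi, zi in zip(w, z)) + b)
    z_new = [zi + ui * activation for zi, ui in zip(z, u)]
    # det J = 1 + (1 - tanh^2(w . z + b)) * (u . w)
    u_dot_w = sum(ui * wi for ui, wi in zip(u, w))
    det = 1.0 + (1.0 - activation ** 2) * u_dot_w
    return z_new, math.log(abs(det))

def standard_normal_log_prob(z):
    """Log density of a standard normal base distribution."""
    return sum(-0.5 * (zi * zi + math.log(2.0 * math.pi)) for zi in z)

# Arbitrary base sample and flow parameters (u . w >= -1 keeps the map invertible).
z0 = [0.3, -0.7]
z1, log_det = planar_flow(z0, u=[0.5, 0.2], w=[1.0, -1.0], b=0.1)

# Change of variables: log q1(z1) = log q0(z0) - log|det J|.
log_q1 = standard_normal_log_prob(z0) - log_det
print(z1, log_det, log_q1)
```

Stacking several such layers and summing their log-determinants gives the density of the final sample.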

Generated images from the Normalizing Flow model.

Homework 4: Denoising Diffusion and Energy-Based Models

I implemented a Denoising Diffusion Probabilistic Model (DDPM) to generate new captcha images by progressively adding noise and learning to reverse this process. The captcha dataset consists of RGB images with corresponding text labels.
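The noise-adding half of the process has a closed form, which lets training jump directly to any timestep. The sketch below shows it in plain Python under assumed settings: a linear beta schedule from 1e-4 to 0.02 over 1000 steps, which are commonly used defaults and not values taken from the notebook, and a three-element list standing in for an image.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Noise variances beta_1..beta_T, evenly spaced (assumed schedule)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, product = [], 1.0
    for beta in betas:
        product *= 1.0 - beta
        alpha_bars.append(product)
    return alpha_bars

def q_sample(x0, t, alpha_bars, rng=random):
    """Sample x_t ~ N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I)."""
    signal = math.sqrt(alpha_bars[t])
    sigma = math.sqrt(1.0 - alpha_bars[t])
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = [signal * x + sigma * n for x, n in zip(x0, noise)]
    return x_t, noise

betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)

x0 = [0.5, -0.25, 1.0]  # stand-in for normalized pixel values
x_t, noise = q_sample(x0, t=999, alpha_bars=alpha_bars, rng=random.Random(0))
print(alpha_bars[0], alpha_bars[-1], x_t)
```

In the standard DDPM setup, the network is trained to predict `noise` from `x_t` and `t`, and generation runs the learned reverse steps starting from pure noise.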

Generated captcha images from the Denoising Diffusion Probabilistic Model (DDPM) trained on the captcha dataset, showcasing the model's ability to create diverse and realistic samples without conditioning.

Conditionally generated captcha images from the Denoising Diffusion Probabilistic Model (DDPM), illustrating the model's capability to produce targeted outputs based on specified text conditions.

In this part, I implemented an Energy-Based Model (EBM) trained with contrastive divergence on the MNIST dataset. Training lowers the energy assigned to real digits and raises the energy of samples drawn from the model, and new digits are generated by sampling from the learned energy function.
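The two ingredients of this training scheme, drawing samples from the current model and contrasting their energy with that of real data, can be sketched on a one-dimensional toy problem. Everything below is an assumption for illustration: the quadratic `energy` stands in for a neural network, Langevin dynamics is assumed as the sampler, and the step count and step size are arbitrary.

```python
import math
import random

def energy(x, center=2.0):
    """Toy quadratic energy, lowest at `center`; stands in for a neural network."""
    return 0.5 * (x - center) ** 2

def energy_grad(x, center=2.0):
    """Derivative of the toy energy with respect to x."""
    return x - center

def langevin_sample(x, steps=200, step_size=0.1, rng=random):
    """Noisy gradient descent on the energy: x <- x - (s/2) dE/dx + sqrt(s) * eps."""
    for _ in range(steps):
        drift = -0.5 * step_size * energy_grad(x)
        x = x + drift + math.sqrt(step_size) * rng.gauss(0.0, 1.0)
    return x

def contrastive_divergence_loss(data, model_samples):
    """Mean energy of real data minus mean energy of model samples."""
    mean_data = sum(energy(x) for x in data) / len(data)
    mean_model = sum(energy(x) for x in model_samples) / len(model_samples)
    return mean_data - mean_model

rng = random.Random(0)
data = [1.8, 2.1, 2.0, 1.9]  # made-up "real" samples
negatives = [langevin_sample(rng.uniform(-5.0, 5.0), rng=rng) for _ in data]
loss = contrastive_divergence_loss(data, negatives)
print(negatives, loss)
```

With a learned energy network, minimizing this loss by gradient descent on the network parameters pushes energy down on data and up on the sampled negatives.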

Generated digit samples from the Energy-Based Model (EBM) trained on the MNIST dataset using contrastive divergence, showcasing the model's ability to create new images.