Guided-Flow-Matching

An attention-based UNet model for conditional image generation, trained with Flow Matching under the Conditional Optimal Transport objective.

Primary language: Jupyter Notebook. License: MIT.

Guided Conditional Image Generation with Conditional Flow Matching

Project Description

This project trains an attention-based UNet with Flow Matching under the Conditional Optimal Transport objective, covering both conditional and unconditional image generation. Classifier-Free Guidance (CFG) lets a single model handle both tasks. Because CIFAR10 provides only class labels rather than descriptive text, the BLIP2 FLAN T5 model is used to caption the images, and the captions enrich the conditioning signal. Self- and cross-attention layers inject the timestep and the tokenized caption into the network. After extensive experiments, the best architecture reaches an FID of 105.54 for unconditional generation and CLIPScore/FID of 22.19/305.42 for conditional generation. These results indicate the model's potential, and architectural refinements and longer training are expected to improve them further.
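
The Conditional Optimal Transport objective mentioned above can be illustrated with a minimal NumPy sketch. It follows the standard formulation (a straight-line path from noise `x0` to data `x1`, with a constant target velocity), not necessarily the exact code in the notebooks. The function names, the `sigma_min` default, and the Gaussian noise source are assumptions made for illustration.

```python
import numpy as np

def cot_flow_matching_pair(x0, x1, t, sigma_min=1e-4):
    """Return (x_t, u_t) for the conditional optimal transport path.

    x0: noise samples, x1: data samples, both of shape (batch, ...).
    t:  times in [0, 1], shape (batch,).
    sigma_min is an assumed small constant, not a value taken from the repo.
    """
    # Reshape t so it broadcasts over every non-batch dimension.
    t = t.reshape(-1, *([1] * (x1.ndim - 1)))
    # Point on the straight-line path between noise and data at time t.
    x_t = (1.0 - (1.0 - sigma_min) * t) * x0 + t * x1
    # Target velocity; it is constant along each path.
    u_t = x1 - (1.0 - sigma_min) * x0
    return x_t, u_t

def flow_matching_loss(v_pred, u_t):
    """Mean squared error between predicted and target velocity."""
    return float(np.mean((v_pred - u_t) ** 2))

rng = np.random.default_rng(0)
x1 = rng.uniform(-1.0, 1.0, size=(4, 3, 32, 32))  # stand-in for a CIFAR10 batch
x0 = rng.standard_normal(x1.shape)                # assumed Gaussian noise source
t = rng.uniform(0.0, 1.0, size=4)

x_t, u_t = cot_flow_matching_pair(x0, x1, t)
loss_perfect = flow_matching_loss(u_t, u_t)               # a perfect prediction
loss_zero = flow_matching_loss(np.zeros_like(u_t), u_t)   # an all-zero prediction
```

In training, `v_pred` would come from the UNet evaluated at `(x_t, t)` together with the caption tokens; here it is replaced by fixed arrays so the sketch runs without a model.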

Technical Skills

Python, PyTorch, Matplotlib, Pandas, NumPy, Jupyter Notebook

Dependencies

Transformers
  !pip install transformers
PyTorch (Check CPU/GPU Compatibility)
  https://pytorch.org/get-started/locally/
Pandas
  !pip install pandas
NumPy
  !pip install numpy
Matplotlib
  !pip install matplotlib
TorchDiffEq
  !pip install torchdiffeq
Torchmetrics
  !pip install torchmetrics
Torchviz
  !pip install torchviz
Torch Fidelity
  !pip install torch-fidelity
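
As a quick sanity check after installation, the following sketch reports which of the packages above cannot be found in the current environment. The helper name `missing_packages` is hypothetical. The list uses import names, so `torch-fidelity` appears as `torch_fidelity`.

```python
import importlib.util

# Import names for the dependencies listed above.
REQUIRED = [
    "transformers", "torch", "pandas", "numpy", "matplotlib",
    "torchdiffeq", "torchmetrics", "torchviz", "torch_fidelity",
]

def missing_packages(names):
    """Return the names that cannot be located in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All dependencies found.")
```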

Dataset Information

File Content

  • Caption_Generation.ipynb:

    • Uses the BLIP2 model to generate descriptive captions for images in the CIFAR10 dataset and stores the captioned dataset as a pickle file.
  • Cross_Validation.ipynb:

    • Runs cross-validation over a list of candidate learning rates.
  • Flow_Matching_Training.ipynb:

    • Contains the full training process: flow matching with the conditional optimal transport objective, applied to the proposed UNet model.
  • Flow_Inference.ipynb:

    • Generates images from uniformly sampled inputs and evaluates the trained models with the FID and CLIPScore metrics.
  • Text_Encoding.ipynb:

    • Uses the BLIP2 tokenizer to convert captions into tokens for use in the conditioning process.
  • UNet_Attn.ipynb:

    • Defines the proposed UNet model used for both the conditional and unconditional image generation tasks.
  • Docs

    • Project Report: Documents the project, covering the Problem Statement, Data Augmentation, Methodology, UNet Model, and Results.
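
To illustrate how inference with Classifier-Free Guidance can work for a flow model, here is a minimal NumPy sketch. It uses one common CFG parameterization and a fixed-step Euler loop in place of a torchdiffeq solver, so it is a simplification and not the code in Flow_Inference.ipynb. The toy velocity fields, target arrays, and all function names are hypothetical stand-ins for the trained UNet.

```python
import numpy as np

def guided_velocity(v_cond, v_uncond, guidance_scale):
    """Classifier-free guidance: move from the unconditional prediction
    toward the conditional one, scaled by guidance_scale."""
    return v_uncond + guidance_scale * (v_cond - v_uncond)

def euler_sample(velocity_fn, x0, num_steps=50):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with Euler steps."""
    x = x0.copy()
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        x = x + dt * velocity_fn(x, t)
    return x

rng = np.random.default_rng(0)
x0 = rng.standard_normal((2, 3, 32, 32))   # starting noise (assumed Gaussian)
target_cond = np.full_like(x0, 0.5)        # toy "captioned" target image
target_uncond = np.zeros_like(x0)          # toy "no caption" target image

def make_velocity_fn(guidance_scale):
    """Build a toy velocity field; a real one would call the UNet twice,
    once with the caption tokens and once with an empty caption."""
    def velocity_fn(x, t):
        v_cond = target_cond - x0      # straight-line velocity to the conditional target
        v_uncond = target_uncond - x0  # straight-line velocity to the unconditional target
        return guided_velocity(v_cond, v_uncond, guidance_scale)
    return velocity_fn

sample = euler_sample(make_velocity_fn(guidance_scale=1.0), x0)
```

With `guidance_scale=1.0` the guided field equals the conditional one, and `0.0` recovers the unconditional one; larger values push samples further toward the caption-conditioned direction.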

How to run

  1. Dependency Installation:

    • Install the packages listed under Dependencies before running any notebook.
  2. Repository Cloning:

    • Clone the project repository to the local machine using the command:
      git clone https://github.com/Anshumaan-Chauhan02/Guided-Flow-Matching
      
  3. Caption Generation:

    • Run the Caption_Generation.ipynb notebook to generate a captioned dataset using the BLIP2 model.
  4. Flow Matching Training:

    • Run the Flow_Matching_Training.ipynb notebook to train the unconditional/conditional generation model.
  5. Model Evaluation and Inference:

    • Run the Flow_Inference.ipynb notebook to generate images and evaluate the trained model.

Note:

  • Update the file paths in the notebooks so that they point to your local copy of the repository.
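
One way to keep path updates manageable is to define the repository location once and build every other path from it. The sketch below assumes a hypothetical layout; `REPO_ROOT`, `repo_path`, and the file name are illustrative and do not come from the notebooks.

```python
from pathlib import Path

# Assumed location: change this single line to match where the repo was cloned.
REPO_ROOT = Path.cwd() / "Guided-Flow-Matching"

def repo_path(*parts):
    """Build a path inside the repository from its components."""
    return REPO_ROOT.joinpath(*parts)

# Hypothetical file name, used only to show the pattern.
captions_file = repo_path("data", "cifar10_captions.pkl")
print(captions_file)
```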