/taim-gan

TAIM-GAN: Text Assisted Image Manipulation GAN. Course Project for ML701

Primary LanguageJupyter NotebookMIT LicenseMIT

TAIM-GAN

Text-Assisted Image Manipulation - GAN

paper

Description

In this project we explore the idea of changing the images according to the textual captions through generative networks

How to use

There's two primary ways in which you can use our project: use our publicly deployed version of TAIM-GAN or deploy it locally on your machine. Also, feel free to fork and modify the source code for latter reseach

Hugging Face

Visit the live demo of our project on Hugging Face! It has some interesting examples:

image image image

Docker

To set up the project locally through docker, just do the following:

  1. Clone our repository using git clone https://github.com/Dmmc123/taim-gan.git
  2. Deploy the Gradio web application using docker-compose up

Datasets

We used three datasets for training, evaluation, and deployment of TAIM-GAN:

  1. COCO
  2. CUB
  3. UTKFace with generated captions from BLIP

Train/evaluate

Example of training script:

train.py --data-dir data --split train --num-capt 5 --batch-size 16 --num-workers 8

Example of evaluation script:

compute_metrics.py --data-dir data --split test --num-capt 10 --batch-size 32 --num-workers 4

For additonal information about possible values of arguments and their meaning, please type train.py --help or compute_metrics.py --help

Project Organization

Project uses template from cookiecutter for data science projects


├── notebooks             <- Jupyter Notebooks for interim results of coding tasks
│
├── references            <- Main papers we used as a source
│
├── src                   <- Source files of the project
|   | 
│   ├── data              <- Code for preparing the datasets and vectorizing texts
|   |   |
│   |   ├── stubs         <- Folder for examples in the Gradio application
|   |   |
│   |   ├── collate.py    <- Function for preprocessing output of datasets for inference
│   |   ├── datasets.py   <- Code for processing the raw data from datasets
│   |   └── tokenizer.py  <- API for tokenizing text captions
|   |
│   ├── models            <- Code for model components and inference utility functions
|   |   |
|   |   ├── modules       <- All the atomic modules that constitute TAIM-GAN
|   |   |
│   |   ├── losses.py     <- Loss functions for Discriminator and Generator
│   |   └── utils.py      <- Helper functions for loading/dumping weights of models and saving plots
|   |
|   └── config.py         <- Constants used throughout the project
|
├── tests                 <- Code for integration and unit testing
|   │
|   ├── integration       <- Tests for checking the whole flow of TAIM-GAN
|   │ 
|   └── unit              <- Scripts for testing the mudules in atomic way
|        
├── app.py                <- Code for Gradio web application
├── compute_metrics.py    <- Code for computing Inceptions Scores on datasets
├── docker-compose.yaml   <- Docker config for deploying the Gradio web app locally
├── requirements.txt      <- List of all dependencies of the project
└── train.py              <- Code for training the model

References

The project is primarily based on LWGAN research. Here is what we did differently:

  • Removed the redundant modules from the source code
  • Refactored from scratch the existing code by providing type hints for the most part of code-base
  • Covered most parts of code with unit and intergrations tests
  • Removed the deprecated functionality and upgraded the project according the latest version of PyTorch 1.12.1
  • Collected new dataset with captions for facial pictures and finetuned TAIM-GAN with it

Also here are some other researches we used as an additional reference: