BirdGAN

Research for text-to-image synthesis via modified auxiliary classifier GANs: incremental modification of the model architecture for improved results, fully documented.

Progressive implementations of GAN architectures applied to the CUB200 dataset to generate novel bird images conditioned on attribute labels and caption embeddings.
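
As a rough illustration of the conditioning idea only (the notebooks document the actual architectures), the generator input can be formed by concatenating a latent noise vector with a multi-hot attribute vector and a caption embedding. The sizes below are assumptions for this sketch: a 100-D noise vector, 312 binary CUB attributes, and a 1024-D BERT-large caption embedding.

    import numpy as np
    import tensorflow as tf

    # Assumed sizes, for illustration only.
    NOISE_DIM, N_ATTRS, CAPTION_DIM = 100, 312, 1024

    def build_generator_input(batch_size, attributes, caption_embeddings):
        """Concatenate noise, multi-hot attributes, and caption embeddings
        into a single conditioning vector for the generator."""
        noise = tf.random.normal([batch_size, NOISE_DIM])
        return tf.concat(
            [noise,
             tf.cast(attributes, tf.float32),           # (batch, N_ATTRS)
             tf.cast(caption_embeddings, tf.float32)],  # (batch, CAPTION_DIM)
            axis=-1)

    # Example with random placeholder data.
    attrs = np.random.randint(0, 2, size=(8, N_ATTRS))
    caps = np.random.randn(8, CAPTION_DIM).astype("float32")
    z = build_generator_input(8, attrs, caps)  # shape: (8, 100 + 312 + 1024)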

Prerequisites

  • The CUB200 dataset
  • Captions for the CUB200 dataset
  • A pretrained BERT-large (uncased) model for embedding captions as 1024-D vectors
  • bert-as-service for serving the pretrained BERT model (a short usage sketch follows this list)
  • A Python notebook environment (e.g., Jupyter)
  • Python 3.7
    • TensorFlow 2.0 or greater
    • Pandas
    • OpenCV 3
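
For the caption embeddings, a minimal bert-as-service workflow looks roughly like the following (the checkpoint path is a placeholder; with BERT-large the default mean pooling yields 1024-D vectors):

    # Start the server first, pointing at the unzipped BERT-large (uncased) checkpoint:
    #   bert-serving-start -model_dir /path/to/uncased_L-24_H-1024_A-16 -num_worker=1
    from bert_serving.client import BertClient

    bc = BertClient()  # connects to localhost on the default ports
    captions = [
        "this bird has a bright red crown and black wings",
        "a small grey bird with a long pointed beak",
    ]
    embeddings = bc.encode(captions)  # numpy array of shape (2, 1024) for BERT-large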

Implementation Categories (ordered old → new)

  1. Vanilla DCGAN
  2. Multilabel ACGAN
  3. Multilabel ACGAN with a split discriminator (for finer-grained tuning; see the sketch after this list)
  4. Multilabel ACGAN with a split discriminator and BERT caption embeddings
  5. Multilabel ACGAN with a split discriminator and BERT caption embeddings, V2
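
The notebooks document the exact layer configurations; the sketch below only illustrates the assumed meaning of a "split" discriminator: a shared convolutional trunk with two output heads, one for real/fake and one for multi-label attribute classification, so the adversarial and auxiliary losses can be weighted and tuned separately. Filter counts, image size, and loss weights are placeholders.

    import tensorflow as tf
    from tensorflow.keras import layers

    N_ATTRS = 312            # placeholder: number of binary attributes
    IMG_SHAPE = (64, 64, 3)  # placeholder: generated image size

    def build_split_discriminator():
        """Shared conv trunk with two heads: real/fake and multi-label attributes."""
        image = layers.Input(shape=IMG_SHAPE)
        x = image
        for filters in (64, 128, 256):
            x = layers.Conv2D(filters, 4, strides=2, padding="same")(x)
            x = layers.LeakyReLU(0.2)(x)
        x = layers.Flatten()(x)

        real_fake = layers.Dense(1, activation="sigmoid", name="real_fake")(x)
        attributes = layers.Dense(N_ATTRS, activation="sigmoid", name="attributes")(x)

        model = tf.keras.Model(image, [real_fake, attributes])
        # Separate losses (and weights) per head allow finer tuning of the
        # adversarial objective vs. the attribute-classification objective.
        model.compile(
            optimizer=tf.keras.optimizers.Adam(2e-4, beta_1=0.5),
            loss={"real_fake": "binary_crossentropy",
                  "attributes": "binary_crossentropy"},
            loss_weights={"real_fake": 1.0, "attributes": 0.5},
        )
        return model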

Sample Generations (ordered old → new)

  • Vanilla DCGAN
  • Multilabel ACGAN
  • Multilabel ACGAN w/ split discriminator
  • Multilabel ACGAN w/ split discriminator and captions
  • Multilabel ACGAN w/ split discriminator and captions V2