EDGAN: StackGAN with Embedding Distance Training

Primary language: Python. License: MIT.

EDGAN

This repository modifies the original StackGAN code from GitHub.

Dataset

This project uses the MSCOCO dataset.

Get the dataset and the pretrained embeddings

  • Download the MSCOCO dataset and annotations, including captions and instances.
  • Download the pretrained char-CNN-RNN embeddings for MSCOCO.
  • misc/preprocess_mscoco.py resizes the images to different sizes for the selected supercategory and writes them to a tfrecords file along with the corresponding caption embeddings.
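
The supercategory selection in the last step can be sketched as follows. This is a minimal illustration, not the code in misc/preprocess_mscoco.py: the helper names are hypothetical, and the toy lists only mimic the "categories" and "annotations" entries of a COCO instances file.

```python
def category_ids_for_supercategory(categories, supercategory):
    """Return the ids of all categories that belong to one supercategory."""
    return {c["id"] for c in categories if c["supercategory"] == supercategory}


def image_ids_with_categories(annotations, category_ids):
    """Return the sorted ids of images with at least one instance in category_ids."""
    return sorted({a["image_id"] for a in annotations if a["category_id"] in category_ids})


# Toy data shaped like the "categories" and "annotations" lists of a COCO instances file.
categories = [
    {"id": 16, "name": "bird", "supercategory": "animal"},
    {"id": 17, "name": "cat", "supercategory": "animal"},
    {"id": 3, "name": "car", "supercategory": "vehicle"},
]
annotations = [
    {"image_id": 1, "category_id": 16, "area": 5000.0},
    {"image_id": 2, "category_id": 3, "area": 1200.0},
    {"image_id": 3, "category_id": 17, "area": 800.0},
]

animal_ids = category_ids_for_supercategory(categories, "animal")
selected = image_ids_with_categories(annotations, animal_ids)
print(selected)
```

Only the selected images would then be resized and written to the tfrecords file together with their caption embeddings.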

New features

Data input pipeline

  • uses the MSCOCO Python API
  • data loader that reads tfrecords built from MSCOCO
  • image augmentation including cropping, flipping, and standardization (images are downsampled with the INTER_AREA method)
  • sampling from multiple caption embeddings and visualizing embedding distributions
  • negative examples (using the inner product of caption embeddings; see CLSGAN)
  • filtering images by object class and object area
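
The caption sampling and negative-example steps above might look roughly like this. It is a sketch under assumptions: the function names are hypothetical, the embeddings are L2-normalized before the inner product (the repository may not do this), and the wrong caption is drawn uniformly from the other images in the batch.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit L2 norm (an assumption of this sketch)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def sample_caption_embedding(embeddings, rng):
    """Pick one of the several caption embeddings stored for an image."""
    return rng.choice(embeddings)


def mismatched_pair(batch, index, rng):
    """Return (right, wrong, similarity) for batch[index].

    The wrong embedding comes from a different image in the batch, and
    similarity is the inner product of the two normalized embeddings.
    """
    others = [i for i in range(len(batch)) if i != index]
    wrong_index = rng.choice(others)
    right = normalize(batch[index])
    wrong = normalize(batch[wrong_index])
    return right, wrong, inner_product(right, wrong)


rng = random.Random(0)
# Each image has several caption embeddings; one is sampled per image.
per_image = [[[1.0, 0.0], [0.9, 0.1]], [[0.0, 1.0]], [[1.0, 1.0]]]
batch = [sample_caption_embedding(e, rng) for e in per_image]
right, wrong, similarity = mismatched_pair(batch, 0, rng)
```

The similarity value can then serve as a soft target for the mismatched (real image, wrong caption) pair instead of a hard 0.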

Modifications to the GAN network

  • enlarge the capacity of the generator network by adding 3 residual blocks
  • change ReLU to leaky ReLU
  • option to disable batch norm in the discriminator
  • option to increase or reduce the discriminator's final dimension
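
To illustrate the first two changes, here is a toy sketch of a leaky ReLU and a residual connection on plain Python lists. The slope of 0.2 and both function names are assumptions, and the actual generator blocks are convolutional layers rather than elementwise functions.

```python
def leaky_relu(x, alpha=0.2):
    """Pass positive inputs through and scale negative inputs by alpha (assumed slope)."""
    return x if x > 0 else alpha * x


def residual_block(x, transform):
    """Return x + transform(x) elementwise, the skip connection of a residual block."""
    fx = transform(x)
    return [a + b for a, b in zip(x, fx)]


def activation(values):
    return [leaky_relu(v) for v in values]


out = residual_block([1.0, -2.0], activation)
```

Because of the skip connection, each added block only has to learn a correction to its input, which is what makes it practical to stack several of them.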

Multiple GAN training methods

  • Option to train with vanilla GAN
  • Option to train with WGAN (batch norm parameters are excluded from weight clipping)
  • Option to train with LSGAN
  • Option to train with CLSGAN, a continuous least squares GAN that estimates the inner product between right and wrong caption embeddings
  • Option to train with BGAN (not implemented yet)
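
The discriminator objectives behind these options can be compared in a small sketch. The vanilla, WGAN, and LSGAN losses follow their standard formulations. The CLSGAN term is only one reading of the description above (the least-squares target for a wrong caption is the embedding inner product), and the clipping limit of 0.01 and the "batch_norm" name filter are assumptions, not values taken from this repository.

```python
import math


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def vanilla_d_loss(real_logits, fake_logits):
    """Sigmoid cross-entropy with target 1 for real and 0 for fake samples."""
    return mean(softplus(-l) for l in real_logits) + mean(softplus(l) for l in fake_logits)


def wgan_d_loss(real_scores, fake_scores):
    """Critic loss: push real scores up and fake scores down."""
    return mean(fake_scores) - mean(real_scores)


def lsgan_d_loss(real_scores, fake_scores):
    """Least-squares loss with target 1 for real and 0 for fake samples."""
    return 0.5 * mean((s - 1.0) ** 2 for s in real_scores) + 0.5 * mean(s ** 2 for s in fake_scores)


def clsgan_wrong_caption_loss(wrong_scores, similarities):
    """Least-squares loss whose target is the right/wrong embedding inner product (assumed form)."""
    return 0.5 * mean((s - t) ** 2 for s, t in zip(wrong_scores, similarities))


def clip_weights(weights, limit=0.01, skip_substring="batch_norm"):
    """Clip every weight list to [-limit, limit], leaving batch norm variables untouched."""
    clipped = {}
    for name, values in weights.items():
        if skip_substring in name:
            clipped[name] = list(values)
        else:
            clipped[name] = [max(-limit, min(limit, v)) for v in values]
    return clipped


weights = {"d/conv/w": [0.5, -0.5], "d/batch_norm/gamma": [1.0]}
clipped = clip_weights(weights)
```

Skipping batch norm variables during clipping keeps their scale parameters from being squeezed into the clipping range along with the convolution weights.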

Classification transfer from ImageNet to MSCOCO (for a future 3-stage GAN)

  • Label each image in MSCOCO with multiple labels, one for each object whose area is larger than a threshold
  • Transfer ResNet from Caffe to TensorFlow
  • Train ResNet to classify the 80 object categories in MSCOCO
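
The labeling step could be sketched as below: a multi-hot vector per image built from COCO-style instance annotations. The function name, the threshold value, and the toy data are hypothetical.

```python
def multi_hot_labels(annotations, category_ids, min_area):
    """Map image_id to a multi-hot list over category_ids.

    An entry is set to 1 only if the image has an instance of that category
    whose area is larger than min_area.
    """
    index = {cid: i for i, cid in enumerate(category_ids)}
    labels = {}
    for ann in annotations:
        vector = labels.setdefault(ann["image_id"], [0] * len(category_ids))
        if ann["area"] > min_area and ann["category_id"] in index:
            vector[index[ann["category_id"]]] = 1
    return labels


# Toy annotations shaped like entries of a COCO instances file.
annotations = [
    {"image_id": 1, "category_id": 16, "area": 5000.0},
    {"image_id": 1, "category_id": 17, "area": 100.0},
    {"image_id": 2, "category_id": 3, "area": 2000.0},
]
labels = multi_hot_labels(annotations, category_ids=[3, 16, 17], min_area=1000.0)
```

With all 80 MSCOCO category ids passed as category_ids, each vector becomes a target for a multi-label classifier.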

Reference publications