caption2image

PyTorch implementation of GAN-INT-CLS and AttnGAN

Dependencies

  • Python 3
  • PyTorch 1.0.0
  • TensorFlow and TensorBoard (optional: you can train and evaluate without these if you do not use TensorBoard for logging)

In addition, you may need other packages...

Data

  1. Download the preprocessed metadata for COCO filenames and COCO text, then extract it
  2. Download the COCO dataset
  3. Download the embedding file
  4. Place the data as below (a layout sanity check is sketched after the tree)
data_dir 
  |- COCO
       |- filenames 
            |- train2014 
            |- val2014 
       |- text 
            |- train2014 
            |- val2014 
       |- image 
            |- train2014 
            |- val2014 
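
Before training, it can help to confirm the layout. This is a minimal sketch assuming your data root is literally named data_dir; the check itself is illustrative and not part of this repo:

  import os

  # Hypothetical sanity check: confirm the expected COCO directory layout.
  data_dir = "data_dir"  # adjust to your actual data root
  for sub in ("filenames", "text", "image"):
      for split in ("train2014", "val2014"):
          path = os.path.join(data_dir, "COCO", sub, split)
          if not os.path.isdir(path):
              print("missing: " + path)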

AttnGAN

Training

  • Train the DAMSM models:

    • python DAMSM_main.py
      • you can edit the config by directly editing the source code
  • Train the AttnGAN models:

    • python main.py
      • you can edit the config by passing command-line arguments (see AttnGAN/config.py or python main.py --help); an illustrative sketch follows this list
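
The repo's real options live in AttnGAN/config.py; the following is only a generic argparse sketch of that pattern, and all flag names here are made up:

  # Illustrative only: these flags are hypothetical, not the repo's actual
  # options. Run `python main.py --help` for the real ones.
  import argparse

  parser = argparse.ArgumentParser(description="Train AttnGAN")
  parser.add_argument("--data_dir", type=str, default="data_dir")  # hypothetical
  parser.add_argument("--batch_size", type=int, default=16)        # hypothetical
  parser.add_argument("--max_epoch", type=int, default=600)        # hypothetical
  args = parser.parse_args()
  print(args)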

Evaluation

I prepared a notebook for evaluation (AttnGAN/eval.ipynb).
You can evaluate the generated images by:

  • Inception Score (IS)
  • Fréchet Inception Distance (FID)
  • R-precision

You can also generate images from your own captions.
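
As a reference for the first metric, here is a rough sketch of the Inception Score computed with torchvision's pretrained Inception-v3. This is a generic implementation, not the notebook's code; the function and variable names are my own:

  import torch
  import torch.nn.functional as F
  from torchvision.models import inception_v3

  @torch.no_grad()
  def inception_score(images, splits=10):
      """images: float tensor (N, 3, 299, 299), normalized for Inception."""
      model = inception_v3(pretrained=True, transform_input=False).eval()
      probs = F.softmax(model(images), dim=1)  # p(y|x) for each image
      scores = []
      for chunk in probs.chunk(splits):
          p_y = chunk.mean(dim=0, keepdim=True)                # marginal p(y)
          kl = (chunk * (chunk.log() - p_y.log())).sum(dim=1)  # KL(p(y|x) || p(y))
          scores.append(kl.mean().exp())                       # exp of mean KL
      return torch.stack(scores).mean().item()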

Pretrained Model

  1. Download the DAMSM image_encoder
  2. Download the DAMSM text_encoder
  3. Download the AttnGAN generator and config
  4. Place the models as below (a loading sketch follows the tree)
AttnGAN
  |- results
       |- DAMSM/COCO/2019_05_04_00_32/model
            |- image_encoder600.pth
            |- text_encoder600.pth
       |- AttnGAN/COCO/2019_05_14_17_08
            |- model
                 |- G_epoch50.pth
            |- config.txt
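
Loading the checkpoints is then a matter of torch.load plus load_state_dict. A minimal sketch, assuming the .pth files hold plain state dicts (the encoder and generator classes come from this repo's own code):

  import torch

  base = "AttnGAN/results"
  image_enc_state = torch.load(
      base + "/DAMSM/COCO/2019_05_04_00_32/model/image_encoder600.pth",
      map_location="cpu")
  text_enc_state = torch.load(
      base + "/DAMSM/COCO/2019_05_04_00_32/model/text_encoder600.pth",
      map_location="cpu")
  gen_state = torch.load(
      base + "/AttnGAN/COCO/2019_05_14_17_08/model/G_epoch50.pth",
      map_location="cpu")
  # then restore into the model instances, e.g.:
  # generator.load_state_dict(gen_state)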

TODO

  • Paper
    • GAN-INT-CLS
      • survey
      • impl
    • StackGAN
      • survey
      • impl
    • StackGAN++
      • survey
      • impl
    • AttnGAN
      • survey
      • impl
    • MirrorGAN
      • survey
      • impl
  • Dataset
    • Bird
    • MS COCO
