DF-GAN: A Simple and Effective Baseline for Text-to-Image Synthesis (CVPR 2022 Oral)

Official PyTorch implementation of our paper DF-GAN: A Simple and Effective Baseline for Text-to-Image Synthesis by Ming Tao, Hao Tang, Fei Wu, Xiao-Yuan Jing, Bing-Kun Bao, Changsheng Xu.

Requirements

Python 3.8
PyTorch 1.9
At least one NVIDIA GPU with 12 GB of memory

Installation

Clone this repo and install the dependencies.

git clone https://github.com/tobran/DF-GAN
cd DF-GAN
pip install -r requirements.txt
cd code/

Preparation

Datasets

Download the preprocessed metadata for birds and coco and extract it to data/
Download the birds image data (CUB-200-2011) and extract it to data/birds/
Download the COCO 2014 dataset and extract the images to data/coco/images/
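The directory layout above can be prepared and checked with a small script. This is a minimal sketch, not part of the DF-GAN repo: it assumes the data/ paths are relative to the directory you run it from (presumably the repository root), and it does not download anything, since the archive names are not given here.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of DF-GAN: prepares and checks the data layout
# from the steps above. Assumes data/ is relative to the current directory.
set -euo pipefail

# Target directories named in the preparation steps.
required_dirs=(data/birds data/coco/images)

# Create any missing directories so the archives can be extracted into them.
mkdir -p "${required_dirs[@]}"

# Flag directories that are still empty, i.e. not yet populated.
for dir in "${required_dirs[@]}"; do
  if [ -z "$(ls -A "$dir")" ]; then
    echo "empty: $dir"
  else
    echo "ok: $dir"
  fi
done
```

Running it after extraction should print "ok" for every directory; any "empty" line points to a dataset that still needs to be extracted.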

Training

cd DF-GAN/code/

Train the DF-GAN model

For the bird dataset: bash scripts/train.sh ./cfg/bird.yml
For the COCO dataset: bash scripts/train.sh ./cfg/coco.yml

Resume training process

If your training process is interrupted unexpectedly, set resume_epoch and resume_model_path in train.sh to resume training.

TensorBoard

Our code supports automatic FID evaluation during training; the results are stored as TensorBoard files under code/logs/. To change how often evaluation runs, set test_interval in the dataset's YAML file (e.g., ./cfg/bird.yml).

From the repository root:
For the bird dataset: tensorboard --logdir=./code/logs/bird/train --port 8166
For the COCO dataset: tensorboard --logdir=./code/logs/coco/train --port 8177
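Editing test_interval can also be scripted. The sketch below is a hypothetical helper, not part of DF-GAN; it assumes the key sits at the top level of the YAML file (unindented) with a single-line value, which is an assumption about the config layout rather than something stated here.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of DF-GAN. Assumes "<key>: <value>" appears at
# the start of a line (top-level, unindented) and fits on one line.
set -euo pipefail

# Usage: set_yaml_key <file> <key> <value>
set_yaml_key() {
  local file=$1 key=$2 value=$3
  local tmp
  tmp=$(mktemp)
  # Replace only lines that begin with "<key>:"; copy every other line as-is.
  # index() does a literal match, so keys are not treated as regexes.
  awk -v k="$key" -v v="$value" '
    index($0, k ":") == 1 { print k ": " v; next }
    { print }
  ' "$file" > "$tmp"
  # Write back through the original file so its permissions are kept.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Example (run from DF-GAN/code/): set the test interval for birds to 5.
# set_yaml_key ./cfg/bird.yml test_interval 5
```

Under the same assumption, the function could set other flat keys too, for example set_yaml_key ./cfg/bird.yml save_image True before evaluation.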

Evaluation

Download Pretrained Model

DF-GAN for birds: download the checkpoint and save it to ./code/saved_models/bird/
DF-GAN for COCO: download the checkpoint and save it to ./code/saved_models/coco/

Evaluate DF-GAN models

We synthesize about 30,000 images from the test descriptions and compute the FID between the synthesized images and the test images of each dataset.

cd DF-GAN/code/

For the bird dataset: bash scripts/calc_FID.sh ./cfg/bird.yml
For the COCO dataset: bash scripts/calc_FID.sh ./cfg/coco.yml
We compute the Inception Score (IS) for models trained on birds using the StackGAN-inception-model.

Some tips

Our evaluation code does not save the synthesized images (about 30,000). To save them, set save_image: True in the YAML file.
Because we find that IS can be heavily overfitted through joint training with Inception-V3, we do not recommend the IS metric for text-to-image synthesis.

Performance

The released model achieves better performance than the CVPR paper version.

Model	CUB FID ↓	COCO FID ↓	Number of parameters (NoP) ↓
DF-GAN (paper)	14.81	19.32	19M
DF-GAN (pretrained model)	12.10	15.41	18M

Sampling

cd DF-GAN/code/

Synthesize images from example captions

For the bird dataset: bash scripts/sample.sh ./cfg/bird.yml
For the COCO dataset: bash scripts/sample.sh ./cfg/coco.yml

Synthesize images from your text descriptions

Replace the contents of ./code/example_captions/dataset_name.txt with your own text descriptions, then run:
For the bird dataset: bash scripts/sample.sh ./cfg/bird.yml
For the COCO dataset: bash scripts/sample.sh ./cfg/coco.yml
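A minimal sketch of this workflow for the bird model. It assumes, without confirmation here, that the bird caption file is example_captions/bird.txt relative to DF-GAN/code/ and that it holds one caption per line; the two captions are made-up examples.

```shell
#!/usr/bin/env bash
# Sketch only. Assumptions: run from DF-GAN/code/, the bird caption file is
# ./example_captions/bird.txt, and it contains one caption per line.
set -euo pipefail

caption_file=./example_captions/bird.txt

mkdir -p "$(dirname "$caption_file")"
# Overwrite the example captions with our own descriptions, one per line.
cat > "$caption_file" <<'EOF'
this bird has a red crown and a short black beak
a small yellow bird with grey wings and a white belly
EOF

# Count non-empty lines to confirm what was written.
echo "wrote $(grep -c . "$caption_file") captions to $caption_file"

# Generate images for the new captions (skipped if the script is absent).
if [ -f scripts/sample.sh ]; then
  bash scripts/sample.sh ./cfg/bird.yml
fi
```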

The synthesized images are saved at ./code/samples.

Citing DF-GAN

If you find DF-GAN useful in your research, please consider citing our paper:

@inproceedings{tao2022df,
  title={DF-GAN: A Simple and Effective Baseline for Text-to-Image Synthesis},
  author={Tao, Ming and Tang, Hao and Wu, Fei and Jing, Xiao-Yuan and Bao, Bing-Kun and Xu, Changsheng},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={16515--16525},
  year={2022}
}

The code is released for academic research use only. For commercial use, please contact Ming Tao.
