
TediGAN

License: MIT. Language: Python.

Implementation of TediGAN: Text-Guided Diverse Image Generation and Manipulation in PyTorch.

Official repository for the paper by W. Xia, Y. Yang, J.-H. Xue, and B. Wu, "TediGAN: Text-Guided Diverse Face Image Generation and Manipulation".

Contact: weihaox AT outlook.com

NOTE: The results reported in the paper are on faces. We are currently experimenting with other datasets. The codebase includes StyleGAN training, StyleGAN inversion, and visual-linguistic learning. The code will be released once we finish the corresponding training on the new datasets.

TediGAN Framework

We propose a novel method, TediGAN, for image synthesis from textual descriptions. It unifies two different tasks, text-guided image generation and text-guided image manipulation, in a single framework and achieves high accessibility, diversity, controllability, and accuracy for facial image generation and manipulation. Through the proposed multi-modal GAN inversion and a large-scale multi-modal dataset, our method can effectively synthesize images with unprecedented quality.
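One way to picture how generation and manipulation can share a framework is layer-wise style mixing in StyleGAN's W+ space, where a latent code holds one 512-dimensional style vector per synthesis layer. The sketch below is an illustrative assumption, not the released TediGAN code: `style_mix` is a hypothetical helper, and which layers come from the text-derived code is left to the caller. Under this assumed scheme, manipulation would mix a text code into the inverted code of a real image, while generation could mix it into randomly sampled codes to obtain diverse outputs.

```python
import numpy as np


def style_mix(w_image, w_text, layers):
    """Layer-wise style mixing in W+ (hypothetical helper, not TediGAN's API).

    w_image, w_text: arrays of shape (num_layers, 512), e.g. the inverted code
    of an input image and a code predicted from a text description.
    layers: indices of the layers whose styles are taken from the text code.
    """
    if w_image.shape != w_text.shape:
        raise ValueError("both codes must have the same W+ shape")
    mixed = w_image.copy()          # never modify the caller's image code
    mixed[layers] = w_text[layers]  # replace only the selected layers
    return mixed


rng = np.random.default_rng(0)
w_img = rng.standard_normal((14, 512))  # stand-in for an inverted 256x256 image code
w_txt = rng.standard_normal((14, 512))  # stand-in for a text-derived code
mixed = style_mix(w_img, w_txt, layers=list(range(4, 8)))
print(mixed.shape)
```

Keeping the coarse (early) layers from the image code tends to preserve properties such as pose, while the text code controls the layers it replaces; the choice of `layers` here is arbitrary.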

Train

Train the StyleGAN Generator

We use the training scripts from genforce. Before training the StyleGAN generator, prepare the required dataset (FFHQ for faces or LSUN Bird for birds).

  • Train on the FFHQ dataset:

        GPUS=8
        CONFIG=configs/stylegan_ffhq256.py
        WORK_DIR=work_dirs/stylegan_ffhq256_train
        ./scripts/dist_train.sh ${GPUS} ${CONFIG} ${WORK_DIR}

  • Train on the LSUN Bird dataset:

        GPUS=8
        CONFIG=configs/stylegan_lsun_bird256.py
        WORK_DIR=work_dirs/stylegan_lsun_bird256_train
        ./scripts/dist_train.sh ${GPUS} ${CONFIG} ${WORK_DIR}

Alternatively, you can directly use a pretrained StyleGAN generator: ffhq_face_1024, ffhq_face_256, cub_bird_256, or lsun_bird_256.
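The pretrained generators differ in output resolution, which fixes how many per-layer style codes an inverted W+ code contains. In the standard StyleGAN architecture, synthesis runs from 4x4 up to the output resolution with two style inputs per resolution block, giving 18 codes at 1024 and 14 at 256. The helper below is hypothetical and assumes the resolution is the last underscore-separated token of the model name.

```python
def num_wplus_layers(model_name: str) -> int:
    """Number of per-layer style codes for a StyleGAN generator.

    Hypothetical helper: assumes the name ends with the output resolution,
    as in "ffhq_face_1024" or "cub_bird_256".
    """
    resolution = int(model_name.rsplit("_", 1)[-1])
    # Resolution must be a power of two no smaller than the 4x4 base block.
    if resolution < 4 or resolution & (resolution - 1):
        raise ValueError(f"resolution must be a power of two >= 4, got {resolution}")
    log_res = resolution.bit_length() - 1  # exact log2 for powers of two
    # Blocks run from 4x4 to the output size (log2(res) - 1 blocks),
    # and each block consumes two style codes.
    return 2 * (log_res - 1)


for name in ["ffhq_face_1024", "ffhq_face_256", "cub_bird_256", "lsun_bird_256"]:
    print(name, num_wplus_layers(name))
```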

Invert StyleGAN

This step finds the latent codes that match given images in the latent space of a pretrained GAN model, e.g., StyleGAN or StyleGAN2 (this should be the same model as in the previous step). The inverted codes, obtained with idinvert, will be included in our Multi-Modal-CelebA-HQ Dataset.
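GAN inversion searches for a latent code w whose generated image G(w) reconstructs a target image x. The sketch below shows only the optimization core on a toy linear generator G(w) = Aw + b, assuming a squared-error loss with an optional L2 penalty on w. It is not idinvert, which additionally trains a domain-guided encoder and uses it to regularize the optimization; a real setup would replace the linear map with the pretrained StyleGAN synthesis network and typically add a perceptual loss.

```python
import numpy as np


def invert(A, b, x, steps=2000, reg=0.0):
    """Optimization-based inversion of a toy *linear* generator G(w) = A w + b.

    Minimizes 0.5 * ||A w + b - x||^2 + 0.5 * reg * ||w||^2 by gradient descent.
    """
    # Step size 1/L, where L = ||A||_2^2 + reg is the gradient's Lipschitz constant.
    lr = 1.0 / (np.linalg.norm(A, 2) ** 2 + reg)
    w = np.zeros(A.shape[1])
    for _ in range(steps):
        residual = A @ w + b - x          # reconstruction error G(w) - x
        grad = A.T @ residual + reg * w   # gradient of the objective w.r.t. w
        w -= lr * grad
    return w


rng = np.random.default_rng(0)
A = rng.standard_normal((64, 8))  # toy "generator": 8-d latent -> 64-d "image"
b = rng.standard_normal(64)
w_true = rng.standard_normal(8)
x = A @ w_true + b                # an "image" the generator can produce exactly
w_hat = invert(A, b, x)
print(np.abs(w_hat - w_true).max())
```

Because this toy problem is convex, gradient descent recovers `w_true`; with a real StyleGAN the objective is non-convex, which is one reason encoder-based initialization such as idinvert's helps.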

Our original method is based on idinvert (including StyleGAN training and GAN inversion). To generate 1024×1024 images and demonstrate the scalability of our framework, we also learn the visual-linguistic similarity based on pSp.

Due to the scalability of our framework, there are two general ways to invert a pretrained StyleGAN.

Train the Text Encoder

python train_vls.py
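`train_vls.py` learns the visual-linguistic similarity, i.e., it relates text to the latent codes of the inverted images. Its exact objective is not documented here, so the sketch below assumes a common choice for this kind of alignment: a symmetric InfoNCE (contrastive) loss over a batch of paired text embeddings and image codes, each flattened to a vector. The function names and the temperature value are hypothetical.

```python
import numpy as np


def _cross_entropy_diag(logits):
    # Row-wise softmax cross-entropy where the correct column for row i is i.
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))


def symmetric_info_nce(text_emb, image_codes, temperature=0.07):
    """Contrastive loss between N paired text embeddings and image codes.

    Pair i (text_emb[i], image_codes[i]) is the positive; every other
    pairing in the batch acts as a negative.
    """
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    v = image_codes / np.linalg.norm(image_codes, axis=1, keepdims=True)
    logits = t @ v.T / temperature  # (N, N) scaled cosine similarities
    # Average the text-to-image and image-to-text directions.
    return 0.5 * (_cross_entropy_diag(logits) + _cross_entropy_diag(logits.T))


rng = np.random.default_rng(0)
codes = rng.standard_normal((8, 512))                     # stand-in for inverted image codes
aligned = codes + 0.1 * rng.standard_normal(codes.shape)  # well-aligned text embeddings
misaligned = np.roll(aligned, 1, axis=0)                  # every pair mismatched
print(symmetric_info_nce(aligned, codes), symmetric_info_nce(misaligned, codes))
```

Minimizing this loss pulls each description toward the code of its own image and away from the other images in the batch.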

More Results

Results for the following text prompts:

  • "a smiling young woman with short blonde hair"
  • "he is young and wears beard"
  • "a young woman with long black hair"

Text-to-image Benchmark

Datasets

  • Multi-Modal-CelebA-HQ Dataset
  • CUB Bird Dataset
  • COCO Dataset

Publications

Below is a curated list of related publications with code.

Text-to-image Generation

  • [DF-GAN] Deep Fusion Generative Adversarial Networks for Text-to-Image Synthesis (2020) [paper] [code]
  • [ControlGAN] Controllable Text-to-Image Generation (NeurIPS 2019) [paper] [code]
  • [DM-GAN] Dynamic Memory Generative Adversarial Networks for Text-to-Image Synthesis (CVPR 2019) [paper] [code]
  • [MirrorGAN] Learning Text-to-image Generation by Redescription (CVPR 2019) [paper] [code]
  • [Obj-GAN] Object-driven Text-to-Image Synthesis via Adversarial Training (CVPR 2019) [paper] [code]
  • [SD-GAN] Semantics Disentangling for Text-to-Image Generation (CVPR 2019) [paper] [code]
  • [HD-GAN] Photographic Text-to-Image Synthesis with a Hierarchically-nested Adversarial Network (CVPR 2018) [paper] [code]
  • [AttnGAN] Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks (CVPR 2018) [paper] [code]
  • [StackGAN++] Realistic Image Synthesis with Stacked Generative Adversarial Networks (TPAMI 2018) [paper] [code]
  • [StackGAN] Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks (ICCV 2017) [paper] [code]
  • [GAN-INT-CLS] Generative Adversarial Text to Image Synthesis (ICML 2016) [paper] [code]

Text-guided Image Manipulation

  • [ManiGAN] ManiGAN: Text-Guided Image Manipulation (CVPR 2020) [paper] [code]
  • [Lightweight-Manipulation] Lightweight Generative Adversarial Networks for Text-Guided Image Manipulation (NeurIPS 2020) [paper] [code]
  • [SISGAN] Semantic Image Synthesis via Adversarial Learning (ICCV 2017) [paper] [code]
  • [TAGAN] Text-Adaptive Generative Adversarial Networks: Manipulating Images with Natural Language (NeurIPS 2018) [paper] [code]

Metrics

Citation

If you find our work, code, or benchmark helpful for your research, please consider citing:

@article{xia2020tedigan,
  title={TediGAN: Text-Guided Diverse Face Image Generation and Manipulation},
  author={Xia, Weihao and Yang, Yujiu and Xue, Jing-Hao and Wu, Baoyuan},
  journal={arXiv preprint arXiv:2012.03308},
  year={2020}
}

Acknowledgments

Code borrows heavily from idinvert and genforce.