Awesome-Text-to-Image

A Survey on Text-to-Image Generation/Synthesis.

A collection of resources on the general text-to-image synthesis task.

Contents

1. Description

  • Over the last few decades, deep learning research has produced several major technological breakthroughs in both Computer Vision (CV) and Natural Language Processing (NLP). Recently, researchers have become increasingly interested in combining semantic and visual information from these traditionally independent fields. A number of studies have been conducted on text-to-image synthesis, the task of translating an input textual description (keywords or sentences) into a realistic image. Most of the works collected here approach this with text-conditioned GANs; a minimal sketch of that idea follows this list.

  • Papers, code, and datasets for the text-to-image task are collected here.
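
To make the conditioning idea concrete, the sketch below is a deliberately minimal text-conditioned generator in the spirit of Reed et al. (ICML 2016): a pretrained sentence embedding is projected, concatenated with a noise vector, and decoded into an image. All layer sizes and names here are illustrative assumptions, not taken from any of the linked codebases, which use much deeper convolutional decoders.

```python
import torch
import torch.nn as nn

class TextConditionedGenerator(nn.Module):
    """Minimal text-conditioned generator (illustrative sizes only)."""

    def __init__(self, text_dim=1024, proj_dim=128, z_dim=100, img_size=64):
        super().__init__()
        # Compress the sentence embedding before conditioning on it.
        self.project = nn.Sequential(nn.Linear(text_dim, proj_dim),
                                     nn.LeakyReLU(0.2))
        # A single linear decoder stands in for the usual conv-transpose stack.
        self.decode = nn.Sequential(
            nn.Linear(proj_dim + z_dim, 3 * img_size * img_size),
            nn.Tanh(),  # pixel values in [-1, 1]
        )
        self.img_size = img_size

    def forward(self, text_emb, z):
        h = torch.cat([self.project(text_emb), z], dim=1)
        return self.decode(h).view(-1, 3, self.img_size, self.img_size)

# Usage: one 64x64 image from a dummy embedding and noise vector.
g = TextConditionedGenerator()
img = g(torch.randn(1, 1024), torch.randn(1, 100))  # shape (1, 3, 64, 64)
```

The discriminator is conditioned the same way, so it can reject images that are realistic but mismatched with the caption.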

2. Quantitative Evaluation Metrics
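
The papers in this list most commonly report two automatic metrics, both computed with a pretrained Inception-v3 network: the Inception Score (IS, higher is better) and the Fréchet Inception Distance (FID, lower is better); R-precision is often added to measure text-image consistency. As a minimal sketch (the function names are illustrative, not from any of the linked codebases), IS is computed from the classifier's softmax outputs on generated images:

```python
import numpy as np

def inception_score(probs, eps=1e-12):
    """IS = exp(E_x[KL(p(y|x) || p(y))]) from an (N, num_classes) array.

    probs[i] holds p(y | x_i) from a pretrained Inception-v3 classifier run
    on N generated images; the marginal p(y) is estimated by averaging.
    """
    p_y = probs.mean(axis=0, keepdims=True)  # marginal class distribution
    kl = probs * (np.log(probs + eps) - np.log(p_y + eps))
    return float(np.exp(kl.sum(axis=1).mean()))
```

FID instead fits a Gaussian to the Inception pool features of the real and of the generated images and measures the Fréchet distance between the two. Assuming the feature statistics have already been extracted, the distance itself is:

```python
import numpy as np
from scipy import linalg

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Fréchet distance between the Gaussians N(mu1, sigma1), N(mu2, sigma2).

    For FID, (mu, sigma) are the mean and covariance of Inception-v3 pool
    features over real and generated images; they are taken as given here.
    """
    diff = mu1 - mu2
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)  # matrix sqrt
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # drop tiny imaginary parts from numerics
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))
```

In practice IS is averaged over several splits of the generated set, and FID only stabilizes with tens of thousands of samples.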

3. Datasets

  • Caltech-UCSD Birds (CUB)

    Caltech-UCSD Birds-200-2011 (CUB-200-2011) is an extended version of the CUB-200 dataset, with roughly double the number of images per class and new part location annotations.

    • Detailed information (Images): ⇒ [Paper] [Website]
      • Number of different categories: 200 (Training: 150 categories. Testing: 50 categories.)
      • Number of bird images: 11,788
      • Annotations per image: 15 Part Locations, 312 Binary Attributes, 1 Bounding Box, Ground-truth Segmentation
    • Detailed information (Text Descriptions): ⇒ [Paper] [Website]
      • Descriptions per image: 10 Captions (see the loading sketch after this list)
  • Oxford-102 Flower

    Oxford-102 Flower is a dataset of 102 flower categories, chosen from flowers commonly occurring in the United Kingdom. The images have large variations in scale, pose, and lighting.

    • Detailed information (Images): ⇒ [Paper] [Website]
      • Number of different categories: 102 (Training: 82 categories. Testing: 20 categories.)
      • Number of flower images: 8,189
    • Detailed information (Text Descriptions): ⇒ [Paper] [Website]
      • Descriptions per image: 10 Captions
  • MS-COCO

    COCO is a large-scale object detection, segmentation, and captioning dataset.

    • Detailed information (Images & Text Descriptions): ⇒ [Paper] [Website]
      • Number of images: 120k (Training: 80k. Testing: 40k.)
      • Descriptions per image: 5 Captions
  • Multi-Modal-CelebA-HQ

    Multi-Modal-CelebA-HQ is a large-scale face image dataset for text-to-image generation, text-guided image manipulation, sketch-to-image generation, GAN-based face generation and editing, image captioning, and VQA.

    • Detailed information (Images & Text Descriptions): ⇒ [Paper] [Website] [Download]
      • Number of images (from CelebA-HQ): 30,000 (Training: 24,000. Testing: 6,000.)
      • Descriptions per image: 10 Captions
    • Detailed information (Masks):
      • Number of masks (from CelebAMask-HQ): 30,000 (512 x 512)
    • Detailed information (Sketches):
      • Number of Sketches: 30,000 (512 x 512)
    • Detailed information (Images with transparent background):
      • Not fully uploaded
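
The CUB and Oxford-102 caption sets above (from Reed et al.) are commonly distributed as one plain-text file per image holding its 10 captions, grouped into per-class folders that mirror the image folders. The sketch below pairs images with their caption lists under that assumed layout; the directory structure, the `.jpg` extension, and the function name are assumptions about a typical download, so adjust them to your copy.

```python
from pathlib import Path

def load_caption_pairs(image_root, text_root, image_ext=".jpg"):
    """Pair each image with its list of captions.

    Assumes images under <image_root>/<class>/<id>.jpg and captions under
    <text_root>/<class>/<id>.txt, one caption per line.
    """
    pairs = []
    for txt in sorted(Path(text_root).glob("*/*.txt")):
        image = Path(image_root) / txt.parent.name / (txt.stem + image_ext)
        captions = [line.strip() for line in txt.read_text().splitlines()
                    if line.strip()]
        pairs.append((image, captions))
    return pairs
```

At training time, the listed methods typically sample one of the 10 captions per image per step, which doubles as a mild text-side augmentation.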

4. Papers with Code

  • Survey

    • (2021) Adversarial Text-to-Image Synthesis: A Review, Stanislav Frolov et al. [Paper]
    • (2019) A Survey and Taxonomy of Adversarial Neural Networks for Text-to-Image Synthesis, Jorge Agnese et al. [Paper]
  • 2021

    • (ICCV 2021) DAE-GAN: Dynamic Aspect-aware GAN for Text-to-Image Synthesis, Shulan Ruan et al. [Paper] [Code]
    • (CVPR 2021) Cross-Modal Contrastive Learning for Text-to-Image Generation, Han Zhang et al. [Paper]
    • (TMM 2021) Modality Disentangled Discriminator for Text-to-Image Synthesis, Fangxiang Feng et al. [Paper] [Code]
    • (arXiv 2021) Text to Image Generation with Semantic-Spatial Aware GAN, Kai Hu et al. [Paper] [Code]
    • (IEEE MultiMedia 2021) Class-balanced Text to Image Synthesis with Attentive Generative Adversarial Network [Paper]
    • (arXiv 2021) CRD-CGAN: Category-Consistent and Relativistic Constraints for Diverse Text-to-Image Generation [Paper]
  • 2020

    • (ECCV 2020) CPGAN: Content-Parsing Generative Adversarial Networks for Text-to-Image Synthesis, Jiadong Liang et al. [Paper] [Code]
    • (CVPR 2020) RiFeGAN: Rich Feature Generation for Text-to-Image Synthesis From Prior Knowledge, Jun Cheng et al. [Paper]
    • (CVPR 2020) CookGAN: Causality based Text-to-Image Synthesis, Bin Zhu et al. [Paper]
    • (TIP 2020) KT-GAN: Knowledge-Transfer Generative Adversarial Network for Text-to-Image Synthesis, Hongchen Tan et al. [Paper]
    • (ACM TIST 2020) End-to-End Text-to-Image Synthesis with Spatial Constraints, Min Wang et al. [Paper]
    • (TMM 2020) Exploring Global and Local Linguistic Representations for Text-to-Image Synthesis, Ruifan Li et al. [Paper]
    • (AAAI 2020) Hierarchical Modes Exploring in Generative Adversarial Networks, Mengxiao Hu et al. [Paper]
  • 2019

    • (NeurIPS 2019) Learn, Imagine and Create: Text-to-Image Generation from Prior Knowledge, Tingting Qiao et al. [Paper] [Code]
    • (NeurIPS 2019) Controllable Text-to-Image Generation, Bowen Li et al. [Paper] [Code]
    • (CVPR 2019) DM-GAN: Dynamic Memory Generative Adversarial Networks for Text-to-Image Synthesis, Minfeng Zhu et al. [Paper] [Code]
    • (CVPR 2019) Object-driven Text-to-Image Synthesis via Adversarial Training, Wenbo Li et al. [Paper] [Code]
    • (CVPR 2019) MirrorGAN: Learning Text-to-image Generation by Redescription, Tingting Qiao et al. [Paper] [Code]
    • (CVPR 2019) Semantics Disentangling for Text-to-Image Generation, Guojun Yin et al. [Paper] [Website]
    • (ICCV 2019) Semantics-Enhanced Adversarial Nets for Text-to-Image Synthesis, Hongchen Tan et al. [Paper]
    • (ICCV 2019) Dual Adversarial Inference for Text-to-Image Synthesis, Qicheng Lao et al. [Paper]
    • (ICCV 2019) Tell, Draw, and Repeat: Generating and Modifying Images Based on Continual Linguistic Instruction, Alaaeldin El-Nouby et al. [Paper] [Code]
    • (TCSVT 2019) Bridge-GAN: Interpretable Representation Learning for Text-to-Image Synthesis, Mingkuan Yuan et al. [Paper]
    • (ICLR 2019) Generating Multiple Objects at Spatially Distinct Locations, Tobias Hinz et al. [Paper] [Code]
    • (AAAI 2019) Perceptual Pyramid Adversarial Networks for Text-to-Image Synthesis, Lianli Gao et al. [Paper]
    • (WACV 2019) C4Synth: Cross-Caption Cycle-Consistent Text-to-Image Synthesis, K J Joseph et al. [Paper]
  • 2018

    • (CVPR 2018) AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks, Tao Xu et al. [Paper] [Code]
    • (CVPR 2018) Photographic Text-to-Image Synthesis with a Hierarchically-nested Adversarial Network, Zizhao Zhang et al. [Paper] [Code]
    • (NeurIPS 2018) Text-Adaptive Generative Adversarial Networks: Manipulating Images with Natural Language, Seonghyeon Nam et al. [Paper] [Code]
  • 2017

    • (ICCV 2017) StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks, Han Zhang et al. [Paper] [Code]
    • (arXiv 2017) TAC-GAN - Text Conditioned Auxiliary Classifier Generative Adversarial Network, Ayushman Dash et al. [Paper]
    • (ICIP 2017) I2T2I: Learning Text to Image Synthesis with Textual Data Augmentation, Hao Dong et al. [Paper]
  • 2016

    • (ICML 2016) Generative Adversarial Text to Image Synthesis, Scott Reed et al. [Paper] [Code]
    • (NIPS 2016) Learning What and Where to Draw, Scott Reed et al. [Paper] [Code]