TensorFlow implementation of "Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks" by Han Zhang, et al.

StackGAN

Text to Photo-Realistic Image Synthesis


Dependencies

tensorflow==2.1.0
numpy==1.16.4
absl_py==0.7.0
matplotlib==2.2.3
pandas==0.23.4
Pillow==6.1.0

Downloads

  • To install all the dependencies, run
pip install -r requirements.txt
  • To download the CUB 200 dataset, run the data_download.py script
python data_download.py
  • Download the char-CNN-RNN embeddings (birds.zip) from the download link, then unzip the archive in place.
unzip birds.zip
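
If the unzip command is not available on your platform, the same step can be done from Python. This is a minimal sketch using only the standard library; the helper name extract_archive is hypothetical and is not part of this repository, and it assumes birds.zip has already been downloaded into the current directory.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, dest_dir="."):
    """Extract every member of zip_path into dest_dir and return their names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        return archive.namelist()


if __name__ == "__main__":
    # Assumption: birds.zip sits next to this script, as in the step above.
    if Path("birds.zip").exists():
        print(extract_archive("birds.zip"))
```

Extracting into "." mirrors unzipping in place; pass a different dest_dir to keep the embeddings elsewhere.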

Training

  • The model.py file contains the minimum code needed to run the Stage 1 and Stage 2 architectures. It saves the weights automatically after the specified (or default) number of epochs has completed. The weights are written to the same directory as model.py.
python model.py

Architecture

  • Stage 1
    • Text Encoder Network
      • Encodes a text description into a 1024-dimensional text embedding
      • Based on Learning Deep Representations of Fine-Grained Visual Descriptions [Arxiv Link]
    • Conditioning Augmentation Network
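      • Adds randomness by sampling the conditioning vector from a Gaussian whose mean and variance are computed from the text embedding
      • Effectively produces more image-text pairs from the same description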
    • Generator Network
    • Discriminator Network
    • Embedding Compressor Network
    • Outputs a 64x64 image

  • Stage 2
    • Text Encoder Network
    • Conditioning Augmentation Network
    • Generator Network
    • Discriminator Network
    • Embedding Compressor Network
    • Outputs a 256x256 image
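
The Conditioning Augmentation step listed in both stages can be sketched in a few lines. This is an illustrative, framework-free sketch and not the code in model.py: the function names, the 4-dimensional toy vectors, and the use of plain Python lists are all assumptions made for readability. In the actual network, a dense layer would predict mu and log_sigma from the 1024-dimensional text embedding.

```python
import math
import random


def sample_conditioning(mu, log_sigma, rng):
    """Draw c = mu + sigma * eps with eps ~ N(0, I) (reparameterization trick)."""
    if len(mu) != len(log_sigma):
        raise ValueError("mu and log_sigma must have the same length")
    return [m + math.exp(s) * rng.gauss(0.0, 1.0) for m, s in zip(mu, log_sigma)]


def kl_to_standard_normal(mu, log_sigma):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ), usable as a regularization term."""
    return 0.5 * sum(
        m * m + math.exp(2.0 * s) - 2.0 * s - 1.0 for m, s in zip(mu, log_sigma)
    )


# Hypothetical statistics for one caption (toy size; assumption for illustration).
mu = [0.5, -0.2, 0.0, 1.0]
log_sigma = [-1.0, -1.0, -0.5, 0.0]

rng = random.Random(0)
c_first = sample_conditioning(mu, log_sigma, rng)
c_second = sample_conditioning(mu, log_sigma, rng)  # same text, new sample
kl = kl_to_standard_normal(mu, log_sigma)
```

Each call yields a different conditioning vector for the same description, which is how the step effectively produces more image-text pairs. The KL term keeps the predicted distribution close to a standard normal, so the sampling does not collapse to a fixed point.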

Reference Papers

  1. StackGAN: Text to photo-realistic image synthesis [Arxiv Link]
  2. Improved Techniques for Training GANs [Arxiv Link]
  3. Generative Adversarial Text to Image Synthesis [Arxiv Link]
  4. Learning Deep Representations of Fine-Grained Visual Descriptions [Arxiv Link]

Note

This is the code I submitted to TensorFlow for Google Summer of Code. Hence the attributions and the license are for "TensorFlow Authors" and not "Vishal V". This code is released under the MIT License.