tensorflow==2.1.0
numpy==1.16.4
absl_py==0.7.0
matplotlib==2.2.3
pandas==0.23.4
Pillow==6.1.0
- To install all the dependencies, run
pip install -r requirements.txt
- To download the CUB 200 dataset, run the `data_download.py` file
python data_download.py
- Download the Char-RNN-CNN embeddings (download link) and unzip the archive in place.
unzip birds.zip
- The `model.py` file contains the bare minimum code to run the stage 1 and stage 2 architectures. It automatically saves the weights after the specified (or default) number of epochs has completed. Note that the weights are stored at the same directory level as `model.py`.
python model.py
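After installing, it can help to confirm that the pinned versions are the ones actually present in the environment. The following is a minimal sketch, not part of this repository: the helper names `parse_pins`, `installed_version`, and `find_mismatches` are hypothetical, and it assumes Python 3.8+ for `importlib.metadata`.

```python
from importlib import metadata

# The pins listed at the top of this README, repeated here so the sketch is self-contained.
REQUIREMENTS = """
tensorflow==2.1.0
numpy==1.16.4
absl_py==0.7.0
matplotlib==2.2.3
pandas==0.23.4
Pillow==6.1.0
"""


def parse_pins(text):
    """Parse 'name==version' lines into a dict, skipping blanks and comments."""
    pins = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def installed_version(name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Return {name: (wanted, found)} for every pin that is not satisfied exactly."""
    mismatches = {}
    for name, wanted in pins.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches
```

Calling `find_mismatches(parse_pins(REQUIREMENTS))` returns an empty dict when every pin matches the environment; any entry it does return shows the wanted and found versions side by side.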
- Stage 1
  - Text Encoder Network
    - Text description to a 1024 dimensional text embedding
    - Learning Deep Representations of Fine-Grained Visual Descriptions Arxiv Link
  - Conditioning Augmentation Network
    - Adds randomness to the network
    - Produces more image-text pairs
  - Generator Network
  - Discriminator Network
  - Embedding Compressor Network
  - Outputs a 64x64 image
- Stage 2
  - Text Encoder Network
  - Conditioning Augmentation Network
  - Generator Network
  - Discriminator Network
  - Embedding Compressor Network
  - Outputs a 256x256 image
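To illustrate how the Conditioning Augmentation Network adds randomness and yields more image-text pairs, here is a minimal NumPy sketch of the idea as described in the StackGAN paper: the text embedding is mapped to a mean and a log standard deviation, and the conditioning vector is sampled from that Gaussian. This is not the code in `model.py`. The function names, the single linear layer, the random weights, and the conditioning size of 128 are assumptions made for illustration.

```python
import numpy as np

EMBED_DIM = 1024  # text embedding size stated in the list above
COND_DIM = 128    # assumed size of the sampled conditioning vector


def conditioning_augmentation(embedding, weights, bias, rng):
    """Sample a conditioning vector c = mean + sigma * eps, with eps ~ N(0, I).

    embedding: (batch, EMBED_DIM) text embeddings
    weights:   (EMBED_DIM, 2 * COND_DIM) hypothetical linear layer
    bias:      (2 * COND_DIM,)
    """
    stats = embedding @ weights + bias
    mean, log_sigma = np.split(stats, 2, axis=-1)
    eps = rng.standard_normal(mean.shape)
    return mean + np.exp(log_sigma) * eps, mean, log_sigma


def kl_to_standard_normal(mean, log_sigma):
    """KL(N(mean, sigma^2) || N(0, I)), summed over dimensions, averaged over the batch."""
    per_dim = 0.5 * (np.exp(2.0 * log_sigma) + mean ** 2 - 1.0 - 2.0 * log_sigma)
    return float(np.mean(np.sum(per_dim, axis=-1)))


rng = np.random.default_rng(0)
weights = rng.normal(scale=0.02, size=(EMBED_DIM, 2 * COND_DIM))
bias = np.zeros(2 * COND_DIM)
embedding = rng.standard_normal((1, EMBED_DIM))

# The same text embedding gives a different conditioning vector on every draw,
# which is how one caption can be paired with many slightly different conditions.
c_first, mean, log_sigma = conditioning_augmentation(embedding, weights, bias, rng)
c_second, _, _ = conditioning_augmentation(embedding, weights, bias, rng)
```

The KL term is the regularizer the paper uses to keep the sampled distribution close to a standard normal; it is zero exactly when the mean is 0 and sigma is 1.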
- StackGAN: Text to photo-realistic image synthesis [Arxiv Link]
- Improved Techniques for Training GANs [Arxiv Link]
- Generative Adversarial Text to Image Synthesis [Arxiv Link]
- Learning Deep Representations of Fine-Grained Visual Descriptions [Arxiv Link]
This is the code I submitted to TensorFlow for Google Summer of Code. Hence the attributions and the License are for "TensorFlow Authors" and not "Vishal V". This code is under the MIT License.