Caption-to-Image Generation using Deep Residual Generative Adversarial Networks (DR-GAN)
The proposed GAN is conditioned on a text description instead of a class label. We implemented a Deep Residual GAN that generates fine images starting from latent noise: coarse images, aligned to the text attributes, are embedded as generator inputs and classifier labels, and a shortcut path in the generative network, similar to ResNet's, transports the coarse images directly to higher layers. In addition, adversarial training is applied in a cyclic fashion to prevent image degradation. Experimental results of applying the Deep Residual GAN model to the CUB-200 Birds and Flickr8K datasets show higher accuracy than state-of-the-art GANs.
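To make the two core ideas concrete, below is a minimal PyTorch sketch of a text-conditioned residual generator: the caption embedding is concatenated with the latent noise as the condition, and identity shortcuts (as in ResNet) carry the coarse feature maps directly to higher layers. The layer sizes, the 64x64 output resolution, and all names here are illustrative assumptions, not the exact architecture from the notebook.

```python
# Illustrative sketch only -- layer sizes and names are assumptions.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Refines a coarse feature map; the identity shortcut transports the
    coarse input directly to the higher layer, as in ResNet."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return torch.relu(x + self.body(x))  # shortcut + learned refinement

class Generator(nn.Module):
    def __init__(self, noise_dim=100, text_dim=300, channels=64):
        super().__init__()
        self.channels = channels
        # The caption embedding is concatenated with the noise as the condition.
        self.fc = nn.Linear(noise_dim + text_dim, channels * 8 * 8)
        self.blocks = nn.Sequential(
            ResidualBlock(channels),
            nn.Upsample(scale_factor=2),  # 8x8   -> 16x16
            ResidualBlock(channels),
            nn.Upsample(scale_factor=2),  # 16x16 -> 32x32
            ResidualBlock(channels),
            nn.Upsample(scale_factor=2),  # 32x32 -> 64x64
        )
        self.to_rgb = nn.Conv2d(channels, 3, 3, padding=1)

    def forward(self, noise, text_embedding):
        h = self.fc(torch.cat([noise, text_embedding], dim=1))
        h = h.view(-1, self.channels, 8, 8)
        return torch.tanh(self.to_rgb(self.blocks(h)))  # images in [-1, 1]
```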
- Please refer to the READMEs in the `Dataset`, `text_pkl`, `image_pkl`, `Weights`, and `word2vec_pretrained_model` folders to obtain the necessary data.
- The images pickle file can be found in the `Dataset` folder; it was created with `process_images.ipynb`, which resizes and normalizes the images and generates NumPy arrays (see the preprocessing sketch after this list).
- The captions pickle file can be found in the `Captions` folder; it was created with `process_captions.ipynb`, which generates sentence embeddings for the captions. Alternatively, use the one provided in the `Captions` folder.
- The trained model weight files can be found in the `Weights` folder.
- Open the Jupyter notebook `CSE_676_Text2Image_final.ipynb` in Google Colab and load the data.
- Run the code cells in Google Colab.
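For orientation, here is a rough Python sketch of what the two preprocessing notebooks are described as doing; the 64x64 target size, the word2vec file name, the averaging scheme for the sentence embeddings, and the pickle file names are assumptions rather than the notebooks' exact code.

```python
# Rough sketch of process_images.ipynb / process_captions.ipynb; file names,
# image size, and the averaging embedding scheme are assumptions.
import pickle
import numpy as np
from PIL import Image
from gensim.models import KeyedVectors

def image_to_array(path, size=(64, 64)):
    """Resize an image and normalize pixel values to [-1, 1]."""
    img = Image.open(path).convert("RGB").resize(size)
    return np.asarray(img, dtype=np.float32) / 127.5 - 1.0

# Average pretrained word2vec vectors into one sentence embedding per caption.
w2v = KeyedVectors.load_word2vec_format(
    "word2vec_pretrained_model/model.bin", binary=True)  # assumed file name

def caption_to_embedding(caption):
    vecs = [w2v[w] for w in caption.lower().split() if w in w2v]
    return np.mean(vecs, axis=0) if vecs else np.zeros(w2v.vector_size, np.float32)

image_paths = [...]  # fill in: paths to the dataset images
captions = [...]     # fill in: the corresponding caption strings

images = np.stack([image_to_array(p) for p in image_paths])
embeddings = np.stack([caption_to_embedding(c) for c in captions])

with open("Dataset/images.pkl", "wb") as f:      # assumed pickle names
    pickle.dump(images, f)
with open("Captions/captions.pkl", "wb") as f:
    pickle.dump(embeddings, f)
```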
Below are some outputs of the DR-GAN model for the input texts "a yellow bird with black tail" and "a green bird with black head".
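Producing an output like this at inference time, reusing the `Generator` and `caption_to_embedding` helpers sketched above, might look roughly as follows; the weight file name in `Weights/` is an assumption.

```python
# Illustrative inference sketch; the weight file name is an assumption.
import torch

G = Generator()  # class from the sketch above
G.load_state_dict(torch.load("Weights/generator.pth"))
G.eval()

text = caption_to_embedding("a yellow bird with black tail")  # shape (300,)
noise = torch.randn(1, 100)
with torch.no_grad():
    fake = G(noise, torch.from_numpy(text).unsqueeze(0))  # (1, 3, 64, 64) in [-1, 1]
```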
Other results, organized by training epoch, are shown in the table below:
| Epochs | Bird Generated Images | Flickr8K Generated Images |
| --- | --- | --- |
| 0 - 200 | Click here to view | Click here to view |
| 201 - 400 | Click here to view | Click here to view |
| 401 - 600 | Click here to view | Click here to view |
| 601 - 800 | Click here to view | Click here to view |
| 801 - 1000 | Click here to view | Click here to view |
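For reference, here is a minimal sketch of the alternating adversarial training loop that would produce one weight snapshot per epoch band in the table above; the discriminator `D`, the data `loader`, and the optimizer settings are assumptions, not the notebook's exact code.

```python
# Minimal GAN training loop sketch; D, loader, and hyperparameters are assumed.
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))

for epoch in range(1000):
    for real_imgs, text_emb in loader:  # batches of (image, caption embedding)
        b = real_imgs.size(0)
        fake_imgs = G(torch.randn(b, 100), text_emb)

        # Discriminator step: real image/caption pairs vs. generated fakes.
        d_loss = (bce(D(real_imgs, text_emb), torch.ones(b, 1))
                  + bce(D(fake_imgs.detach(), text_emb), torch.zeros(b, 1)))
        opt_d.zero_grad()
        d_loss.backward()
        opt_d.step()

        # Generator step: try to make the discriminator accept the fakes.
        g_loss = bce(D(fake_imgs, text_emb), torch.ones(b, 1))
        opt_g.zero_grad()
        g_loss.backward()
        opt_g.step()

    if (epoch + 1) % 200 == 0:  # one snapshot per row of the table
        torch.save(G.state_dict(), f"Weights/generator_epoch{epoch + 1}.pth")
```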