Doodle-to-Image-Generator
This is an automatic generator of realistic images from doodles, built on GauGAN and deployed on Streamlit. You can check it out here. Given a semantic map (a doodle), GauGAN generates the corresponding realistic image. The model is a conditional GAN: the generator produces a realistic image conditioned on the input semantic map.
The model is taken from Nvidia Labs' SPADE, released in 2019.
Table of contents
- Idea
- Sample Results
- Dataset
- Setup
- Different Components
- Fine Tuning the model
- Model and Loss
- Other Examples
- References
Idea
GauGAN2 drew a lot of interest when Nvidia released it recently. I wanted to try it, but it turns out GauGAN2 has not yet been open-sourced. So I started looking into GauGAN in general and found this implementation. I wanted to fully understand how the model works and why, even though it was adapted from the pix2pix model, it produces much better results. I also wanted the front end to be easily accessible to the data science community, since not everyone is well versed in HTML and CSS. Hence, I deployed it on Streamlit.
Sample Results
Here is GauGAN working in practice, deployed on a website.
Dataset
In the SPADE paper, the model was originally trained on three datasets: COCO, Cityscapes and ADE20K. A Flickr dataset was also used, although I am not sure how that dataset was segmented. The model was trained on 8 V100 GPUs, which amounts to 128 GB of memory. To avoid such memory problems, I used a pretrained model. However, I tried training on a custom dataset as well. You can find the details of that here.
Setup
You can easily set up this application. Here are the steps to replicate my results on your system.
- Clone the repository: `git clone https://github.com/Shreyz-max/Doodle-to-Image-Generator.git`
- Create a conda environment: `conda create -n doodle_image python=3.10`
- Activate the environment: `conda activate doodle_image`
- Install the requirements file: `pip install -r requirements.txt`
- Run app.py: `streamlit run streamlit/app.py`
Different Components
- app.py has all of the Streamlit code to run the frontend.
- label_colors.py contains a list of dictionaries, one per label, each holding the color that I have assigned to the label and its corresponding id in the COCO dataset.
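As a rough illustration of the structure described above, label_colors.py might look something like the sketch below. The names `LABEL_COLORS`, `hex_to_rgb`, `COLOR_TO_ID` and `LABEL_NAMES`, the dictionary keys, and the specific colors and ids are all assumptions made for illustration, not the repository's actual contents. Verify each id against the COCO label list before relying on it.

```python
# Hypothetical sketch of label_colors.py. Names, colors and ids below are
# illustrative placeholders; verify each id against the COCO label list.
LABEL_COLORS = [
    {"label": "sky", "color": "#87ceeb", "id": 156},
    {"label": "sea", "color": "#0000ff", "id": 154},
    {"label": "grass", "color": "#008000", "id": 123},
]


def hex_to_rgb(hex_color):
    """Turn a '#rrggbb' string into an (r, g, b) tuple of ints."""
    value = hex_color.lstrip("#")
    return tuple(int(value[i:i + 2], 16) for i in (0, 2, 4))


# Lookup from RGB color to label id, useful when decoding a painted doodle.
COLOR_TO_ID = {hex_to_rgb(entry["color"]): entry["id"] for entry in LABEL_COLORS}

# Label names in list order, e.g. as the options of a select-box.
LABEL_NAMES = [entry["label"] for entry in LABEL_COLORS]
```

Keeping the label name, color and id together in one entry means a label only has to be added or changed in one place.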
Fine Tuning the model
Here are a few things that I did.
- GauGAN is trained to take a black-and-white semantic map and convert it into a realistic image. So, once we have a painted image, it is converted into black and white using its labels.
- I have selected a few labels from the COCO dataset. It has 182 labels, so you can choose any of them. Just select a few labels of your choice.
- Change the colors to whatever you like in label_colors.py. Make sure that the ids of those labels match those of the COCO dataset.
- Also make the corresponding changes in the select-box of app.py.
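The conversion from a painted doodle to a black-and-white semantic map can be sketched roughly as follows, where each pixel's gray value is the id of the label painted there. This is a minimal sketch under stated assumptions, not the code in app.py: the function name `doodle_to_label_map`, the color table and the `UNKNOWN_ID` fallback value are all hypothetical, and the ids are placeholders to be checked against the COCO label list.

```python
import numpy as np

# Hypothetical color-to-id table; colors and ids are placeholders.
COLOR_TO_ID = {
    (135, 206, 235): 156,  # e.g. "sky"
    (0, 0, 255): 154,      # e.g. "sea"
    (0, 128, 0): 123,      # e.g. "grass"
}

# Assumption: value given to pixels whose color is not in the table.
UNKNOWN_ID = 255


def doodle_to_label_map(rgb, color_to_id, unknown_id=UNKNOWN_ID):
    """Convert an (H, W, 3) uint8 doodle into an (H, W) map of label ids."""
    rgb = np.asarray(rgb)
    label_map = np.full(rgb.shape[:2], unknown_id, dtype=np.uint8)
    for color, label_id in color_to_id.items():
        # Pixels where all three channels match this label's color.
        mask = np.all(rgb == np.array(color, dtype=rgb.dtype), axis=-1)
        label_map[mask] = label_id
    return label_map


# Tiny 2x2 doodle: top row painted "sky", bottom row painted "grass".
doodle = np.zeros((2, 2, 3), dtype=np.uint8)
doodle[0, :] = (135, 206, 235)
doodle[1, :] = (0, 128, 0)
label_map = doodle_to_label_map(doodle, COLOR_TO_ID)
```

Exact color matching only works if the canvas returns the painted colors without anti-aliasing or compression. If it does not, snapping each pixel to the nearest known color is a reasonable alternative.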
If you want to use a different model trained on a different dataset, download the model from here. Use latest_net_G.pth for this.
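Before starting the app with a downloaded model, it can help to check that the weights file is where the code expects it. The sketch below assumes the upstream SPADE layout as I understand it, `<checkpoints_dir>/<experiment_name>/latest_net_G.pth`; this project may load the file from a different location, and both helper names are hypothetical.

```python
from pathlib import Path


def generator_checkpoint_path(checkpoints_dir, experiment_name, epoch="latest"):
    """Build the expected path of the generator weights file."""
    # Assumed layout: <checkpoints_dir>/<experiment_name>/<epoch>_net_G.pth
    return Path(checkpoints_dir) / experiment_name / f"{epoch}_net_G.pth"


def require_checkpoint(path):
    """Return the path if the file exists, otherwise raise a clear error."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"Generator weights not found at {path}")
    return path
```

For example, `require_checkpoint(generator_checkpoint_path("checkpoints", "my_model"))` fails early with a readable message if the download was placed in the wrong folder. Here `my_model` is a placeholder name.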
Model and Loss
To understand the model and the different types of losses, I would suggest reading the paper here. To train on your own dataset, you can follow my repository here. It walks you through training in Google Colab. You can then download the model and load it in this project. Make the few changes mentioned above, and you will have a working frontend as well.
Other Examples
Some other results to enjoy:
Performance of both algorithms on testing data
Doodle Input | Realistic Image |
---|---|