This repository, forked from StackGAN-pytorch, is an implementation of the authors' StackGAN text-to-image synthesis method that is compatible with Python >= 3.6 and newer versions of torch and CUDA; the original could only be run with Python 2.7 and older libraries.
Also, this code can be run both on Ubuntu 18.04 and on Windows 10 x64 (see the Dependencies section below).
PyTorch implementation for reproducing COCO results in the paper StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks by Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaogang Wang, Xiaolei Huang and Dimitris Metaxas. The network structure is slightly different from the TensorFlow implementation.
Ubuntu
- Ubuntu version: >= 16.04, <= 18.04 (tested with Ubuntu 18.04). Note that CUDA 8.0 to 10.2 is supported on Ubuntu 16.04, while only CUDA 10.0 to 10.2 is supported on Ubuntu 18.04.
- CUDA: 10.1. Use the following command to check which version is installed on your system:
nvcc --version
- GPU: a CUDA 10.1 compatible card (tested on a TITAN Xp).
- Python: >= 3.6 (tested with Python 3.6.9)
- Torch: 1.7.0+cu101. Use the following command:
pip install torch==1.7.0 torchvision==0.8.1 -f https://download.pytorch.org/whl/cu101/torch_stable.html
If you want to install different versions, go to the PyTorch download page.
To check that torch works correctly with CUDA, run the "test_gpu_torch.py" script and check the generated output (it should print "True", a string describing the CUDA device found and the name of your GPU card); see the sketch below.
- Install the packages from "requirements_ubuntu_18-04.txt". Use the command:
pip install -r <path_to>/requirements_ubuntu_18-04.txt
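For reference, a check equivalent to "test_gpu_torch.py" can be as simple as the following (a minimal sketch, not necessarily identical to the script in this repository):

```python
import torch

# Should print True if torch can see a CUDA device
print(torch.cuda.is_available())

if torch.cuda.is_available():
    # Describe the CUDA device found and the name of your GPU card
    device = torch.device("cuda")
    print(device)
    print(torch.cuda.get_device_name(0))
```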
Windows
- Windows version: 10, 64 bit (x64).
- CUDA: 10.0. Use the following command to check which version is installed on your system:
nvcc --version
- GPU: a CUDA 10.0 compatible card (tested on a NVIDIA GeForce GTX 1060).
- Python: >= 3.6 (tested with Python 3.7.8)
- Torch: 1.8.1+cu102. Use the following command:
pip install torch==1.8.1 torchvision==0.9.1 -f https://download.pytorch.org/whl/cu102/torch_stable.html
If you want to install different versions, go to the PyTorch download page.
To check that torch works correctly with CUDA, run the "test_gpu_torch.py" script and check the generated output (see the sketch in the Ubuntu section above).
- Install the packages from "requirements_win_10_x64.txt". Use the command:
pip install -r <path_to>/requirements_win_10_x64.txt
Recommended: to avoid mixing different packages and/or Python versions on your system, it is convenient to use a virtual environment, i.e. a self-contained environment with all the dependencies needed by your application. With virtualenv:
- Create a new virtual environment:
virtualenv -p <path_to_python_executable> <virtual_env_name>
- Activate the created environment.
- on Windows:
<virtual_env_name>\Scripts\activate
- on Linux:
source <virtual_env_name>/bin/activate
- Once the environment is activated, install the required dependencies:
pip install -r <path_to>/requirements_<win/linux>.txt
(see the two previous sections).
- With the
pip list
command, you will see all the packages available within the environment.
- Execute your application.
- Deactivate:
deactivate
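For example, a full session on Linux might look like this (the environment name and the Python path are illustrative):
virtualenv -p /usr/bin/python3.6 stackgan-env
source stackgan-env/bin/activate
pip install -r requirements_ubuntu_18-04.txt
pip list
deactivate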
- Download our preprocessed char-CNN-RNN text embeddings for training and evaluating on COCO, and save them to data/coco.
- [Optional] Follow the instructions at reedscot/icml2016 to download the pretrained char-CNN-RNN text encoders and extract text embeddings.
- Download the COCO image data and extract all the images into the data/coco/images folder.
- The data folder structure should look like:
data
|-- coco
| |-- images
| | COCO_train2014_000000581921.jpg
| | COCO_train2014_000000581909.jpg
| | ...
| |-- test
| | filename.txt
| | filename.pickle
| | val_filename.txt
| | val_captions.txt
| | val_captions.t7
| |-- train
| | char-CNN-RNN-embeddings.pickle
| | filenames.pickle
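To sanity-check the downloaded embeddings, you can inspect the pickle files directly (a minimal sketch; the exact structure of the pickled objects depends on the preprocessing, and encoding='latin1' is needed because the files were produced with Python 2):

```python
import pickle

# Load the preprocessed char-CNN-RNN embeddings for the training split
with open('data/coco/train/char-CNN-RNN-embeddings.pickle', 'rb') as f:
    embeddings = pickle.load(f, encoding='latin1')

# Load the corresponding image filenames
with open('data/coco/train/filenames.pickle', 'rb') as f:
    filenames = pickle.load(f, encoding='latin1')

# Assuming list-like objects; adjust if the structure differs
print(type(embeddings), len(embeddings))
print(filenames[:3])
```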
- The steps to train a StackGAN model on the COCO dataset using our preprocessed embeddings:
- Step 1: train Stage-I GAN (e.g., for 120 epochs). From the ./code folder:
python main.py --cfg cfg/coco_s1.yml --gpu <GPU_ID>
(if you only have one GPU card, GPU_ID = 0).
- Step 2:
- set the path to the last saved model from Stage-I GAN in the coco_s2.yml file, e.g.:
STAGE1_G: '../output/coco_stageI/Model/netG_epoch_120.pth'
- train Stage-II GAN (e.g., for another 120 epochs). From the ./code folder:
python main.py --cfg cfg/coco_s2.yml --gpu <GPU_ID>
- The *.yml files are example configuration files for training/evaluating our models.
- If you run into GPU memory errors, try reducing the batch size in the *.yml files, e.g. from 128 to 64 (see the sketch below).
- If you want to try your own datasets, here are some good tips about how to train GANs. Also, we encourage you to try different hyper-parameters and architectures, especially for more complex datasets.
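As a quick way to verify which batch size a configuration actually sets, you can load the file with PyYAML (a minimal sketch; the TRAIN/BATCH_SIZE key path is an assumption based on the cfg files in this repository, adjust it if yours differ):

```python
import yaml  # pip install pyyaml

# Load a training config and print the batch size it sets.
# The TRAIN -> BATCH_SIZE key path is an assumption; adjust if needed.
with open('code/cfg/coco_s1.yml') as f:
    cfg = yaml.safe_load(f)

print(cfg.get('TRAIN', {}).get('BATCH_SIZE'))
```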
- StackGAN for COCO: download the model and save it to models/coco.
- Our current implementation has a higher inception score (10.62±0.19) than the one reported in the StackGAN paper.
From the ./code folder:
- Run evaluation:
- set the path to the last saved generator model from Stage-II GAN in the coco_eval.yml file, e.g.:
NET_G: '../models/coco/netG_epoch_90.pth'
- run
python main.py --cfg cfg/coco_eval.yml --gpu <GPU_ID>
to generate samples from the captions in the COCO validation set.
The output images will be saved in a folder such as ../models/coco/netG_epoch_<N> (where N is the epoch number of the last saved model).
- To save an image along with the caption from which it was generated, run:
python img_caption_viewer.py --img_path ../models/coco/netG_epoch_<N>
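To quickly list the generated samples, something like the following works (a minimal sketch; the folder name follows the pattern above with an illustrative epoch number, and the image extension is an assumption):

```python
from pathlib import Path

# List the first few generated sample images (extension may differ)
out_dir = Path('../models/coco/netG_epoch_90')
for img in sorted(out_dir.glob('*.png'))[:10]:
    print(img.name)
```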
If you find StackGAN useful in your research, please consider citing:
@inproceedings{han2017stackgan,
Author = {Han Zhang and Tao Xu and Hongsheng Li and Shaoting Zhang and Xiaogang Wang and Xiaolei Huang and Dimitris Metaxas},
Title = {StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks},
Year = {2017},
booktitle = {{ICCV}},
}
Follow-up work from the authors
- StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks
- AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks [supplementary][code]
References