This project is a PyTorch implementation of Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks (CycleGAN). Given images representing two different domains of data (for example, Monet paintings and real-world photographs, or pictures of horses and zebras), we learn a pair of generators, one mapping from domain A to domain B and the other from domain B to domain A. Notably, unlike prior methods, this paper does not need paired training data. The simple and novel intuition of the paper is that by chaining a function that maps from domain A to domain B with another function that maps from domain B to domain A, we should get an output that is similar to the original input.
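To make that reconstruction constraint concrete, here is a minimal sketch of how it is typically expressed as a cycle-consistency loss. The generator names `G_AtoB`/`G_BtoA` are placeholders; the L1 penalty and the weight of 10 follow the CycleGAN paper, but this is not necessarily the exact code in this repo:

```python
import torch.nn as nn

# Hypothetical generator handles: G_AtoB maps A -> B, G_BtoA maps B -> A.
cycle_criterion = nn.L1Loss()

def cycle_consistency_loss(G_AtoB, G_BtoA, real_A, real_B, lam=10.0):
    # A -> B -> A should reconstruct the original A image...
    reconstructed_A = G_BtoA(G_AtoB(real_A))
    # ...and B -> A -> B should reconstruct the original B image.
    reconstructed_B = G_AtoB(G_BtoA(real_B))
    return lam * (cycle_criterion(reconstructed_A, real_A) +
                  cycle_criterion(reconstructed_B, real_B))
```

This term is added to the usual adversarial losses for the two generator/discriminator pairs.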
- Python 3.6
- PyTorch 0.4.0
- NumPy
- PIL
- TensorBoard and TensorFlow for logging
The code assumes the dataset folder hierarchy below, as used by the original authors:
datasetAtoB
|____trainA
|____testA
|____trainB
|____testB
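As a rough illustration, a dataset that consumes this layout might be sketched as follows. The class name, image size, and transform choices are assumptions for the sketch, not the repo's exact loader:

```python
import os
import random
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms

class UnpairedDataset(Dataset):
    """Loads images from the A and B folders independently (no pairing)."""
    def __init__(self, dataroot, phase='train'):
        dir_A = os.path.join(dataroot, phase + 'A')
        dir_B = os.path.join(dataroot, phase + 'B')
        self.paths_A = sorted(os.path.join(dir_A, f) for f in os.listdir(dir_A))
        self.paths_B = sorted(os.path.join(dir_B, f) for f in os.listdir(dir_B))
        self.transform = transforms.Compose([
            transforms.Resize((256, 256)),
            transforms.ToTensor(),
            transforms.Normalize((0.5,) * 3, (0.5,) * 3),  # scale to [-1, 1]
        ])

    def __len__(self):
        return max(len(self.paths_A), len(self.paths_B))

    def __getitem__(self, idx):
        img_A = Image.open(self.paths_A[idx % len(self.paths_A)]).convert('RGB')
        # Sample B randomly so the A/B pairing stays arbitrary.
        img_B = Image.open(random.choice(self.paths_B)).convert('RGB')
        return self.transform(img_A), self.transform(img_B)
```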
To train, run the following command:
$ python train.py --dataroot path_to_dataset_AtoB
These are some additional parameters that you can use:
--gpu
: (int) id of the GPU you want to use (if not specified, will train on CPU)
--use-identity
: (int) 0 or 1; whether the identity loss should be added to the loss function for training the generators (i.e., when given data from the domain the generator should be mapping to, the generator should act as an identity function; see the sketch after this list)
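For reference, the identity term described above can be sketched like this. The 0.5 * lambda weighting follows the CycleGAN paper; the names are illustrative:

```python
import torch.nn as nn

identity_criterion = nn.L1Loss()

def identity_loss(G_AtoB, G_BtoA, real_A, real_B, lam=10.0):
    # Feeding a generator an image already from its target domain
    # should leave the image (approximately) unchanged.
    loss_idt_B = identity_criterion(G_AtoB(real_B), real_B)
    loss_idt_A = identity_criterion(G_BtoA(real_A), real_A)
    return 0.5 * lam * (loss_idt_A + loss_idt_B)
```

In practice this term helps preserve color composition between input and output.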
Data needed to run TensorBoard will be written out at regular intervals during training. You can start a TensorBoard server from the root of the project directory with:
$ tensorboard --logdir='./logs' --port 6006
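For context, this project predates `torch.utils.tensorboard`, which is why TensorFlow appears in the requirements. With TensorFlow 1.x installed, scalar summaries can be written roughly like this (a sketch, not necessarily the repo's exact logger):

```python
import tensorflow as tf

# Writes event files that the TensorBoard server above will pick up.
writer = tf.summary.FileWriter('./logs')

def log_scalar(tag, value, step):
    summary = tf.Summary(value=[tf.Summary.Value(tag=tag, simple_value=value)])
    writer.add_summary(summary, step)
    writer.flush()

# e.g. log_scalar('loss_G_AtoB', 0.42, step)
```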
As another debugging tool, I also write out a few images from the test set with the generators applied. These images appear in the visualization folder.
Evaluation is set up to apply the generators to all the test images within the dataset hierarchy previously defined. Going through the dataset folders, the predicted output will be written to the folders testA_before, testA_inStyleOfB, testB_before, and testB_inStyleOfA. Of course, the code can be trivially modified to run on arbitrary directories of data.
$ python infer.py --dataroot path_to_dataset_AtoB --modelAtoB model_path_AtoB --modelBtoA model_path_BtoA
You can also specify whether you would like to run on a GPU:
--gpu
: (int) id of the GPU you want to use (if not specified, will run inference on CPU)
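A minimal version of what inference does under the hood might look like the following. The function name, checkpoint format (a pickled generator loadable with `torch.load`), and image size are assumptions; see infer.py for the actual flow:

```python
import torch
from PIL import Image
from torchvision import transforms

def stylize(model_path, image_path, gpu=None):
    device = torch.device('cuda:%d' % gpu if gpu is not None else 'cpu')
    G = torch.load(model_path, map_location=device)  # assumes a pickled generator
    G.eval()
    to_tensor = transforms.Compose([
        transforms.Resize((256, 256)),
        transforms.ToTensor(),
        transforms.Normalize((0.5,) * 3, (0.5,) * 3),
    ])
    img = to_tensor(Image.open(image_path).convert('RGB')).unsqueeze(0).to(device)
    with torch.no_grad():
        out = G(img)
    # Undo the [-1, 1] normalization before converting back to an image.
    out = (out.squeeze(0).cpu() * 0.5 + 0.5).clamp(0, 1)
    return transforms.ToPILImage()(out)
```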
Training a GAN by itself is already a fickle process, so concurrently training two GANs was not the simplest task. Notably, I'd reach some training scenarios where one discriminator would get into a state where it stopped learning (i.e., started guessing 50% likelihood for everything), which broke the information flow, as little useful signal was provided to the generator.
Here are some results from horse2zebra. Original horse image on the left and synthetically stylized as a zebra on the right.
Unfortunately, learning the reverse function was a bit more difficult. Original zebra image on the left and synthetically stylized as a horse on the right.
On a GTX 1070, inference takes approximately 0.0113 seconds per 256 x 256 x 3 image, a rate of roughly 88 Hz (ignoring time spent loading the image and moving it onto GPU memory). It would be interesting to try applying this in high-performance, real-time environments.
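If you want to reproduce this kind of measurement, note that CUDA kernels launch asynchronously, so you must synchronize before reading the clock. A sketch of a fair benchmark (the warm-up count and iteration count are arbitrary choices):

```python
import time
import torch

def benchmark(G, device, n_iters=100):
    x = torch.randn(1, 3, 256, 256, device=device)  # dummy 256x256 RGB input
    with torch.no_grad():
        for _ in range(10):          # warm-up iterations
            G(x)
        if device.type == 'cuda':
            torch.cuda.synchronize()  # wait for pending kernels before timing
        start = time.time()
        for _ in range(n_iters):
            G(x)
        if device.type == 'cuda':
            torch.cuda.synchronize()
    return (time.time() - start) / n_iters  # average seconds per image
```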