This repository provides the code for our work on unsupervised video retargeting.
@inproceedings{Recycle-GAN,
author = {Aayush Bansal and
Shugao Ma and
Deva Ramanan and
Yaser Sheikh},
title = {Recycle-GAN: Unsupervised Video Retargeting},
booktitle = {ECCV},
year = {2018},
}
Acknowledgements: This code borrows heavily from the PyTorch implementation of Cycle-GAN and Pix2Pix. A huge thanks to them!
We use this formulation in our ECCV'18 paper on unsupervised video retargeting for various domains where spatial and temporal information matter, such as face retargeting. Without any manual annotation, our approach can learn to retarget from one domain to another.
The repository contains code for training a network to retarget from one domain to another, and for using a trained model for this task. Keep the following in mind when using this code:
- Linux or macOS
- Python 3
- PyTorch 0.4
- NVIDIA GPU + CUDA cuDNN
- numpy 1.15.0
- torch 0.4.1.post2
- torchvision 0.2.2
- visdom
- dominate
Run the following command to install the Python dependencies automatically:
pip install -r requirements.txt
For each task, create a new folder in the "dataset/" directory. Place the training images from the two domains in "trainA/" and "trainB/" respectively. Each training image file consists of three horizontally concatenated video frames, "{t, t+1, t+2}". Place the test images in "testA/" and "testB/". Since we do not use temporal information at test time, each test image is a single frame "{t}".
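The training-image layout described above can be sketched as follows. This is a minimal illustration, not the authors' preprocessing script: the helper names `make_triplet` and `make_all_triplets` are hypothetical, frames are assumed to be already loaded as equally sized NumPy arrays of shape (height, width, channels), and the use of a sliding window over consecutive frames is an assumption.

```python
import numpy as np

def make_triplet(frames, t):
    """Concatenate frames t, t+1 and t+2 side by side along the width axis."""
    if t < 0 or t + 2 >= len(frames):
        raise IndexError("need frames t, t+1 and t+2")
    return np.concatenate([frames[t], frames[t + 1], frames[t + 2]], axis=1)

def make_all_triplets(frames):
    """Build one training image per valid start index t (sliding window, assumed)."""
    return [make_triplet(frames, t) for t in range(len(frames) - 2)]

# Toy "video": 5 frames of size 4x6 with 3 channels; frame i is filled with value i.
video = [np.full((4, 6, 3), i, dtype=np.uint8) for i in range(5)]
triplets = make_all_triplets(video)
```

Each resulting array is three times as wide as a single frame and could then be written to "trainA/" or "trainB/" with any image library.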
There are two training modules in the "scripts/" directory: (1) Recycle-GAN and (2) ReCycle-GAN.
Recycle-GAN is the model described in the paper and is used for most examples in the paper, specifically face to face, flower to flower, clouds and wind synthesis, sunrise and sunset.
ReCycle-GAN is largely the same as Recycle-GAN, but additionally uses the vanilla cycle losses from CycleGAN between corresponding source and target frames. We found this module useful for tasks such as unpaired image to labels and labels to image on the VIPER dataset, and image to normals and normals to image on the NYU-v2 depth dataset.
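To make the difference between the two modules concrete, here is a toy sketch of the reconstruction terms only. It is not the repository's implementation: the function names, the `use_cycle` switch, and the `lambda_cycle` weight are hypothetical, mean squared error is used for simplicity (the repository's actual loss choice may differ), and the adversarial and recurrent prediction losses are omitted. The generators `G_xy`, `G_yx` and the predictor `P_y` are plain callables standing in for networks.

```python
import numpy as np

def mse(a, b):
    """Mean squared error between two arrays."""
    return float(np.mean((a - b) ** 2))

def cycle_loss(x, G_xy, G_yx):
    """Vanilla CycleGAN term: translate a frame to the other domain and back."""
    return mse(G_yx(G_xy(x)), x)

def recycle_loss(x1, x2, x3, G_xy, G_yx, P_y):
    """Translate two frames, predict the next frame in the target domain,
    map the prediction back, and compare it with the true third frame."""
    y1, y2 = G_xy(x1), G_xy(x2)
    y3_pred = P_y(y1, y2)
    return mse(G_yx(y3_pred), x3)

def reconstruction_loss(x1, x2, x3, G_xy, G_yx, P_y, use_cycle, lambda_cycle=1.0):
    """use_cycle=False mimics Recycle-GAN; use_cycle=True adds per-frame cycle terms."""
    loss = recycle_loss(x1, x2, x3, G_xy, G_yx, P_y)
    if use_cycle:
        loss += lambda_cycle * sum(cycle_loss(x, G_xy, G_yx) for x in (x1, x2, x3))
    return loss

# Toy frames with linear "motion" and stand-in networks.
x1, x2, x3 = np.zeros((2, 2)), np.ones((2, 2)), np.full((2, 2), 2.0)
identity = lambda x: x
shift = lambda x: x + 1.0               # a deliberately imperfect generator
extrapolate = lambda a, b: 2.0 * b - a  # linear next-frame predictor

perfect = reconstruction_loss(x1, x2, x3, identity, identity, extrapolate, use_cycle=True)
recycle_only = reconstruction_loss(x1, x2, x3, shift, identity, extrapolate, use_cycle=False)
with_cycle = reconstruction_loss(x1, x2, x3, shift, identity, extrapolate, use_cycle=True)
```

With the imperfect generator, the cycle terms add a penalty for every individual frame on top of the recycle term, which is the extra supervision the second module provides.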
There are two prediction models used in this work: (1) a simple U-Net and (2) a higher-capacity prediction module.
To use the simple U-Net prediction module, set the flag "--which_model_netP" to either "unet_128" or "unet_256".
To use the advanced, higher-capacity prediction module, set the flag "--which_model_netP" to "prediction".
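The flag values above can be summarized in a small argparse sketch. This is only an illustration of the three documented values for "--which_model_netP"; the parser itself, the `build_parser` name, and the default value are assumptions and do not reflect how the repository's own option parsing is written.

```python
import argparse

# The three values mentioned in this README for the prediction module.
PREDICTOR_CHOICES = ("unet_128", "unet_256", "prediction")

def build_parser():
    """Hypothetical parser that accepts only the documented predictor names."""
    parser = argparse.ArgumentParser(description="prediction-module selection (illustrative)")
    parser.add_argument(
        "--which_model_netP",
        choices=PREDICTOR_CHOICES,
        default="unet_256",  # arbitrary default chosen for this sketch
        help="simple U-Net (unet_128, unet_256) or higher-capacity module (prediction)",
    )
    return parser

# Select the higher-capacity prediction module.
args = build_parser().parse_args(["--which_model_netP", "prediction"])
```

Restricting the flag with `choices` makes a mistyped module name fail at startup instead of partway through training.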
We observed that the model converges in 20-40 epochs when a sufficiently large dataset is used. For smaller datasets (1000 images or fewer), it helps to train for longer.
At test time, we run inference per image (as mentioned previously). The test code is based on CycleGAN.
Please use the following links to download the Face, Flowers, and VIPER data:
Please contact Aayush Bansal for any specific data or trained models, or for any other information.