This repository contains a PyTorch implementation of "Generating Videos with Scene Dynamics" by Carl Vondrick, Hamed Pirsiavash, and Antonio Torralba, which appeared at NIPS 2016. The model learns to generate tiny videos using adversarial networks.
I hope you find this implementation useful.
The golf dataset contains about 600,000 videos, and we use a batch size of 64. Each epoch is around 8500 iterations. The learning rate is decayed by a factor of 10 every 1000 iterations. Training for one epoch took several days on a GPU. The following graphs display the discriminator training loss, generator training loss, and generator validation loss over time. The model seems to converge around the 1000th iteration of the first epoch, where the first learning rate decay happens.
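The decay schedule described above can be sketched in plain Python. This is only an illustration of "divide by 10 every 1000 iterations": the starting learning rate `BASE_LR` and the helper name `learning_rate` are assumptions made for this example and are not taken from this repository.

```python
BASE_LR = 2e-4       # hypothetical starting learning rate (assumption, not from this repo)
DECAY_FACTOR = 0.1   # "decayed by a factor of 10"
DECAY_EVERY = 1000   # number of iterations between decays


def learning_rate(iteration, base_lr=BASE_LR):
    """Return the step-decayed learning rate at a 0-indexed iteration."""
    num_decays = iteration // DECAY_EVERY
    return base_lr * DECAY_FACTOR ** num_decays


if __name__ == "__main__":
    # Print the rate at a few points within one epoch of roughly 8500 iterations.
    for it in (0, 999, 1000, 2000, 8499):
        print(f"iteration {it:>5}: lr = {learning_rate(it):.1e}")
```

In PyTorch, an equivalent schedule can be obtained with `torch.optim.lr_scheduler.StepLR(optimizer, step_size=1000, gamma=0.1)`, provided `scheduler.step()` is called once per iteration rather than once per epoch. Whether this repository implements the decay that way is not stated here.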
Below are some selected videos generated during the training of the network. Unnoticed bugs in the implementation could be degrading the results.
The code requires a PyTorch installation.
To train the model, refer to main.py.
The training data consists of the golf videos from the original paper. Once you have downloaded the dataset, make sure to set self.data_root in data_loader.py to its location. The dataset can be downloaded from the authors' paper website below.