GLASS

[ECCV 2022] Controllable Video Generation through Global and Local Motion Dynamics

Aram Davtyan, Paolo Favaro

Official repository of the ECCV 2022 paper.

Abstract: We present GLASS, a method for Global and Local Action-driven Sequence Synthesis. GLASS is a generative model that is trained on video sequences in an unsupervised manner and that can animate an input image at test time. The method learns to segment frames into foreground-background layers and to generate transitions of the foregrounds over time through a global and local action representation. Global actions are explicitly related to 2D shifts, while local actions are instead related to (both geometric and photometric) local deformations. GLASS uses a recurrent neural network to transition between frames and is trained through a reconstruction loss. We also introduce W-Sprites (Walking Sprites), a novel synthetic dataset with a predefined action space. We evaluate our method on both W-Sprites and real datasets, and find that GLASS is able to generate realistic video sequences from a single input image and to successfully learn a more advanced action space than in prior work.

Citation

Davtyan, A., Favaro, P. (2022). Controllable Video Generation Through Global and Local Motion Dynamics. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13677. Springer, Cham. https://doi.org/10.1007/978-3-031-19790-1_5

@InProceedings{10.1007/978-3-031-19790-1_5,
	author="Davtyan, Aram and Favaro, Paolo",
	editor="Avidan, Shai and Brostow, Gabriel and Ciss{\'e}, Moustapha and Farinella, Giovanni Maria and Hassner, Tal",
	title="Controllable Video Generation Through Global and Local Motion Dynamics",
	booktitle="Computer Vision -- ECCV 2022",
	year="2022",
	publisher="Springer Nature Switzerland",
	address="Cham",
	pages="68--84",
	isbn="978-3-031-19790-1"
}

Prerequisites

For convenience, we provide an env.yml file that can be used to install the required packages into a conda environment with the following command:

conda env create -f env.yml
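
After creating the environment, activate it before running any of the scripts. The environment name is set inside env.yml; "glass" below is a hypothetical placeholder:

# activate the environment (use the actual name from env.yml; "glass" is a placeholder)
conda activate glass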

Datasets

Tennis and BAIR

Collect the datasets following the instructions from the official PVG repository. Put them into the ./data folder if you want to use the default configs, or specify the path to the dataset in the corresponding config otherwise. For instance, by default the code expects to find the Tennis dataset in the ./data/tennis folder, which in turn contains three subfolders for the train, val and test splits of the dataset.
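
For reference, the default Tennis layout described above can be created as follows (a sketch; the split subfolders themselves are filled by the PVG collection scripts):

# expected default layout: ./data/tennis with train/val/test splits
mkdir -p data/tennis/train data/tennis/val data/tennis/test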

W-Sprites

To generate the W-Sprites dataset, first activate the environment specified in the data/wsprites/env.yml file and then run the following commands:

1. cd data/wsprites/sprites
2. python random_character.py
3. python frame_to_npy.py
4. cd ../../../
5. mkdir data/wsprites/data
6. python -m data.wsprites.generate

This will create training sequences in the data/wsprites/data folder.
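
The same steps as a single shell snippet; the comments describe each script's presumed role, inferred from its name:

cd data/wsprites/sprites
python random_character.py        # sample random sprite characters
python frame_to_npy.py            # convert the rendered frames to .npy arrays
cd ../../../
mkdir data/wsprites/data
python -m data.wsprites.generate  # assemble the training sequences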

Custom datasets

Use the convert_video_directory.py script from the official PVG repository to convert a custom video dataset to the appropriate format.

Training

To reproduce the results from the paper, use the train.py script from the repository. Usage example:

python train.py --config configs/wsprites.yaml --run-name test-run --wandb

The output of python train.py --help is as follows:

> python train.py --help
usage: train.py [-h] --run-name RUN_NAME --config CONFIG [--resume-step RESUME_STEP] [--num-gpus NUM_GPUS] [--random-seed RANDOM_SEED] [--wandb]

optional arguments:
  -h, --help            show this help message and exit
  --run-name RUN_NAME   Name of the current run.
  --config CONFIG       Path to the config file.
  --resume-step RESUME_STEP
                        Step to resume the training from.
  --num-gpus NUM_GPUS   Number of gpus to use for training.
                        By default uses all available gpus.
  --random-seed RANDOM_SEED
                        Random seed.
  --wandb               If defined, use wandb for logging.

Please check the example configs in the configs folder.
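
For instance, to resume an interrupted run from a saved step on two GPUs (the step value is illustrative; all flags are taken from the --help output above):

python train.py --config configs/wsprites.yaml --run-name test-run --resume-step 100000 --num-gpus 2 --wandb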

Evaluation

The evaluation consists of two steps.

  1. To calculate action metrics, use the evaluate.py script. Usage example:

    python evaluate.py --config configs/wsprites.yaml --run-name test-run --step 470000 --wandb
    
  2. To calculate image/video quality metrics, first generate an evaluation dataset with the help of the build_evaluation_dataset.py script. Usage example:

    python build_evaluation_dataset.py --config configs/wsprites.yaml --run-name test-run --step 470000
    

    After that, the metrics can be calculated using the evaluate_dataset.py script. Usage example:

    python evaluate_dataset.py --config configs/wsprites.yaml
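
Putting the two steps together, a full evaluation of a finished run looks like this (the step value matches the examples above and should correspond to one of your saved checkpoints):

python evaluate.py --config configs/wsprites.yaml --run-name test-run --step 470000 --wandb
python build_evaluation_dataset.py --config configs/wsprites.yaml --run-name test-run --step 470000
python evaluate_dataset.py --config configs/wsprites.yaml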