
Click to Move

Click to Move: Controlling Video Generation with Sparse Motion

PyTorch implementation of our paper Click to Move: Controlling Video Generation with Sparse Motion (ICCV 2021). Please cite it with the following BibTeX entry:

@inproceedings{ardino2021click,
  title={Click to Move: Controlling Video Generation with Sparse Motion},
  author={Ardino, Pierfrancesco and De Nadai, Marco and Lepri, Bruno and Ricci, Elisa and Lathuili{\`e}re, St{\'e}phane},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={14749--14758},
  year={2021}
}

Please follow the instructions below to run the code.

Scripts

1. Installation

  • See the c2m.yml configuration file. We provide a user-friendly way to set up the environment via Conda; you can create a new Conda environment with:
conda env create -f c2m.yml
conda activate c2m
  • Install cityscapesscripts with pip
cd cityscapesScripts
pip install -e .
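
To verify the installation, you can run a quick check such as the one below. This is an illustrative snippet we suggest, not part of the repository; it only assumes the c2m environment is active.

    # Quick environment check (illustrative, not part of the repository)
    import torch
    import cityscapesscripts  # installed above with `pip install -e .`

    print("PyTorch:", torch.__version__)
    print("CUDA available:", torch.cuda.is_available())
    print("cityscapesscripts:", cityscapesscripts.__file__)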

WIP

2. Data Preprocessing

2.1 Generate instance segmentation

We apply a modified version of Panoptic-DeepLab to obtain the corresponding semantic and instance maps. You can find it in the panoptic_deeplab folder. For this work we used the HRNet backbone, which you can download from here.

Cityscapes

  • Please download the Cityscapes dataset from the official website (registration required). After downloading, put the files under the ~/dataset_cityscape_video/ folder and run the following command to generate the segmentation maps:

    cd panoptic_deeplab
    python tools/generate_segmentation.py --cfg configs/cityscapes_{trainset/valset}.yaml TEST.MODEL_FILE YOUR_DOWNLOAD_MODEL_FILE
    

    Remember to set up the config file with the correct input folder, output folder, and dataset split.

    You should end up with the following structure:

    dataset_cityscape_video
    ├── leftImg8bit_sequence
    │   ├── train
    │   │   ├── aachen
    │   │   │   ├── aachen_000003_000019_leftImg8bit.png
    │   │   │   ├── ...
    │   ├── val
    │   │   ├── frankfurt
    │   │   │   ├── frankfurt_000000_000294_leftImg8bit.png
    │   │   │   ├── ...
    │   ├── train_semantic_segmask
    │   │   ├── aachen
    │   │   │   ├── aachen_000003_000019_ssmask.png
    │   │   │   ├── ...
    │   ├── val_semantic_segmask
    │   │   ├── frankfurt
    │   │   │   ├── frankfurt_000000_000294_ssmask.png
    │   │   │   ├── ...
    │   ├── train_instance
    │   │   ├── aachen
    │   │   │   ├── aachen_000003_000019_gtFine_instanceIds.png
    │   │   │   ├── ...
    │   ├── val_instance
    │   │   ├── frankfurt
    │   │   │   ├── frankfurt_000000_000294_gtFine_instanceIds.png
    │   │   │   ├── ...
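
    As a quick sanity check, a short script along these lines (an illustrative sketch written against the layout above, not part of the repository) reports how many frames, semantic masks, and instance maps each city folder contains:

    # check_layout.py -- illustrative sketch based on the layout above (not part of the repository)
    import os

    root = os.path.expanduser("~/dataset_cityscape_video/leftImg8bit_sequence")

    def count(*parts):
        # number of files in a folder, or 0 if it has not been generated yet
        path = os.path.join(root, *parts)
        return len(os.listdir(path)) if os.path.isdir(path) else 0

    for split in ("train", "val"):
        for city in sorted(os.listdir(os.path.join(root, split))):
            print(f"{split}/{city}: "
                  f"{count(split, city)} frames, "
                  f"{count(split + '_semantic_segmask', city)} semantic masks, "
                  f"{count(split + '_instance', city)} instance maps")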
    

2.2 Generate object trajectories

3 Train the model

We store the configuration of the model as a YAML file. You can have a look at a base configuration in src/config/c2m_journal_cityscapes.yaml. The training script takes the following parameters (a minimal sketch of this interface is shown after the list):

  • config: path to the YAML configuration file
  • device_ids: comma-separated list of device ids
  • seed: random seed for the training run
  • profile: debug using the PyTorch profiler
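
The interface above roughly corresponds to an argparse definition like the following. This is a sketch reconstructed from the parameter list; the exact names and defaults live in train.py.

    # Hedged sketch of the CLI described above; the authoritative definition is in train.py
    import argparse

    parser = argparse.ArgumentParser(description="Click to Move training")
    parser.add_argument("--config", type=str, required=True,
                        help="path to the YAML configuration file")
    parser.add_argument("--device_ids", type=str, default="0",
                        help="comma-separated device ids, e.g. '0,1'")
    parser.add_argument("--seed", type=int, default=0,
                        help="random seed for the training run")
    parser.add_argument("--profile", action="store_true",
                        help="debug using the PyTorch profiler")
    args = parser.parse_args()
    device_ids = [int(d) for d in args.device_ids.split(",")]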

Our code supports multi-GPU training using DistributedDataParallel. Here's an example of how to run the code with one or more GPUs.

Single GPU

python train.py --device_ids 0 --config config/c2m_journal_cityscapes.yaml

Multi GPU

python -m torch.distributed.run --nnodes=1 --nproc_per_node=2 train.py --device_ids 0,1 --config config/c2m_journal_cityscapes.yaml

The example considers a scenario with a single node and two GPUs per node; please adapt it to your setup. For more information, check the DDP example.
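
For context, each process spawned by torch.distributed.run typically initializes the process group and wraps the model in DistributedDataParallel. The following is a minimal sketch of that pattern, not the repository's exact training loop:

    # Minimal DistributedDataParallel setup, as launched by torch.distributed.run (illustrative only)
    import os
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    def setup_ddp(model):
        # torch.distributed.run exports LOCAL_RANK, RANK and WORLD_SIZE for every process
        local_rank = int(os.environ["LOCAL_RANK"])
        dist.init_process_group(backend="nccl")
        torch.cuda.set_device(local_rank)
        model = model.cuda(local_rank)
        # gradients are averaged across processes during backward()
        return DDP(model, device_ids=[local_rank])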

4 Test the model

python test.py --device_ids 0 --config config/c2m_journal_cityscapes.yaml