
DTVNet — Official PyTorch Implementation


This repository contains the official PyTorch implementation of the following papers:

  - DTVNet: Dynamic Time-lapse Video Generation via Single Still Image (ECCV 2020, Spotlight)
  - DTVNet+: A High-Resolution Scenic Dataset for Dynamic Time-lapse Video Generation (extended version)

Updates

12/20/2021

News: We released the high-quality and high-resolution Quick-Sky-Time (QST) dataset in the extended version, which can serve as a new benchmark for high-quality scenic image and video generation tasks.

Demo

See `Example.mp4` for a video demo.

Using the Code

Requirements

This code has been developed with Python 3.7, PyTorch 1.5.1, and CUDA 10.1 on Ubuntu 16.04.

# Install python3 packages
pip3 install -r requirements.txt
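
A quick way to confirm the environment matches the versions above (an illustrative sketch, not part of the repository):

```python
# Quick environment sanity check.
import torch

print("PyTorch:", torch.__version__)              # expect 1.5.1
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("CUDA:", torch.version.cuda)            # expect 10.1
    print("GPU:", torch.cuda.get_device_name(0))
```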

Datasets in the paper

The experiments in the paper use the Sky Time-lapse dataset, which is expected to be placed under `data/sky_timelapse/` (see the flow-generation command below).

Unsupervised Flow Estimation

  1. Our other work ARFlow (CVPR'20) is used as the unsupervised optical flow estimator in the paper. You can refer to flow/ARFlow/README.md for more details.

  2. Training:

    > Modify `configs/sky.json` if you use a different `data_root` or other settings.
    cd flow/ARFlow
    python3 train.py
  3. Testing:

    > The pre-trained model is located in `checkpoints/Sky/sky_ckpt.pth.tar`.
    python3 inference.py --show  # Test and visualize a single image pair.
    python3 inference.py --root ../../data/sky_timelapse/ --save_path ../../data/sky_timelapse/flow/  # Generate optical flow in advance for Sky Time-lapse dataset.
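
If you generate flows in advance as above, downstream code needs to load them back. Below is a minimal sketch for reading a flow map, assuming the files are stored in the standard Middlebury `.flo` format; the actual filenames and on-disk format produced by `inference.py` may differ:

```python
import numpy as np

def read_flo(path):
    """Read a Middlebury .flo file into an (H, W, 2) float32 array of (u, v) flows."""
    with open(path, "rb") as f:
        magic = np.fromfile(f, np.float32, count=1)[0]
        assert magic == 202021.25, "invalid .flo file"  # the format's magic number
        w = int(np.fromfile(f, np.int32, count=1)[0])
        h = int(np.fromfile(f, np.int32, count=1)[0])
        data = np.fromfile(f, np.float32, count=2 * w * h)
    return data.reshape(h, w, 2)

# flow = read_flo("../../data/sky_timelapse/flow/some_clip/frame_0000.flo")  # hypothetical path
```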

Running

  1. Train DTVNet model.

    > Modify `configs/sky_timelapse.json` if you use a different `data_root` or other settings.
    python3 train.py
  2. Test DTVNet model.

    > The pre-trained model is located in `checkpoints/DTV_Sky/200708162546`.
    > Results are saved in `checkpoints/DTV_Sky/200708162546/results`.
    python3 Test.py
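
Both steps read their settings from a JSON config, so a custom dataset location can also be set programmatically. A minimal sketch (the key name `data_root` follows the notes above; any other assumptions about the config layout may not match the actual file):

```python
import json

# Load the experiment config and point it at a custom dataset location.
with open("configs/sky_timelapse.json") as f:
    cfg = json.load(f)

cfg["data_root"] = "/path/to/sky_timelapse"  # hypothetical custom location

with open("configs/sky_timelapse.json", "w") as f:
    json.dump(cfg, f, indent=2)
```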

Quick-Sky-Time (QST) Dataset

QST contains 1,167 video clips cut out from 216 time-lapse 4K videos collected from YouTube. It can be used for a variety of tasks, such as (high-resolution) video generation, (high-resolution) video prediction, (high-resolution) image generation, texture generation, image inpainting, image/video super-resolution, image/video colorization, and image/video animating. Each clip contains between 58 and 1,200 frames (285,446 frames in total), and the resolution of each frame exceeds 1,024 x 1,024. Specifically, QST consists of a training set (1,000 clips, 244,930 frames in total), a validation set (100 clips, 23,200 frames in total), and a testing set (67 clips, 17,316 frames in total). Click here (Key: qst1) to download the QST dataset.

# About QST:
├── Quick-Sky-Time
    ├── clips  # contains 1,167 raw video clips
        ├── 00MOhFGvOJs  # [video ID of the raw YouTube video]
            ├── 00MOhFGvOJs 00_00_14-00_00_25.mp4  # [ID] [start time]-[end time] 
            ├── ...
        ├── ...
    ├── train_urls.txt  # index names of the train set
    ├── test_urls.txt  # index names of the test set
    └── val_urls.txt  # index names of the validation set
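
A minimal sketch of enumerating the clips of each split from the `*_urls.txt` index files. It assumes each line of an index file is a YouTube video ID matching a folder under `clips/`; the exact file contents may differ:

```python
from pathlib import Path

root = Path("Quick-Sky-Time")  # hypothetical local path to the extracted dataset

def load_split(name):
    """Return the clip files for one split, e.g. name='train'."""
    ids = (root / f"{name}_urls.txt").read_text().split()
    clips = []
    for vid in ids:  # one folder per YouTube video ID, e.g. 00MOhFGvOJs
        clips.extend(sorted((root / "clips" / vid).glob("*.mp4")))
    return clips

train_clips = load_split("train")  # expect 1,000 clips
val_clips = load_split("val")      # expect 100 clips
test_clips = load_split("test")    # expect 67 clips
print(len(train_clips), len(val_clips), len(test_clips))
```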

Citation

If our work is useful for your research, please consider citing:

@inproceedings{dtvnet,
  title={DTVNet: Dynamic time-lapse video generation via single still image},
  author={Zhang, Jiangning and Xu, Chao and Liu, Liang and Wang, Mengmeng and Wu, Xia and Liu, Yong and Jiang, Yunliang},
  booktitle={European Conference on Computer Vision},
  pages={300--315},
  year={2020},
  organization={Springer}
}
@article{dtvnet+,
  title={DTVNet+: A High-Resolution Scenic Dataset for Dynamic Time-lapse Video Generation},
  author={Zhang, Jiangning and Xu, Chao and Liu, Yong and Jiang, Yunliang},
  journal={arXiv preprint arXiv:2008.04776},
  year={2020}
}