
Generalizable Implicit Motion Modeling for Video Frame Interpolation

Zujin Guo  Wei Li  Chen Change Loy
S-Lab, Nanyang Technological University 
NeurIPS 2024

GIMM-VFI performs generalizable continuous motion modeling and interpolation between two adjacent video frames at arbitrary timesteps.

📖 For more visual results of GIMM-VFI, check out our project page.


News

  • 2024.11.18: Training code is released! We have also resolved an issue with DS_SCALE, which should be a float between 0 and 1 for high-resolution interpolation, such as for 2K and 4K frames.
  • 2024.11.08: The ComfyUI version of GIMM-VFI is now available in the ComfyUI-GIMM-VFI repository, thanks to the dedicated efforts of @kijai :)
  • 2024.11.06: Test code and model checkpoints are now publicly available. A perceptually enhanced version of GIMM-VFI is also released along with this update.

Install

  • PyTorch 1.13.0
  • CUDA 11.6
  • CuPy
# git clone this repository
git clone https://github.com/GSeanCDAT/GIMM-VFI
cd GIMM-VFI

# create new conda env
conda create -n gimmvfi python=3.7 -y
conda activate gimmvfi

# install other python dependencies
pip install -r requirements.txt
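
If requirements.txt does not pull in PyTorch and CuPy for you, wheels matching the CUDA 11.6 setup above would be installed like this (the exact package choices are our assumption; pick the builds matching your local CUDA toolkit):

# PyTorch 1.13.0 built against CUDA 11.6
pip install torch==1.13.0+cu116 --extra-index-url https://download.pytorch.org/whl/cu116

# CuPy wheel covering CUDA 11.x
pip install cupy-cuda11x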

GIMM-VFI Models

GIMM-VFI can be implemented with different flow estimators. As described in our paper, we provide RAFT-based GIMM-VFI-R and FlowFormer-based GIMM-VFI-F in this repo.

Additionally, we release a perceptually enhanced version of GIMM-VFI that incorporates an additional learning objective, the LPIPS loss, during training. Denoted GIMM-VFI-R-P and GIMM-VFI-F-P, these enhanced variants achieve substantially better perceptual quality in interpolation.
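
As a rough illustration of that objective, here is a minimal sketch in the spirit of the -P variants (assuming the standard lpips package and an illustrative loss weight; this is not the repo's exact training code):

# Sketch of the perceptual objective behind the "-P" variants: an LPIPS term
# added on top of a pixel reconstruction loss. lambda_lpips is illustrative.
import torch
import lpips  # pip install lpips

l1_loss = torch.nn.L1Loss()
lpips_metric = lpips.LPIPS(net="vgg")  # pretrained perceptual metric network

def perceptual_training_loss(pred, gt, lambda_lpips=0.1):
    """pred, gt: (B, 3, H, W) tensors scaled to [-1, 1], as LPIPS expects."""
    return l1_loss(pred, gt) + lambda_lpips * lpips_metric(pred, gt).mean()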


All the model checkpoints can be found at this link. Please put them into the ./pretrained_ckpt folder after downloading.
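
After downloading, the folder would look something like this (only gimm.pt is named by the training commands below; the other file names depend on what the link provides):

├── pretrained_ckpt
    ├── gimm.pt    # motion-modeling checkpoint, consumed by GIMM-VFI training
    ├── ...        # GIMM-VFI-R / GIMM-VFI-F checkpoints and their -P variants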

Demo

Interpolation demos can be created with the following command:

sh scripts/video_Nx.sh YOUR_PATH_TO_FRAME YOUR_OUTPUT_PATH DS_SCALE N_INTERP

The default model variant is GIMM-VFI-R-P. You can change the variant in scripts/video_Nx.sh.

Here is an example usage for 9X interpolation:

sh scripts/video_Nx.sh demo/input_frames demo/output 1 9

The expected interpolation output is shown in demo_output.
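
For NX interpolation, our understanding (an assumption about scripts/video_Nx.sh, not its verbatim logic) is that the model is queried at N-1 evenly spaced timesteps between each pair of input frames:

# conceptual timestep sampling for the 9X example above
N = 9                                      # N_INTERP from the command line
timesteps = [i / N for i in range(1, N)]   # 8 intermediate frames per frame pair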

DS_SCALE is the downsampling scale factor, a float in the range (0, 1], and should be lowered for high-resolution interpolation. For instance, you can perform 8X interpolation on 2K frames using the following command:

sh scripts/video_Nx.sh demo/2K_input_frames demo/2K_output 0.5 8

The expected interpolation output can be found here.
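
Conceptually, a DS_SCALE below 1 estimates optical flow at a reduced resolution and rescales the result, which is what keeps memory bounded at high resolutions. A minimal sketch of this pattern, assuming a generic two-frame flow estimator (our illustration, not the repo's exact code):

# Downsample frames before flow estimation, then upsample and rescale the flow.
import torch
import torch.nn.functional as F

def estimate_flow_downsampled(flow_net, img0, img1, ds_scale=0.5):
    """img0, img1: (B, 3, H, W) frames; flow_net: any two-frame flow estimator."""
    _, _, H, W = img0.shape
    if ds_scale >= 1.0:
        return flow_net(img0, img1)
    size = (int(H * ds_scale), int(W * ds_scale))
    img0_ds = F.interpolate(img0, size=size, mode="bilinear", align_corners=False)
    img1_ds = F.interpolate(img1, size=size, mode="bilinear", align_corners=False)
    flow = flow_net(img0_ds, img1_ds)                    # (B, 2, h, w)
    flow = F.interpolate(flow, size=(H, W), mode="bilinear", align_corners=False)
    flow[:, 0] *= W / size[1]   # rescale x displacements to full resolution
    flow[:, 1] *= H / size[0]   # rescale y displacements
    return flow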

We tested GIMM-VFI with 8X interpolation on 2K and 4K frames using NVIDIA V100 GPUs. The following are our settings and the corresponding memory usage:

[2K interpolation] DS_SCALE: 0.5 Memory-Usage: 7932MiB
[4K interpolation] DS_SCALE: 0.25 Memory-Usage: 10922MiB
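
By analogy with the 2K example above, an 8X run on 4K frames would look like this (paths are placeholders):

sh scripts/video_Nx.sh YOUR_4K_FRAME_PATH YOUR_OUTPUT_PATH 0.25 8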

Dataset Preparation

  • Download the Vimeo90K, SNU-FILM and X4K1000FPS datasets.

  • Obtain the motion modeling benchmark datasets, Vimeo-Triplet-Flow (VTF) and Vimeo-Septuplet-Flow (VSF), by extracting optical flows from the Vimeo90K triplet and septuplet test sets using FlowFormer (a sketch of this extraction step follows the file structure below).

The file structure should look like this:

├── data
    ├── SNU-FILM
        ├── test
        ├── test-easy.txt
        ├── test-medium.txt
        ├── test-hard.txt
        ├── test-extreme.txt
    ├── x4k
        ├── test
            ├── Type1
            ├── Type2
            ├── Type3
    ├── vimeo90k
        ├── vimeo_septuplet
            ├── sequences
            ├── flow_sequences
        ├── vimeo_triplet
            ├── sequences
            ├── flow_sequences
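
As referenced above, here is a sketch of the flow-extraction step for VTF/VSF (an assumption about the workflow, not the repo's script): run FlowFormer over each frame pair and store the result in Middlebury .flo format under flow_sequences/, mirroring the layout of sequences/.

import numpy as np

def write_flo(path, flow):
    """Save an (H, W, 2) float32 flow field in the Middlebury .flo format."""
    with open(path, "wb") as f:
        f.write(b"PIEH")                                       # .flo magic bytes
        np.array([flow.shape[1], flow.shape[0]], dtype=np.int32).tofile(f)
        flow.astype(np.float32).tofile(f)

# flow = flow_model(frame0, frame1)  # hypothetical wrapper around FlowFormer
# write_flo(out_path, flow)          # out_path under the flow_sequences/ tree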

Evaluation

Motion Modeling:

On the VTF benchmark:

sh scripts/bm_VTF.sh

On the VSF benchmark:

sh scripts/bm_VSF.sh

Interpolation:

On the SNU-FILM-arb benchmark:

sh scripts/bm_SNU_FILM_arb.sh

On the X4K benchmark:

sh scripts/bm_X4K.sh

The model variants can be changed inside the shell scripts.

Train

The following is the general command for training:

sh scripts/train.sh YOUR_CONFIG OUTPUT_DIR PRETRAINED_CKPT NUM_GPU

Specifically, you can train GIMM and GIMM-VFI by following the instructions below. GIMM is trained from scratch (an empty PRETRAINED_CKPT), while the GIMM-VFI variants initialize from the resulting gimm.pt checkpoint.

GIMM

sh scripts/train.sh configs/gimm/gimm.yaml ./work_dirs '' 2

GIMM-VFI-R

sh scripts/train.sh configs/gimmvfi/gimmvfi_r_arb.yaml ./work_dirs pretrained_ckpt/gimm.pt 8

GIMM-VFI-F

sh scripts/train.sh configs/gimmvfi/gimmvfi_f_arb.yaml ./work_dirs pretrained_ckpt/gimm.pt 8

Citation

If you find our work interesting or helpful, please leave a star or cite our paper.

@inproceedings{guo2024generalizable,
    title={Generalizable Implicit Motion Modeling for Video Frame Interpolation},
    author={Guo, Zujin and Li, Wei and Loy, Chen Change},
    booktitle={Advances in Neural Information Processing Systems},
    year={2024}
}

Acknowledgement

The code is based on GINR-IPC and draws inspiration from several other outstanding works, including RAFT, FlowFormer, AMT, softmax-splatting, EMA-VFI, MoTIF, and LDMVFI.