An Implicit Alignment for Video Super-Resolution

[Demo video: vid4_city_full.mp4]

This is an official PyTorch implementation of

An Implicit Alignment for Video Super-Resolution.
[arXiv]
Kai Xu, Ziwei Yu, Xin Wang, Michael Bi Mi, Angela Yao
Computer Vision and Machine Learning group, NUS.

Video super-resolution commonly uses a frame-wise alignment to support the propagation of information over time. The role of alignment is well-studied for low-level enhancement in video, but existing works have overlooked one critical step -- re-sampling. Most works, regardless of how they compensate for motion between frames, be it flow-based warping or deformable convolution/attention, use the default choice of bilinear interpolation for re-sampling. However, bilinear interpolation acts effectively as a low-pass filter and thus hinders the aim of recovering high-frequency content for super-resolution.
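To see this low-pass effect concretely, the toy example below (our own illustration, not from the paper's code) warps the highest-frequency signal representable on the pixel grid by half a pixel using `F.grid_sample` with bilinear interpolation, the standard re-sampling step in flow-based warping; the alternating pattern is averaged almost entirely away:

```python
# Toy demonstration that bilinear re-sampling acts as a low-pass filter.
import torch
import torch.nn.functional as F

# A 1 x 1 x 1 x 64 signal alternating between -1 and +1: the highest
# frequency representable on the pixel grid.
x = torch.tensor([[-1.0, 1.0] * 32]).view(1, 1, 1, 64)

# Sampling grid shifted by half a pixel along x (normalized coordinates,
# where one pixel step is 2 / (W - 1) with align_corners=True).
n, c, h, w = x.shape
ys, xs = torch.meshgrid(torch.linspace(-1, 1, h), torch.linspace(-1, 1, w),
                        indexing="ij")
shift = 0.5 * 2.0 / (w - 1)
grid = torch.stack((xs + shift, ys), dim=-1).unsqueeze(0)

warped = F.grid_sample(x, grid, mode="bilinear", align_corners=True)
print(x.abs().mean().item())       # 1.0: full amplitude before warping
print(warped.abs().mean().item())  # ~0.008: the pattern is averaged out
```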

This paper studies the impact of re-sampling on alignment for video super-resolution. Extensive experiments reveal that for alignment to be effective, the re-sampling should preserve the original sharpness of the features and prevent distortions. From these observations, we propose an implicit alignment method that re-samples through a window-based cross-attention with sampling positions encoded by sinusoidal positional encoding. The re-sampling is implicitly computed by learned network weights. Experiments show that the proposed implicit alignment enhances the performance of state-of-the-art frameworks on both synthetic and real-world datasets.

A comparison diagram between bilinear interpolation and our implicit alignment. Bilinear interpolation fixes aggregation weight W_bi. Implicit alignment learns affinity through the cross-attention module to calculate the final result. Red grids denote the source frame, purple grids denote the target frame, and blue grids denote the aligned frame.
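As a concrete reading of this description, the sketch below implements a window-based cross-attention re-sampler in PyTorch. It is a simplified illustration under stated assumptions, not the authors' implementation: the module and function names, the 3x3 window, the single attention head, the frequency schedule of the positional encoding, and the requirement that the channel count be divisible by 4 are all our choices for brevity; see the released code for the actual architecture.

```python
# Simplified sketch of implicit alignment via window-based cross-attention
# (illustrative, not the authors' implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F

def sinusoidal_encoding(offsets, dim):
    """Encode 2-D relative sampling offsets (..., 2) into (..., dim)."""
    half = dim // 4                                   # assumes dim % 4 == 0
    freqs = 2.0 ** torch.arange(half, device=offsets.device)
    args = offsets.unsqueeze(-1) * freqs              # (..., 2, half)
    return torch.cat([args.sin(), args.cos()], -1).flatten(-2)

class WindowCrossAttentionAlignment(nn.Module):
    def __init__(self, channels, window=3):
        super().__init__()
        self.ws = window
        self.to_q = nn.Linear(channels, channels)
        self.to_k = nn.Linear(channels, channels)
        self.to_v = nn.Linear(channels, channels)
        self.scale = channels ** -0.5

    def forward(self, src, tgt, flow):
        # src, tgt: (B, C, H, W) features; flow: (B, 2, H, W), target -> source.
        B, C, H, W = src.shape
        ws, pad = self.ws, self.ws // 2
        # Flow-displaced sampling positions, split into integer centers
        # and sub-pixel residuals.
        ys, xs = torch.meshgrid(torch.arange(H, device=src.device),
                                torch.arange(W, device=src.device), indexing="ij")
        pos = torch.stack((xs, ys), 0).float() + flow          # (B, 2, H, W)
        base = pos.floor()
        frac = (pos - base).permute(0, 2, 3, 1)                # (B, H, W, 2)
        # Extract the ws x ws source window around every integer center.
        cols = F.unfold(F.pad(src, (pad,) * 4), ws)            # (B, C*ws*ws, H*W)
        cx = base[:, 0].clamp(0, W - 1).long()
        cy = base[:, 1].clamp(0, H - 1).long()
        idx = (cy * W + cx).view(B, 1, -1).expand(-1, C * ws * ws, -1)
        win = cols.gather(2, idx).view(B, C, ws * ws, H * W)
        win = win.permute(0, 3, 2, 1)                          # (B, H*W, ws*ws, C)
        # Offset of each window cell relative to the true sub-pixel sampling
        # position, injected into the keys via sinusoidal positional encoding.
        dy, dx = torch.meshgrid(torch.arange(ws) - pad,
                                torch.arange(ws) - pad, indexing="ij")
        cell = torch.stack((dx, dy), -1).float().to(src.device).view(1, 1, ws * ws, 2)
        rel = cell - frac.reshape(B, H * W, 1, 2)              # (B, H*W, ws*ws, 2)
        pe = sinusoidal_encoding(rel, C)
        # Single-head cross-attention: the target feature queries the window.
        q = self.to_q(tgt.flatten(2).transpose(1, 2)).unsqueeze(2)  # (B, H*W, 1, C)
        k = self.to_k(win + pe)
        v = self.to_v(win)
        attn = ((q * k).sum(-1) * self.scale).softmax(-1)           # (B, H*W, ws*ws)
        out = (attn.unsqueeze(-1) * v).sum(2)                       # (B, H*W, C)
        return out.transpose(1, 2).reshape(B, C, H, W)
```

For example, `WindowCrossAttentionAlignment(64)(src, tgt, flow)` returns aligned features of shape (B, 64, H, W). The aggregation weights over each window come from learned projections and a softmax rather than fixed bilinear coefficients, which is exactly the distinction the figure draws between W_bi and the learned affinity.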

Results

| REDS4 | Frames | PSNR | SSIM | Download |
| :-- | :-: | :-: | :-: | :-: |
| PSRT-recurrent | 6 | 31.88 | 0.8964 | - |
| IART (ours) | 6 | 32.15 | 0.9010 | model \| results |

| REDS4 | Frames | PSNR | SSIM | Download |
| :-- | :-: | :-: | :-: | :-: |
| BasicVSR++ | 30 | 32.39 | 0.9069 | - |
| VRT | 16 | 32.19 | 0.9006 | - |
| RVRT | 30 | 32.75 | 0.9113 | - |
| PSRT-recurrent | 16 | 32.72 | 0.9106 | - |
| IART (ours) | 16 | 32.90 | 0.9138 | model \| results |

| Vimeo90k-T | Frames | PSNR | SSIM | Download |
| :-- | :-: | :-: | :-: | :-: |
| BasicVSR++ | 14 | 37.79 | 0.9500 | - |
| VRT | 7 | 38.20 | 0.9530 | - |
| RVRT | 14 | 38.15 | 0.9527 | - |
| PSRT-recurrent | 14 | 38.27 | 0.9536 | - |
| IART (ours) | 7 | 38.14 | 0.9528 | model |

| Vid4 | Frames | PSNR | SSIM | Download |
| :-- | :-: | :-: | :-: | :-: |
| BasicVSR++ | 14 | 27.79 | 0.8400 | - |
| VRT | 7 | 27.93 | 0.8425 | - |
| RVRT | 14 | 27.99 | 0.8462 | - |
| PSRT-recurrent | 14 | 28.07 | 0.8485 | - |
| IART (ours) | 7 | 28.26 | 0.8517 | model |

Installation

We tested the code with Python 3.9, PyTorch 1.13.1, and CUDA 11.7. Similar versions should also work.

conda create -n IART python==3.9
conda activate IART

pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 --extra-index-url https://download.pytorch.org/whl/cu117
pip install -r requirements.txt
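
A quick sanity check of the environment (our suggestion, not part of the repo):

```python
# Verify the installed versions and that CUDA is visible.
import torch
import torchvision

print(torch.__version__)          # expect 1.13.1+cu117
print(torchvision.__version__)    # expect 0.14.1+cu117
print(torch.cuda.is_available())  # expect True
```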

Prepare Data

To prepare the dataset, follow the dataset preparation instructions in BasicSR. After completing the preparation, the directory structure should be as follows:

datasets/
└── REDS/
    ├── val_REDS4_sharp
    └── val_REDS4_sharp_bicubic

Testing

Download the pretrained models (links in the Results tables above) and put them under experiments/pretrained_models/

# VSR trained on REDS with 6 input frames, tested on REDS4
CUDA_VISIBLE_DEVICES=0 python test_IART_reds_N6.py

# VSR trained on REDS with 16 input frames, tested on REDS4
CUDA_VISIBLE_DEVICES=0 python test_IART_reds_N16.py

Training

# VSR trained on REDS with 6 input frames, tested on REDS4
bash dist_train.sh 8 options/IART_REDS_N6_300K.yml

# VSR trained on REDS with 16 input frames, tested on REDS4
bash dist_train.sh 8 options/IART_REDS_N16_600K.yml

# VSR trained on Vimeo, validated on Vid4
bash dist_train.sh 8 options/IART_Vimeo_N14_300K.yml

Due to an incompatibility between PyTorch checkpointing and distributed training, the training process terminates after the first 5,000 iterations. To resume training, execute the training script again; the previously saved parameters are loaded automatically and normal training continues.
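If you prefer not to relaunch by hand, a hypothetical wrapper like the sketch below (not part of the repo) keeps re-executing the training script; it assumes the premature termination exits with a nonzero status and that a clean exit means all iterations have completed:

```python
# Hypothetical auto-relaunch wrapper (not part of this repo). The option
# file and GPU count are examples; adjust them to your run.
import subprocess

while True:
    ret = subprocess.run(
        ["bash", "dist_train.sh", "8", "options/IART_REDS_N6_300K.yml"]
    ).returncode
    if ret == 0:   # assumption: a clean exit means training completed
        break      # otherwise relaunch so auto-resume loads the checkpoint
```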

Acknowledgment

We acknowledge the following contributors whose code served as the basis for our work:

RethinkVSRAlignment, BasicSR and mmediting.