
DeepFovea++: Reconstruction and Super-Resolution for Natural Foveated Rendered Videos

License: MIT

This PyTorch repository addresses the task of fovea-sampled video reconstruction and video super-resolution, partly based on the architecture of the DeepFovea paper by Anton S. Kaplanyan et al. (Facebook Research). [1]

Our final paper can be found here.

[Figure: fovea sampled input, reconstructed prediction, and high resolution label]

TODO

  • Add Padé activation unit implementation [7]
  • Prepare the YouTube-8M dataset used in the DeepFovea paper
  • Adapt the implementation to reproduce the original DeepFovea
  • Run test with the original DeepFovea model
  • Implement axial-attention module
  • Implement standalone learnable convex upsampling module
  • Use RAFT for optical flow estimation instead of PWC-Net
  • Run first test

Model Architecture

[Figure: reconstruction model and discriminators, from [1]]

To reach the desired super-resolution (4 times the input resolution), two additional blocks are used at the end of the generator network. These so-called super-resolution blocks are based on two (three for the final block) deformable convolutions and a bilinear upsampling operation. [1]
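A minimal sketch of such a block is given below, using torchvision.ops.DeformConv2d as a stand-in for the Deformable-Convolution-V2 package listed under dependencies; the channel sizes, the offset-predicting convolutions, and the ELU activation are illustrative assumptions, not the exact repository code.

import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.ops import DeformConv2d  # stand-in for the Deformable-Convolution-V2 package

class SuperResolutionBlock(nn.Module):
    # Sketch: a stack of deformable convolutions followed by bilinear 2x upsampling.

    def __init__(self, in_channels: int, out_channels: int, num_deform_convs: int = 2) -> None:
        super().__init__()
        self.offset_convs = nn.ModuleList()
        self.deform_convs = nn.ModuleList()
        channels = in_channels
        for _ in range(num_deform_convs):
            # A plain 3x3 convolution predicts the 2 * 3 * 3 = 18 offset channels.
            self.offset_convs.append(nn.Conv2d(channels, 18, kernel_size=3, padding=1))
            self.deform_convs.append(DeformConv2d(channels, out_channels, kernel_size=3, padding=1))
            channels = out_channels

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for offset_conv, deform_conv in zip(self.offset_convs, self.deform_convs):
            x = F.elu(deform_conv(x, offset_conv(x)))
        # Bilinear upsampling doubles the resolution; two blocks in sequence give 4x.
        return F.interpolate(x, scale_factor=2.0, mode="bilinear", align_corners=False)

block = SuperResolutionBlock(in_channels=16, out_channels=16)
output = block(torch.rand(1, 16, 48, 64))  # output shape: (1, 16, 96, 128)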

Losses

We apply the same losses as the paper, but additionally use a supervised loss. For this supervised loss, the adaptive robust loss function by Jonathan T. Barron was chosen. [1, 4]
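As an illustration, the general (non-adaptive) form of this loss, ρ(x, α, c) = |α−2|/α · (((x/c)²/|α−2| + 1)^(α/2) − 1), can be sketched as follows; in the adaptive variant actually used, α and c are learned per dimension, so the fixed parameter values below are assumptions.

import torch

def general_robust_loss(x: torch.Tensor, alpha: float, c: float) -> torch.Tensor:
    # General robust loss from [4], valid for alpha not in {0, 2}:
    # alpha near 2 approaches L2, alpha = 1 gives Charbonnier, alpha near 0 approaches Cauchy.
    b = abs(alpha - 2.0)
    return (b / alpha) * (((x / c) ** 2 / b + 1.0) ** (alpha / 2.0) - 1.0)

residual = torch.randn(4, 3, 64, 64)  # hypothetical prediction - label residual
loss = general_robust_loss(residual, alpha=1.0, c=0.1).mean()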

We compute each loss independently, since otherwise we run into VRAM issues (Tesla V100, 16 GB); a sketch of this strategy is shown below.
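Here, computing losses "independently" is read as one backward pass per loss term, so that only one loss graph is alive at a time, which lowers the peak VRAM. The generator and loss terms below are self-contained stand-ins, not the repository's actual modules.

import torch
import torch.nn as nn

generator = nn.Conv2d(3, 3, kernel_size=3, padding=1)  # stand-in generator
optimizer = torch.optim.Adam(generator.parameters(), lr=1e-4)
loss_terms = [nn.L1Loss(), nn.MSELoss()]  # stand-ins for the actual loss terms

input_frames = torch.rand(1, 3, 64, 64)
label = torch.rand(1, 3, 64, 64)

optimizer.zero_grad()
prediction = generator(input_frames)
for index, loss_fn in enumerate(loss_terms):
    loss = loss_fn(prediction, label)
    # Retain the shared generator graph for every backward pass except the last one;
    # gradients accumulate exactly as if the summed loss had been backpropagated once.
    loss.backward(retain_graph=index < len(loss_terms) - 1)
optimizer.step()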

Dependencies

This implementation uses the adaptive robust loss implementation by Jonathan T. Barron. Furthermore, deformable convolutions v2 are used in the generator network; for these, the implementation of Dazhi Cheng is utilized. For the PWC-Net, the implementation and pre-trained weights of Nvidia Research are used. Additionally, the PWC-Net and the flow loss implementation depend on the correlation and resample packages of the PyTorch FlowNet2 implementation by Nvidia. To install these packages, run python setup.py install for each package; the setup.py file is located in the corresponding folder. [4, 2, 6, 5]

All additionally required packages can be found in requirements.txt; to install them, simply run pip install -r requirements.txt.

Full installation

git clone https://github.com/ChristophReich1996/Deep_Fovea_Architecture_for_Video_Super_Resolution
cd Deep_Fovea_Architecture_for_Video_Super_Resolution
pip install -r requirements.txt
cd correlation
python setup.py install
cd ../resample
python setup.py install
cd ..
git clone https://github.com/chengdazhi/Deformable-Convolution-V2-PyTorch
cd Deformable-Convolution-V2-PyTorch
git checkout pytorch_1.0.0
python setup.py build install

Usage

To perform training, validation, testing, or inference, just run the main.py file with the corresponding arguments shown below.

Argument          Default value  Info
--train           False          Binary flag. If set, training will be performed.
--val             False          Binary flag. If set, validation will be performed.
--test            False          Binary flag. If set, testing will be performed.
--inference       False          Binary flag. If set, inference will be performed.
--inference_data  ""             Path to the inference data to be loaded.
--cuda_devices    "0"            String of cuda device indexes to be used. Indexes must be separated by a comma.
--data_parallel   False          Binary flag. If set, multi-GPU training will be utilized.
--load_model      ""             Path to the model to be loaded.
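For example, a training run on two GPUs with data parallelism, resuming from a stored model, could be started as follows (the checkpoint path is only a placeholder):

python main.py --train --cuda_devices "0,1" --data_parallel --load_model saved_model.pt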

Results

We sample approximately 19.7% of the low resolution (192 × 256) input image when applying the fovea sampling. Since the high resolution (768 × 1024) label has 16 times as many pixels, these 19.7% correspond to roughly 1.2% of the label (19.7% / 16 ≈ 1.2%). Each sequence consists of 6 consecutive frames.
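How such a sparse input could be produced is sketched below; the Gaussian density falloff, the central fixation point, and all parameter values are illustrative assumptions, not the repository's actual sampling code.

import torch

def fovea_sample(frame: torch.Tensor, fixation: tuple, density: float = 0.197,
                 sigma: float = 0.35) -> torch.Tensor:
    # Keep each pixel with a probability that falls off with the distance to the
    # fixation point, rescaled so that roughly `density` of all pixels survive.
    _, height, width = frame.shape
    ys = torch.linspace(0.0, 1.0, height).view(-1, 1).expand(height, width)
    xs = torch.linspace(0.0, 1.0, width).view(1, -1).expand(height, width)
    distance_sq = (ys - fixation[0]) ** 2 + (xs - fixation[1]) ** 2
    prob = torch.exp(-distance_sq / (2.0 * sigma ** 2))
    prob = (prob * (density / prob.mean())).clamp(max=1.0)
    mask = torch.bernoulli(prob)
    return frame * mask  # unsampled pixels are zeroed out

frame = torch.rand(3, 192, 256)
sparse_frame = fovea_sample(frame, fixation=(0.5, 0.5))
print(sparse_frame.ne(0.0).any(dim=0).float().mean())  # approx. 0.197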

Results of the training run started on 02.05.2020. For this training run, the recurrent tensor of each temporal block was reset after each full video; the difference between the two variants is sketched below.
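A minimal self-contained sketch of this control flow follows, with a stand-in temporal block; the gated update rule and all names are assumptions for illustration, not the repository's code.

import torch
import torch.nn as nn

class TemporalBlock(nn.Module):
    # Stand-in recurrent block: blends the input with the carried recurrent tensor.

    def __init__(self, channels: int) -> None:
        super().__init__()
        self.gate_conv = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor, state: torch.Tensor = None):
        if state is None:  # a reset produces a fresh zero state
            state = torch.zeros_like(x)
        gate = torch.sigmoid(self.gate_conv(torch.cat([x, state], dim=1)))
        state = gate * x + (1.0 - gate) * state
        return state, state

block = TemporalBlock(channels=8)
state = None
for video in range(2):  # two full videos
    state = None  # "reset" variant; drop this line for the 04.05.2020 "no reset" run
    for frame in torch.rand(6, 1, 8, 32, 32):  # 6 consecutive frames per sequence
        output, state = block(frame, state)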

[Figure: low resolution (192 × 256) fovea sampled input image]

[Figure: high resolution (768 × 1024) reconstructed prediction of the generator]

[Figure: high resolution (768 × 1024) label [3]]

Results of the training run started on 04.05.2020. For this training run, the recurrent tensor of each temporal block was not reset after each full video.

[Figure: low resolution (192 × 256) fovea sampled input image]

[Figure: high resolution (768 × 1024) reconstructed prediction of the generator]

[Figure: high resolution (768 × 1024) label [3]]

Validation results after approximately 48 h of training (test set not published yet):

REDS Dataset                           L1↓      L2↓      PSNR↑     SSIM↑
DeepFovea++ (reset rec. tensor)        0.0701   0.0117   22.6681   0.9116
DeepFovea++ (no reset of rec. tensor)  0.0610   0.0090   23.8755   0.9290

The visual impression, however, leads to a different conclusion: the results from the training run where the recurrent tensor is reset after each full video seem more realistic.

The corresponding pre-trained models, additional plots, and metrics can be found in the results folder.

We also experimented with loss functions at multiple resolution stages. This, however, led to a significant performance drop. The corresponding code can be found in the experimental branch.

References

[1] @article{deepfovea,
    title={DeepFovea: neural reconstruction for foveated rendering and video compression using learned statistics of natural videos},
    author={Kaplanyan, Anton S and Sochenov, Anton and Leimk{\"u}hler, Thomas and Okunev, Mikhail and Goodall, Todd and Rufo, Gizem},
    journal={ACM Transactions on Graphics (TOG)},
    volume={38},
    number={6},
    pages={1--13},
    year={2019},
    publisher={ACM New York, NY, USA}
}
[2] @inproceedings{deformableconv2,
    title={Deformable convnets v2: More deformable, better results},
    author={Zhu, Xizhou and Hu, Han and Lin, Stephen and Dai, Jifeng},
    booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
    pages={9308--9316},
    year={2019}
}
[3] @InProceedings{reds,
    author = {Nah, Seungjun and Baik, Sungyong and Hong, Seokil and Moon, Gyeongsik and Son, Sanghyun and Timofte, Radu and Lee, Kyoung Mu},
    title = {NTIRE 2019 Challenge on Video Deblurring and Super-Resolution: Dataset and Study},
    booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month = {June},
    year = {2019}
}
[4] @inproceedings{adaptiverobustloss,
    title={A general and adaptive robust loss function},
    author={Barron, Jonathan T},
    booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
    pages={4331--4339},
    year={2019}
}
[5] @inproceedings{flownet2,
    title={Flownet 2.0: Evolution of optical flow estimation with deep networks},
    author={Ilg, Eddy and Mayer, Nikolaus and Saikia, Tonmoy and Keuper, Margret and Dosovitskiy, Alexey and Brox, Thomas},
    booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
    pages={2462--2470},
    year={2017}
}
[6] @inproceedings{pwcnet,
    title={Pwc-net: Cnns for optical flow using pyramid, warping, and cost volume},
    author={Sun, Deqing and Yang, Xiaodong and Liu, Ming-Yu and Kautz, Jan},
    booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
    pages={8934--8943},
    year={2018}
}
[7] @article{molina2019pade,
    title={Pad{\'e} Activation Units: End-to-end Learning of Flexible Activation Functions in Deep Networks},
    author={Molina, Alejandro and Schramowski, Patrick and Kersting, Kristian},
    journal={arXiv preprint arXiv:1907.06732},
    year={2019}
}