Yuehan Zhang, Angela Yao
National University of Singapore
In this paper, we investigate spatial and channel attention under real-world video super-resolution (RWVSR) settings:
- we investigate the sensitivity of two attention mechanisms to degraded queries and compare them for temporal feature aggregation;
- we reveal the high channel covariance of channel attention outputs;
- to validate our findings, we derive RealViformer, a channel-attention-based Transformer for RWVSR, with a simple but improved transformer block design.
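The channel attention referenced above follows the Restormer-style "transposed" formulation: queries and keys are normalized per channel, so the attention map is C x C rather than (HW) x (HW). A minimal NumPy sketch (shapes and the fixed temperature are illustrative, not the paper's exact configuration):

```python
import numpy as np

def channel_attention(q, k, v, temperature=1.0):
    """Transposed (channel) attention: the attention map is C x C,
    so cost scales with channel count rather than spatial size.
    q, k, v: arrays of shape (C, N), where N = H*W flattened positions."""
    # L2-normalize each channel so dot products are cosine similarities
    qn = q / (np.linalg.norm(q, axis=1, keepdims=True) + 1e-8)
    kn = k / (np.linalg.norm(k, axis=1, keepdims=True) + 1e-8)
    attn = qn @ kn.T * temperature                   # (C, C) channel map
    attn = np.exp(attn - attn.max(axis=1, keepdims=True))
    attn = attn / attn.sum(axis=1, keepdims=True)    # softmax over channels
    return attn @ v                                  # (C, N) reweighted features

rng = np.random.default_rng(0)
C, N = 4, 16
out = channel_attention(rng.normal(size=(C, N)),
                        rng.normal(size=(C, N)),
                        rng.normal(size=(C, N)))
print(out.shape)  # (4, 16)
```

Because each output channel is a convex combination of the value channels, output channels tend to be correlated, which is the covariance effect the second finding refers to.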
- Publish the repository
- Update links to datasets
- Add video results
- Python >= 3.9
- PyTorch > 1.12
```shell
# Clone the repository
git clone https://github.com/Yuehan717/RealViformer.git

# Navigate into the repository
cd RealViformer

# Install dependencies
pip install -r requirements.txt
```
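After installing, a quick sanity check (not part of the repo) confirms the interpreter meets the Python >= 3.9 requirement listed above:

```python
import sys

def python_ok(version_info=sys.version_info):
    # Compare (major, minor) against the minimum required (3, 9)
    return tuple(version_info[:2]) >= (3, 9)

print(python_ok())
```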
- Training dataset: REDS; degradations are added on the fly.
- Testing datasets:
  - Real-world datasets: VideoLQ, RealVSR
  - Synthetic datasets: REDS-test, UDM10; degradations are synthesized with the same pipeline as in training.
As RealViformer focuses on architecture design, we provide only testing scripts. The pretrained model is available here.
```shell
python inference_realviformer.py --model_path pretrained_model/weights.pth --input_path [path to video folder] --save_path results/ --interval 100
```
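To evaluate on all four test sets, the command above can be repeated per dataset. A small helper (hypothetical, not part of the repo; `datasets/<name>` paths are placeholders) that assembles one command per test set:

```python
import shlex

def build_command(dataset, model_path="pretrained_model/weights.pth", interval=100):
    # Assemble the inference command for one test set; adjust paths as needed
    args = [
        "python", "inference_realviformer.py",
        "--model_path", model_path,
        "--input_path", f"datasets/{dataset}",
        "--save_path", f"results/{dataset}",
        "--interval", str(interval),
    ]
    return shlex.join(args)

for name in ["VideoLQ", "RealVSR", "REDS-test", "UDM10"]:
    print(build_command(name))
```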
The code is based on BasicVSR and Restormer. Thanks to their great work!