codeslake/RefVSR

Train and test LR/reference sizes are different

haewonc opened this issue · 4 comments

First of all, thanks for your great work. Your paper was interesting, and the results were great!
I was trying to use your code, especially datasets.py and the get_patch method, but ran into one problem.

At train time (cropped):

  • LR_UW size: (64, 64)
  • LR_REF_W size: (128, 128)

At test time:

  • LR_UW size: (480, 270)
  • LR_REF_W size: (480, 270)

I understand that this is because of the cropping done in get_patch. For the W reference images, I found that your code crops a patch twice as large as the UW patch (a rough sketch of what I mean is at the end of this comment). However, my concern is why the ratio between the reference image and the LR image differs between train and test time. More precisely:

  1. Is the ratio between the reference image and the LR image intended to be different at train and test time?
  2. If so, how does your model handle such a different ratio?
  3. If it is not intended, which one is correct? Or is there anything I missed?

I'm using your default config, and flag_HD_in is false. Thank you :)
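
For concreteness, here is a rough sketch of the paired cropping behavior I am describing. This is my own approximation, not the actual get_patch code from datasets.py; in particular, the coordinate mapping assumes the W frame covers the central half of the UW field of view at roughly 2x magnification.

```python
import random

import numpy as np

# Hypothetical sketch of paired cropping (not the actual RefVSR get_patch).
# Assumption: the W frame has the same pixel resolution as the UW frame but
# covers the central half of the UW field of view at ~2x magnification, so a
# scene region spanning p pixels in the UW frame spans ~2p pixels in the W frame.
def get_paired_patch(lr_uw, lr_ref_w, patch_size=64):
    h, w, _ = lr_uw.shape

    # Sample the UW patch inside the region the W camera also sees
    # (the central half of the UW frame, under the assumption above).
    y = random.randint(h // 4, 3 * h // 4 - patch_size)
    x = random.randint(w // 4, 3 * w // 4 - patch_size)
    uw_patch = lr_uw[y:y + patch_size, x:x + patch_size]

    # Map the UW corner into W coordinates and crop a 2x larger patch,
    # so both patches cover roughly the same scene content.
    ref_y, ref_x = 2 * (y - h // 4), 2 * (x - w // 4)
    ref_size = 2 * patch_size
    ref_patch = lr_ref_w[ref_y:ref_y + ref_size, ref_x:ref_x + ref_size]

    return uw_patch, ref_patch


uw = np.zeros((270, 480, 3))
ref = np.zeros((270, 480, 3))
p_uw, p_ref = get_paired_patch(uw, ref)
print(p_uw.shape, p_ref.shape)  # (64, 64, 3) (128, 128, 3)
```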

Hi, @haewonc.
Thanks for your interest in our work.

  1. Is the ratio between the reference image and the LR image intended to be different at train and test time?

The test set of the RealMCVSR dataset contains video clips with varying orientations (either 480x270 or 270x480).

  2. If so, how does your model handle such a different ratio?
  3. If it is not intended, which one is correct? Or is there anything I missed?
  • A CNN can handle images and videos whose aspect ratio differs from the one used for training (think of a CNN as a giant convolutional filter; see the toy example after this list). However, the network's effective receptive field is limited by the patch size used during training.
  • Almost every deep learning model is trained on cropped patches and then tested on images/videos at their original resolution.
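
To make the first point concrete, here is a toy fully convolutional network (not our RefVSR model): the same weights run on any spatial size, so training on 64x64 patches and testing on full 480x270 frames is perfectly valid.

```python
import torch
import torch.nn as nn

# A toy fully convolutional network: no fully connected layers, so it
# accepts inputs of any spatial size.
net = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 3, kernel_size=3, padding=1),
)

train_patch = torch.randn(1, 3, 64, 64)   # train-time crop
test_frame = torch.randn(1, 3, 270, 480)  # test-time full frame (H x W)

print(net(train_patch).shape)  # torch.Size([1, 3, 64, 64])
print(net(test_frame).shape)   # torch.Size([1, 3, 270, 480])
```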

Oh, thank you for the fast response.

I think there was some confusion in my question.
The ratio I meant was not the aspect ratio of the image itself, but the size ratio between the LR and reference images.

At train time, the W reference patch is twice as large as the UW LR patch. At test time, their sizes are the same.

I thought it might affect the FoV and the network architecture.

I see. Sorry for the confusion.

First, note that the spatial sizes of the ultra-wide, wide-angle, and telephoto frames are the same.
During training, we crop patches from the ultra-wide, wide-angle, and telephoto frames and make sure the patches contain similar content. Because the wide-angle camera has roughly 2x the magnification of the ultra-wide camera, the same scene content spans twice as many pixels in the wide-angle frame, which results in patches of different sizes.
However, at test time, we feed the network the LR and Ref frames at their original sizes.
This does not matter, because our network uses correlation-based reference matching.
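
For illustration, here is a generic sketch of correlation-based patch matching, a simplified stand-in for our matching module (in the spirit of the repo linked below): because similarity is computed between every pair of LR and Ref feature patches, the two feature maps do not need matching spatial sizes.

```python
import torch
import torch.nn.functional as F

# Generic correlation-based patch matching (simplified illustration,
# not the exact RefVSR module).
def correlation_match(feat_lr, feat_ref, patch=3):
    # Unfold each feature map into overlapping patch x patch patches:
    # shapes (B, C*patch*patch, N_lr) and (B, C*patch*patch, N_ref).
    q = F.unfold(feat_lr, kernel_size=patch, padding=patch // 2)
    k = F.unfold(feat_ref, kernel_size=patch, padding=patch // 2)

    # Cosine similarity between every (LR, Ref) patch pair.
    q = F.normalize(q, dim=1)
    k = F.normalize(k, dim=1)
    corr = torch.bmm(q.transpose(1, 2), k)  # (B, N_lr, N_ref)

    # For each LR position, the index and score of the best-matching
    # Ref position.
    conf, index = corr.max(dim=2)
    return index, conf


# Works even though LR and Ref features have different spatial sizes:
feat_lr = torch.randn(1, 8, 16, 16)
feat_ref = torch.randn(1, 8, 32, 32)
index, conf = correlation_match(feat_lr, feat_ref)
print(index.shape, conf.shape)  # torch.Size([1, 256]) torch.Size([1, 256])
```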

I would recommend this repo for a better understanding.
Our reference matching module is based on theirs.

Thanks!

I understand. Thank you!