sniklaus/softmax-splatting

Optical flow estimation between two images

compressionist opened this issue · 1 comment

Hi @sniklaus,

I very much like your approach. We are thinking about how to improve UHD frame interpolation after using one of the state-of-the-art algorithms such as RIFE or ST-MFNet. Unfortunately, even with these advanced interpolation algorithms, artifacts still emerge (see attached screenshot), especially on small objects, and I think your method could help reduce them. This raises the question: how do you perform the optical flow estimation? Could you provide your code, or a complete description of the steps, for obtaining optical flow between two images that is compatible with your algorithm?

Many thanks for considering my request!
[attached screenshot: Screen Shot 2022-05-26 at 1 35 41 PM]

Thank you for your interest in our work!

Small objects subject to motion are, as far as I know, an unsolved problem in optical flow estimation. Specifically, all optical flow methods that I am aware of perform the estimation at a lower scale (or at a very low scale when using coarse-to-fine flow estimation), where small/thin objects essentially become invisible. Imagine you have an object 32 pixels in size and use PWC-Net to estimate the flow: PWC-Net starts by downscaling the images 6 times before performing the coarse-to-fine flow estimation. At that scale, the 32-pixel object is now 32/2^6 == 0.5 pixels and hence essentially invisible.
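To make the effect concrete, here is a tiny illustration (just the arithmetic above, not part of any particular method) of how a small object shrinks across the downscaling steps of a PWC-Net-style pyramid:

```python
# Effective size of a 32-pixel object after each downscaling step of a
# coarse-to-fine pyramid with 6 levels (as used by PWC-Net-style estimators).
object_size_px = 32
for level in range(7):
    print(f"after {level} downscalings: {object_size_px / 2 ** level:.2f} px")
# After 6 downscalings the object is 0.5 px wide and effectively invisible.
```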

RIFE downscales the image 3 times, so the problem is less pronounced, but it is still there. However, you might be hitting another problem: the official models were not trained on high-resolution footage. You may not see good results because the motion in your samples (in terms of pixels) is likely greater than what the optical flow network was trained on. For example, consider training an optical flow method on FlyingThings3D, which has a resolution of 960x540 and an average motion magnitude of X pixels. If you then used this trained flow estimator on a 4K version of FlyingThings3D, where the average motion magnitude is 4*X pixels, you would probably get poor results.
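A common mitigation (not specific to our work) is to estimate the flow at a reduced resolution that is closer to the training resolution and then rescale the flow field back up. A minimal PyTorch sketch, where `flow_network` stands in for whatever two-frame flow estimator you use:

```python
import torch
import torch.nn.functional as F

def estimate_flow_downscaled(frame_one, frame_two, flow_network, scale=0.25):
    # frame_one / frame_two: [B, 3, H, W]; flow_network: any two-frame flow estimator
    height, width = frame_one.shape[2:]

    # downscale the high-resolution inputs so the motion magnitude is closer
    # to what the flow network saw during training
    small_one = F.interpolate(frame_one, scale_factor=scale, mode='bilinear', align_corners=False)
    small_two = F.interpolate(frame_two, scale_factor=scale, mode='bilinear', align_corners=False)

    flow = flow_network(small_one, small_two)  # [B, 2, H*scale, W*scale], in pixels

    # upsample the flow field back to the input resolution and rescale the
    # flow vectors accordingly, since they are measured in pixels
    flow = F.interpolate(flow, size=(height, width), mode='bilinear', align_corners=False)
    return flow * (1.0 / scale)
```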

For high-resolution footage, I am under the impression that your best bet is iterative flow upsampling; for more information, see Section 5 of "Splatting-based Synthesis for Video Frame Interpolation". It will still struggle with thin objects, though.
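For reference, here is a rough sketch of what such an iterative coarse-to-fine refinement could look like. This is only my own illustration under the assumption of a generic two-frame `flow_network`, not the exact procedure from the paper:

```python
import torch
import torch.nn.functional as F

def backwarp(tensor, flow):
    """Backward-warp `tensor` ([B, C, H, W]) using `flow` ([B, 2, H, W], in pixels)."""
    batch, _, height, width = tensor.shape
    grid_y, grid_x = torch.meshgrid(
        torch.linspace(-1.0, 1.0, height, device=tensor.device),
        torch.linspace(-1.0, 1.0, width, device=tensor.device),
        indexing='ij')
    grid = torch.stack([grid_x, grid_y], dim=0).unsqueeze(0).repeat(batch, 1, 1, 1)
    # convert the flow from pixels to the [-1, 1] coordinate range of grid_sample
    norm_flow = torch.cat([
        flow[:, 0:1] / ((width - 1.0) / 2.0),
        flow[:, 1:2] / ((height - 1.0) / 2.0)], dim=1)
    return F.grid_sample(tensor, (grid + norm_flow).permute(0, 2, 3, 1),
                         mode='bilinear', padding_mode='border', align_corners=True)

def iterative_flow(frame_one, frame_two, flow_network, num_levels=3):
    """Coarse-to-fine flow estimation for high-resolution inputs.

    At each level the second frame is pre-warped with the current flow estimate,
    so the flow network only has to recover a residual, which keeps the motion
    magnitude within the range the network was trained on.
    """
    flow = None
    for level in reversed(range(num_levels)):  # coarsest level first
        scale = 1.0 / (2 ** level)
        one = F.interpolate(frame_one, scale_factor=scale, mode='bilinear', align_corners=False)
        two = F.interpolate(frame_two, scale_factor=scale, mode='bilinear', align_corners=False)
        if flow is None:
            flow = flow_network(one, two)
        else:
            # upsample the previous estimate and double its magnitude (pixels)
            flow = 2.0 * F.interpolate(flow, size=one.shape[2:], mode='bilinear', align_corners=False)
            flow = flow + flow_network(one, backwarp(two, flow))
    return flow  # full-resolution flow from frame_one to frame_two
```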