sniklaus/softmax-splatting

Implementation details on stable softmax splatting


fmu2 commented

Hi Simon,

I came across your recent paper on splatting-based synthesis for video frame interpolation and thoroughly enjoyed reading it. I am wondering if you could further clarify some implementation details of the stable softmax splatting discussed in Section 4.1.

(1) Instead of directly splatting source pixels to a target frame, you backward warp the pixels using splatted forward flow. Do you fill the holes in the derived backward flow map using the outside-in strategy described in the two papers you cited? If so, how do you ensure the hole-filling step is differentiable?

(2) The 2x2 bilinear splatting kernel is replaced by a 4x4 Gaussian kernel to enlarge the support of each source pixel. My guess is that a pixel is inside the splatting kernel if its center is within two pixel units of the splat destination; please correct me if I am wrong. How do you compute the interpolation weights for the pixels in the kernel? I assume these weights replace the bilinear weights b(.) in Equation (3). Specifically, what is the standard deviation of the Gaussian kernel, and do you normalize the weights so that they sum to one?

Looking forward to your reply. Thank you!

Thank you for your interest in our work!

  1. Yes, we splat inverse flows and then backward warp the colors. We do not fill in any holes in the splatted inverse flows, so some of the backward warped pixels will be invalid. We also tried splatting an auxiliary channel of only ones alongside the inverse flows and then masking the backward warped pixels with the splatted auxiliary channel, such that holes in the splatted flow have a merge metric of zero (besides appearing black instead of having an invalid color), but that didn't lead to any improvements. We didn't try outside-in filling as done in the Middlebury flow paper since it is unclear how to make this operation sufficiently fast: it potentially has to be invoked many times, since each invocation only fills a single slice of pixels at the boundary of the holes.
  2. Consider the one-dimensional case where a pixel splats to 4.7: we then sample from 3, 4, 5, and 6 and use g(3-4.7), g(4-4.7), g(5-4.7), and g(6-4.7) as the sampling weights. For the two-dimensional case, we use the Euclidean distance between the point we splat to and the point we sample from. We use a sigma of 0.6, which we determined through a hyperparameter sweep (the common (ksize-1)/6 formula would suggest 0.5 for ksize = 4). We tried both normalizing and not normalizing the weights; both worked similarly well, so you should be fine going with the simpler option of not normalizing them.
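A minimal 1D sketch of the pipeline in point 1, under simplifying assumptions: a plain Gaussian-weighted average stands in for softmax splatting (there is no importance metric here), the 4-tap Gaussian kernel from point 2 is used, and `splat_inverse_flow_1d`, `backward_warp_1d`, and the `eps` threshold are hypothetical names. Holes are not filled; the summed kernel weight (what splatting an all-ones channel yields) is zero wherever no source pixel landed, and those pixels are masked out and left black.

```python
import math


def gaussian(d, sigma=0.6):
    # Assumed form of g: unnormalized Gaussian of the distance d.
    return math.exp(-(d * d) / (2.0 * sigma * sigma))


def splat_inverse_flow_1d(flow, sigma=0.6, eps=1e-6):
    """Forward-splat the inverse flow -flow[p] to position p + flow[p] (4-tap Gaussian).

    Returns (inverse_flow, coverage). coverage is the splatted all-ones channel,
    i.e. the summed kernel weight, and is 0 in holes.
    """
    n = len(flow)
    num = [0.0] * n
    cov = [0.0] * n
    for p, f in enumerate(flow):
        x = p + f  # where source pixel p lands in the target frame
        base = math.floor(x)
        for q in range(base - 1, base + 3):  # 4 taps: floor(x)-1 .. floor(x)+2
            if 0 <= q < n:
                w = gaussian(q - x, sigma)
                num[q] += w * (-f)
                cov[q] += w
    # Weighted average where something landed; holes stay unfilled (0.0 placeholder).
    inv = [num[q] / cov[q] if cov[q] > eps else 0.0 for q in range(n)]
    return inv, cov


def backward_warp_1d(image, inv_flow, coverage, eps=1e-6):
    """Backward-warp image with the splatted inverse flow using linear interpolation."""
    n = len(image)
    out, valid = [], []
    for q in range(n):
        if coverage[q] <= eps:
            out.append(0.0)  # hole: no valid flow, pixel is masked out (black)
            valid.append(False)
            continue
        x = min(max(q + inv_flow[q], 0.0), n - 1.0)  # source sample position, clamped
        i0 = int(math.floor(x))
        i1 = min(i0 + 1, n - 1)
        a = x - i0
        out.append((1.0 - a) * image[i0] + a * image[i1])
        valid.append(True)
    return out, valid
```

With a constant forward flow of 2, for example, every target pixel except the first receives the inverse flow -2, while the first pixel is a hole and ends up masked.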
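To illustrate why outside-in filling is hard to make fast, here is a simplified 1D variant. It is a hypothetical `outside_in_fill_1d`, not the Middlebury implementation: each pass only fills hole pixels that border already-valid pixels (using the average of their valid neighbors), so a hole bounded on both sides needs roughly half its length in passes, and each pass is a full sweep over the image.

```python
def outside_in_fill_1d(values, valid):
    """Fill invalid entries from the outside in; returns (filled_values, num_passes)."""
    values = list(values)
    valid = list(valid)
    n = len(values)
    passes = 0
    while not all(valid):
        # Collect fills first so that a pass only uses pixels valid *before* it.
        newly = []
        for i in range(n):
            if not valid[i]:
                nbrs = [values[j] for j in (i - 1, i + 1) if 0 <= j < n and valid[j]]
                if nbrs:
                    newly.append((i, sum(nbrs) / len(nbrs)))
        if not newly:
            raise ValueError("no valid pixels to propagate from")
        for i, v in newly:
            values[i] = v
            valid[i] = True
        passes += 1
    return values, passes
```

A hole of five pixels takes three passes here, and a hole of ten takes five, which is the repeated-invocation cost the reply refers to.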
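The sampling weights in point 2 can be sketched as follows. This assumes g is the unnormalized Gaussian exp(-d^2 / (2 sigma^2)) (the reply does not state its normalization constant) and that the 4x4 footprint spans floor(x)-1 to floor(x)+2 along each axis, which reproduces the 3, 4, 5, 6 example for 4.7. The function names are hypothetical.

```python
import math

# Value reported in the reply; the (ksize - 1) / 6 rule of thumb with ksize = 4 gives 0.5.
SIGMA = 0.6


def gaussian(d, sigma=SIGMA):
    """Assumed form of g: unnormalized Gaussian exp(-d^2 / (2 sigma^2))."""
    return math.exp(-(d * d) / (2.0 * sigma * sigma))


def taps_1d(x):
    """The four integer positions covered by a 4-tap kernel around x."""
    base = math.floor(x)
    return [base - 1, base, base + 1, base + 2]


def splat_weights_2d(x, y, sigma=SIGMA, normalize=False):
    """Return {(ix, iy): weight} for a 4x4 Gaussian kernel centred at (x, y).

    Each tap is weighted by g(Euclidean distance between the tap and (x, y)).
    Normalization is optional; the reply reports both variants working similarly.
    """
    weights = {}
    for iy in taps_1d(y):
        for ix in taps_1d(x):
            weights[(ix, iy)] = gaussian(math.hypot(ix - x, iy - y), sigma)
    if normalize:
        total = sum(weights.values())
        weights = {k: v / total for k, v in weights.items()}
    return weights
```

For example, `taps_1d(4.7)` gives `[3, 4, 5, 6]`, and `splat_weights_2d(4.7, 2.2, normalize=True)` returns 16 weights that sum to one, with the largest weight on the nearest tap `(5, 2)`.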

Hope that clears things up!