sniklaus/softmax-splatting

about the stability of forward warping


Hi. I find that, compared with backward warping, forward warping suffers from more training instability. But I think these two types of warping are essentially the same. Is there any explanation for why forward warping is more unstable during training? Thx!

I have three thoughts on what may cause the instability.

First, consider the 1D case with an optical flow vector of 10.000000001. The respective pixel will be splatted to two neighboring pixels with bilinear splatting weights of 0.999999999 and 0.000000001, which is not ideal due to numerical instabilities for the pixel that was splatted with a weight of 0.000000001 (not only in the forward pass but also during backprop).
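To make this concrete, here is a minimal numeric sketch of the 1D bilinear splatting weights (the variable names are mine, not from the repository):

```python
import torch

# hypothetical 1D example: a pixel at x=0 with an optical flow of 10.000000001
# (note that in float32 this flow would round to exactly 10.0, which is itself
# a symptom of the limited precision at play here)
tenFlow = torch.tensor([10.000000001], dtype=torch.float64)
tenTarget = 0.0 + tenFlow  # landing position of the splatted pixel

tenLeft = tenTarget.floor()  # left neighboring pixel, x=10
tenRight = tenLeft + 1.0     # right neighboring pixel, x=11

tenWeightLeft = tenRight - tenTarget   # ~0.999999999
tenWeightRight = tenTarget - tenLeft   # ~0.000000001, numerically fragile

print(tenWeightLeft.item(), tenWeightRight.item())
```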

Second, when dividing the splatted input by the splatted importance metric, one needs to account for divisions by zero (due to the sparse nature of the splatted result, as well as my previous point about the bilinear splatting coefficients potentially being tiny). The initial version of this repository addressed this issue by adding an epsilon to the denominator, but @holynski suggested an alternative approach in #23 which is now the default. Side note: Aleksander used softmax splatting in his awesome "Animating Pictures with Eulerian Motion Fields" paper, which according to the project page will soon be open source, if you want to take a look at how he made softmax splatting work well for his project. You could give the initial epsilon-based approach a try and see what works better for you.
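As a rough sketch of the two normalization strategies (the tensor names and shapes are hypothetical, and the masked variant only mirrors the spirit of #23, not its exact code):

```python
import torch

# hypothetical shapes: the splatted input carries an extra ones-channel that
# accumulates the splatting weights (the denominator of the normalization)
tenSplatted = torch.rand(1, 4, 8, 8)
tenNumer = tenSplatted[:, :-1, :, :]
tenDenom = tenSplatted[:, -1:, :, :].clone()
tenDenom[0, 0, 0, 0] = 0.0  # pretend one pixel received no splats at all

# epsilon approach (initial version of this repository): always divide, the
# epsilon prevents division by zero but perturbs pixels whose accumulated
# weight is legitimately tiny
tenEps = tenNumer / (tenDenom + 0.0000001)

# masked approach (in the spirit of #23): only replace the denominator where
# it is exactly zero, so holes become zeros while all other pixels are
# divided by their true accumulated weight
tenSafe = tenDenom.clone()
tenSafe[tenSafe == 0.0] = 1.0
tenMasked = tenNumer / tenSafe
```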

Third, for softmax splatting in particular, the importance metric tenMetric should ideally be within the range of -10 to 10 or so, since tenMetric.exp() will otherwise be tiny/huge, which may cause numerical instabilities. Consider tiny values with reference to my previous point about resolving divisions by zero by adding an epsilon: an epsilon of 0.0000001 is far greater than the result of an importance metric value of (-20).exp() == 0.0000000020612, which is not ideal.
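A minimal sketch of keeping the metric in that range, assuming a plain clamp or rescale is acceptable for your use case (the variable names and the scaling strategy are my own, not the repository's):

```python
import torch

# hypothetical, ill-scaled importance metric of shape (N, 1, H, W)
tenMetric = torch.randn(1, 1, 8, 8) * 50.0

# option 1: hard-clamp the metric to [-10, 10] before softmax splatting
tenClamped = tenMetric.clamp(min=-10.0, max=10.0)

# option 2: rescale so the values span roughly [-10, 10]
tenScaled = 10.0 * tenMetric / tenMetric.abs().max().clamp(min=1e-7)

# exp() of the clamped metric now stays within [exp(-10), exp(10)]
print(tenClamped.exp().min().item(), tenClamped.exp().max().item())
```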

Hi, thx for your informative reply. I notice that in Neural Scene Flow Fields for Space-Time View Synthesis of Dynamic Scenes (of which you are the second author), the authors use the "average" mode. Is that because the "average" mode is empirically better than the "softmax" mode in that project?

I have been waiting for the release of "Animating Pictures with Eulerian Motion Fields" for a while :)

That is because NSFF renders the image in a wavefront manner as follows: https://github.com/zhengqili/Neural-Scene-Flow-Fields/blob/9833fedbe6ad9557b444cdead7168a3f79a69617/nsff_exp/render_utils.py#L155-L166

Specifically, the splatting is performed multiple times and these results are then merged based on the splatted alpha. You can think of it as rendering an MPI, which makes it easier to handle semi-transparent objects/surfaces.
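As a rough sketch of that merging step, assuming back-to-front ordered planes and a standard over-operator (the function and tensor names are hypothetical, not NSFF's actual code):

```python
import torch

# hypothetical sketch of the wavefront-style merging: each depth plane is
# splatted separately (color together with an alpha channel), then the
# splatted planes are composited back to front based on the splatted alpha
def composite_planes(listColors, listAlphas):
    # listColors / listAlphas are assumed to be ordered back to front,
    # with shapes (N, C, H, W) and (N, 1, H, W) respectively
    tenOut = torch.zeros_like(listColors[0])
    for tenColor, tenAlpha in zip(listColors, listAlphas):
        # standard over-operator: nearer planes occlude farther ones
        tenOut = tenAlpha * tenColor + (1.0 - tenAlpha) * tenOut
    return tenOut

# toy usage with two splatted planes
tenOut = composite_planes(
    [torch.rand(1, 3, 8, 8), torch.rand(1, 3, 8, 8)],
    [torch.rand(1, 1, 8, 8), torch.rand(1, 1, 8, 8)],
)
```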