hzwer/ECCV2022-RIFE

flow computation details

Closed this issue · 4 comments

In IFNet_m.py and IFNet.py, the IFBlock forward() contains this line for flow computation:
flow = tmp[:, :4] * scale * 2
What is the purpose of multiplying by scale * 2?

hzwer commented

When resizing an optical flow map, the value should be scaled accordingly.
The model can learn this constant, but in principle, I think it is more correct to write it out explicitly.
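A minimal sketch of why this scaling is needed (the tensor shapes here are assumptions for illustration, not the model's actual dimensions): flow values are displacements measured in pixels, so a flow map predicted at half resolution must have its values doubled when resized back to full resolution.

```python
import torch
import torch.nn.functional as F

# Hypothetical example: a 4-channel flow tensor predicted at 1/2 scale.
# A 1-pixel displacement at half resolution corresponds to a 2-pixel
# displacement at full resolution, so the values are multiplied by 2
# alongside the spatial resize.
flow_half = torch.ones(1, 4, 32, 32)
flow_full = F.interpolate(flow_half, scale_factor=2,
                          mode="bilinear", align_corners=False) * 2

print(flow_full.shape)               # torch.Size([1, 4, 64, 64])
print(flow_full[0, 0, 0, 0].item())  # 2.0
```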

Thanks for the explanation, that is clear.

I have another question related to flow scaling for arbitrary interpolation. I found two different implementations in your code.
I found this in the inference_video.py file:

if n == 1:
    return [middle]

first_half = make_inference(I0, middle, n=n//2)  # floor division // rounds down to the nearest whole number
second_half = make_inference(middle, I1, n=n//2)
if n % 2:
    return [*first_half, middle, *second_half]
else:
    return [*first_half, *second_half]

which is one way of performing interpolation at arbitrary rates, but this approach only works for rates that are powers of 2, since n equals 2**args.exp - 1.
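The recursion can be sketched on timestamps alone; blend() below is a hypothetical stand-in for the real model.inference call, which always produces the temporal midpoint of its two inputs. Running it with exp = 3 shows the 2**3 - 1 = 7 frames landing at t = k/8:

```python
# Self-contained sketch of the recursive doubling in inference_video.py.
# blend() is a placeholder (an assumption for illustration): RIFE's base
# model interpolates the midpoint, which for timestamps is the average.
def blend(a, b):
    return (a + b) / 2

def make_inference(t0, t1, n):
    middle = blend(t0, t1)
    if n == 1:
        return [middle]
    first_half = make_inference(t0, middle, n=n // 2)
    second_half = make_inference(middle, t1, n=n // 2)
    if n % 2:
        return [*first_half, middle, *second_half]
    return [*first_half, *second_half]

print(make_inference(0.0, 1.0, n=2**3 - 1))
# [0.125, 0.25, 0.375, 0.5, 0.625, 0.75, 0.875]
```

Because every level of the recursion only ever bisects an interval, the reachable timestamps are exactly the dyadic fractions k/2**exp, which is why this scheme cannot hit arbitrary t.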

Another implementation that I found in the forward method of IFNet in the IFNet_m.py file is the following:
timestep = (x[:, :1].clone() * 0 + 1) * timestep
and then, this 'timestep' is used as additional input to the IFBlocks:
flow_d, mask_d = stu[i](torch.cat((img0, img1, timestep, warped_img0, warped_img1, mask), 1), flow, scale=scale[i])
However, I don't see this used explicitly in your paper, and I am not sure of the reason for doing it this way.
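For reference, the timestep expression above just broadcasts a scalar t into a constant one-channel map with the input's spatial size, so it can be concatenated with the images as an extra network input. A small sketch (the shapes are assumptions for illustration):

```python
import torch

# Assumed input: a batch of two stacked RGB frames, 6 channels total.
x = torch.randn(2, 6, 64, 64)
timestep = 0.25

# x[:, :1] takes one channel; "* 0 + 1" turns it into an all-ones map of
# the right shape and device; multiplying by the scalar fills it with t.
t_map = (x[:, :1].clone() * 0 + 1) * timestep

print(t_map.shape)     # torch.Size([2, 1, 64, 64])
print(t_map.unique())  # tensor([0.2500])
```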

I experimented by myself with adding the parameter 't' for arbitrary interpolation. I first tried to add it to the interpolation equation (Equation 1 in your paper), which is implemented as follows:
merged[i] = merged[i][0] * mask_list[i] + merged[i][1] * (1 - mask_list[i])
and I modified it to:
merged[i] = merged[i][0] * mask_list[i] * (1 - t) + merged[i][1] * (1 - mask_list[i]) * t
according to simple 1D piecewise linear interpolation. Of course, that didn't yield good results, since arbitrary interpolation cannot be achieved by simply rescaling the warped intensities produced by backward warping.
After that, I added scaling 't' before applying warping in the following way:

warped_img0 = warp(img0, flow[:, :2] * t) # Ft->0
warped_img1 = warp(img1, flow[:, 2:4] * (1 - t)) # Ft->1

Note that here I multiply Ft->0 by t and Ft->1 by (1 - t), and not vice versa, because of the backward warping operation. That resulted in better performance. Still, I am not sure whether this is the optimal way to perform arbitrary interpolation.
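To make the t-scaling concrete, here is a self-contained sketch. The warp() below is a minimal backward warp built on grid_sample, written here only for illustration; it is an assumption, not the repo's actual warp module:

```python
import torch
import torch.nn.functional as F

def warp(img, flow):
    # Minimal backward warp (illustrative assumption): flow is (N, 2, H, W)
    # in pixels, channel 0 = dx, channel 1 = dy. Each output pixel samples
    # the source image at its own location plus the flow vector.
    n, _, h, w = img.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().unsqueeze(0) + flow
    grid_x = 2 * grid[:, 0] / (w - 1) - 1   # normalize to [-1, 1]
    grid_y = 2 * grid[:, 1] / (h - 1) - 1
    grid = torch.stack((grid_x, grid_y), dim=3)  # (N, H, W, 2) for grid_sample
    return F.grid_sample(img, grid, align_corners=True)

t = 0.25
img0 = torch.rand(1, 3, 16, 16)
img1 = torch.rand(1, 3, 16, 16)
flow = torch.zeros(1, 4, 16, 16)  # assumed layout: [F_t->0 ; F_t->1]

warped_img0 = warp(img0, flow[:, :2] * t)         # scale F_t->0 by t
warped_img1 = warp(img1, flow[:, 2:4] * (1 - t))  # scale F_t->1 by (1 - t)
print(warped_img0.shape)  # torch.Size([1, 3, 16, 16])
```

With zero flow the warp reduces to the identity, which is a quick sanity check that the grid construction is right.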

I want to ask: if RIFE supports arbitrary interpolation at rates that are not necessarily powers of two, how did you implement this?

hzwer commented
I directly input t into the convolutional network, and the model achieves good results. The recursive implementation is an early demo version. During training, t has some randomness and is not limited to powers of two.

I see, thanks for the explanations. And do I understand correctly that this is implemented as follows (in the forward method of IFNet in the IFNet_m.py file)?

timestep = (x[:, :1].clone() * 0 + 1) * timestep
...
flow_d, mask_d = stu[i](torch.cat((img0, img1, timestep, warped_img0, warped_img1, mask), 1), flow, scale=scale[i])