sniklaus/softmax-splatting

How do such "flows" warp the frames back accurately?

Closed this issue · 12 comments

I tried to warp the second image back to the first one based on the ground truth flow from the MPI-Sintel training split, to visualize the effect of the flow field. From my understanding, the warped image should be similar to the first one. However, the warped image looks like an overlap of both frame 1 and frame 2.
[images — top left: frame 1; top right: frame 2; bottom left: warped frame]

As indicated by your paper and the PWC-Net paper, it is easy to understand that the overlapped area is caused by pixels that appear in the first frame but disappear in the second frame.

Finally, here is my question:
How can you use such an "inaccurate" flow from PWC-Net to warp the frames "accurately"? Does this "inaccurate" flow really help you interpolate? As you know, MPI-Sintel provides occlusion masks to handle the problem above, so the final loss can be regularized by that mask. However, interpolation datasets don't have such a mask.

This is forward warping.
For example, in the cave2 folder of the Sintel dataset,
the first image is frame_0015.png,
the second image is frame_0016.png,
and the flow is frame_0015.flo.
There should not be an overlap; otherwise, you mixed up the input order.

If you want to warp img2 to img1, you should use the flow from img2 to img1.

As stated by @laomao0, softmax splatting performs forward and not backward warping. Please see the following for a concrete example of using softmax splatting on the Sintel dataset.

import cv2
import numpy
import torch

import moviepy.editor

import softsplat  # from this repository; the `backwarp` and `read_flo` helpers are assumed to be in scope

tenFirst = torch.FloatTensor(numpy.ascontiguousarray(cv2.imread(filename='./frame_0001.png', flags=-1).transpose(2, 0, 1)[None, :, :, :].astype(numpy.float32) * (1.0 / 255.0))).cuda()
tenSecond = torch.FloatTensor(numpy.ascontiguousarray(cv2.imread(filename='./frame_0002.png', flags=-1).transpose(2, 0, 1)[None, :, :, :].astype(numpy.float32) * (1.0 / 255.0))).cuda()
tenFlow = torch.FloatTensor(numpy.ascontiguousarray(read_flo('./frame_0001.flo').transpose(2, 0, 1)[None, :, :, :])).cuda()

tenMetric = torch.nn.functional.l1_loss(input=tenFirst, target=backwarp(tenInput=tenSecond, tenFlow=tenFlow), reduction='none').mean(1, True)

tenOutputs = [softsplat.FunctionSoftsplat(tenInput=tenFirst, tenFlow=tenFlow * fltTime, tenMetric=-20.0 * tenMetric, strType='softmax') for fltTime in numpy.linspace(0.0, 1.0, 11).tolist()]
npyOutputs = [(tenOutput[0, :, :, :].cpu().numpy().transpose(1, 2, 0) * 255.0).clip(0.0, 255.0).astype(numpy.uint8) for tenOutput in tenOutputs + list(reversed(tenOutputs[1:-1]))]

moviepy.editor.ImageSequenceClip(sequence=[npyOutput[:, :, ::-1] for npyOutput in npyOutputs], fps=15).write_gif('./out.gif')

Which yields the following sequence where bandage_1/clean/frame_0001.png has been forward warped according to the optical flow in bandage_1/flow/frame_0001.flo.

[out.gif — the forward-warped sequence]
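For reference, the snippet above assumes a `read_flo` helper. A minimal reader for the Middlebury `.flo` format (a sketch, assuming the standard layout: a 202021.25 float32 magic number, then int32 width and height, then interleaved float32 u/v values) could look like this:

import numpy

def read_flo(strFile):
    with open(strFile, 'rb') as objFile:
        # validate the magic number that marks a Middlebury .flo file
        assert numpy.frombuffer(objFile.read(4), numpy.float32)[0] == 202021.25
        intWidth = int(numpy.frombuffer(objFile.read(4), numpy.int32)[0])
        intHeight = int(numpy.frombuffer(objFile.read(4), numpy.int32)[0])
        # interleaved u/v flow components, row-major
        return numpy.frombuffer(objFile.read(), numpy.float32).reshape(intHeight, intWidth, 2)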


@laomao0 Do you mean that the flow file in MPI-Sintel goes from frame1 to frame2, in other words, that it warps frame1 to frame2? That was my understanding at first, but the code in PWC-Net always warps the second feature map back to the first one. Thus the final flow predicted by the model should be frame2->frame1 as well.
[screenshots of the relevant PWC-Net warping code]

Additionally, I tried to warp in both directions:
[image: warp(img1, flo)]
[image: warp(img2, flo)]

To be honest, neither of them looks good. The top result, warp(img1, flo), is forward warping but doesn't make sense. The bottom result, warp(img2, flo), is backward warping, and it is easier to understand what happened there, even though it has an overlapped area: the girl runs from left to right, and the weapon moves in the same direction. The tip of the weapon is warped to the left (marked in red), while parts of the leg remain in the warped frame, since warping does not restore the occluded background (marked in blue).


Thanks for your reply. I am quite confused about the direction of the flow right now.
PWC-Net uses backward warping during training and therefore produces a flow meant for backward warping, yet you use that flow to warp forward. Is there any misunderstanding or mistake on my part? Please take a look at my reply to @laomao0.

Backward warping is used in PWC-Net to warp the second frame to the first one (the flow is predicted from the first frame to the second frame). For frame interpolation, for example, we might want to forward warp the first frame half way to the second frame to get an estimate of the intermediate frame. Backward warping cannot directly be applied here since it would require the flow from the unknown intermediate frame to the first frame. Your examples do not seem to use our proposed softmax splatting since there are no holes in the output; you might want to give the code that I shared in my earlier comment a try.
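To make the difference concrete: backward warping gathers, i.e. every output pixel reads from the source at x + flow(x), while forward warping scatters, i.e. every source pixel writes to x + flow(x) and unfilled positions remain as holes. Below is a minimal nearest-neighbor forward-warping sketch in plain PyTorch (a hypothetical helper for illustration only; the repository's softmax splatting instead splats bilinearly and resolves colliding pixels via a softmax over an importance metric):

import torch

def forward_warp_nearest(tenInput, tenFlow):
    # tenInput: [B, C, H, W]; tenFlow: [B, 2, H, W], defined from the input frame to the target frame
    B, C, H, W = tenInput.shape
    gridY, gridX = torch.meshgrid(torch.arange(H, device=tenInput.device), torch.arange(W, device=tenInput.device), indexing='ij')
    # round each source pixel to its nearest target location
    targetX = (gridX[None, :, :] + tenFlow[:, 0, :, :]).round().long()
    targetY = (gridY[None, :, :] + tenFlow[:, 1, :, :]).round().long()
    tenOutput = torch.zeros_like(tenInput)  # positions that receive no pixel stay zero (holes)
    for b in range(B):
        m = (targetX[b] >= 0) & (targetX[b] < W) & (targetY[b] >= 0) & (targetY[b] < H)
        # scatter the source pixels; colliding pixels simply overwrite each other here
        tenOutput[b, :, targetY[b][m], targetX[b][m]] = tenInput[b, :, gridY[m], gridX[m]]
    return tenOutput

Calling this with tenFlow * 0.5 moves each pixel half way, which is exactly the operation that backward warping cannot express without the flow of the unknown intermediate frame.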

It makes sense, but I need to restate that none of the images I posted above have anything to do with interpolation. They are just a comparison of forward warping and backward warping using the ground truth flow from MPI-Sintel. Actually, I am still confused about the direction of the ground truth optical flow.

Additionally, I haven't run your softmax splatting code yet, since I was stuck on understanding the usage of PWC-Net. Based on your hints and my understanding of MPI-Sintel: the ground truth optical flow in MPI-Sintel is always frame2->frame1, and PWC-Net predicts the flow frame1->frame2, right?

As a result, softmax splatting uses the optical flow to do forward warping.

Our proposed softmax splatting isn't strictly bound to interpolation. And all that I am saying is that none of your examples look like forward warping. You might want to take a look at the code that I shared earlier; maybe it also clears up your confusion (the ground truth flow in Sintel is from frame n to frame n + 1).
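A quick way to verify that direction numerically, reusing `backwarp` and the tensors from the earlier snippet (an illustrative check, not part of the repository):

# small error (outside occlusions) when the flow goes from frame n to frame n + 1
fltCorrect = torch.nn.functional.l1_loss(input=tenFirst, target=backwarp(tenInput=tenSecond, tenFlow=tenFlow)).item()

# noticeably larger error when the direction is assumed to be reversed
fltReversed = torch.nn.functional.l1_loss(input=tenSecond, target=backwarp(tenInput=tenFirst, tenFlow=tenFlow)).item()

print(fltCorrect, fltReversed)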

I see. I misunderstood the concept of the warping approaches last week. I have now cleared up my confusion about optical flow, since forward and backward warping are different ways to warp and are independent of the direction of the flow.

Additionally, I went through your softmax splatting code. I then visualized the target of the backward warp from the tenMetric line below:
[images: the first frame, the second frame, and the backward-warped frame]
The backward-warped image still has the overlapped look. You then compute the loss between the prediction and this "overlapped" ground truth, which I don't think makes sense.

The code I visualize is as follows:

backwarp_tenGrid = {}  # cache of sampling grids keyed by flow size, as in the repository's run.py

def backwarp(tenInput, tenFlow):
	if str(tenFlow.size()) not in backwarp_tenGrid:
		tenHorizontal = torch.linspace(-1.0, 1.0, tenFlow.shape[3]).view(1, 1, 1, tenFlow.shape[3]).expand(tenFlow.shape[0], -1, tenFlow.shape[2], -1)
		tenVertical = torch.linspace(-1.0, 1.0, tenFlow.shape[2]).view(1, 1, tenFlow.shape[2], 1).expand(tenFlow.shape[0], -1, -1, tenFlow.shape[3])

		backwarp_tenGrid[str(tenFlow.size())] = torch.cat([ tenHorizontal, tenVertical ], 1).cuda()
	# end

	tenFlow = torch.cat([ tenFlow[:, 0:1, :, :] / ((tenInput.shape[3] - 1.0) / 2.0), tenFlow[:, 1:2, :, :] / ((tenInput.shape[2] - 1.0) / 2.0) ], 1)

	return torch.nn.functional.grid_sample(input=tenInput, grid=(backwarp_tenGrid[str(tenFlow.size())] + tenFlow).permute(0, 2, 3, 1), mode='bilinear', padding_mode='zeros', align_corners=True)
# end

def stitch_images(tenFirst, tenSecond, tenWarped):
    H, W, C = tenSecond.shape
    arr = numpy.ones((2 * H, 2 * W, C))
    arr[0:H, 0:W, :] = tenFirst        # top left: first frame
    arr[H:2 * H, 0:W, :] = tenWarped   # bottom left: warped frame
    arr[0:H, W:2 * W, :] = tenSecond   # top right: second frame
    return arr

##########################################################

tenFirst = torch.FloatTensor(numpy.ascontiguousarray(cv2.imread(filename='./images/first.png', flags=-1).transpose(2, 0, 1)[None, :, :, :].astype(numpy.float32) * (1.0 / 255.0))).cuda()
tenSecond = torch.FloatTensor(numpy.ascontiguousarray(cv2.imread(filename='./images/second.png', flags=-1).transpose(2, 0, 1)[None, :, :, :].astype(numpy.float32) * (1.0 / 255.0))).cuda()
tenFlow = torch.FloatTensor(numpy.ascontiguousarray(read_flo('./images/flow.flo').transpose(2, 0, 1)[None, :, :, :])).cuda()

warped = backwarp(tenInput=tenSecond, tenFlow=tenFlow)
trip_imgs = stitch_images(tenFirst[0].cpu().numpy().transpose(1, 2, 0), tenSecond[0].cpu().numpy().transpose(1, 2, 0), warped[0].cpu().numpy().transpose(1, 2, 0))

image.imsave('save your image here', trip_imgs)  # `image` presumably refers to matplotlib.image

tenMetric = torch.nn.functional.l1_loss(input=tenFirst, target=backwarp(tenInput=tenSecond, tenFlow=tenFlow), reduction='none').mean(1, True)

The duplication of the front of the car during backward warping is expected. The grassy area in front of the car in the first image is occluded by the car in the second image. So during backward warping, the area is filled in with a duplication of the car because the grass that should have been there isn't visible in the second image (it is occluded by the car). One would need an occlusion mask to address this.
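As a sketch of what that could look like with the occlusion masks that the Sintel training set ships (assuming a binary mask PNG in the `occlusions` folder where white marks occluded pixels; the file path is illustrative, and `backwarp` plus the tensors come from the snippets above):

tenOcc = torch.FloatTensor(cv2.imread(filename='./occlusions/frame_0001.png', flags=0)[None, None, :, :].astype(numpy.float32) * (1.0 / 255.0)).cuda()

tenDiff = torch.nn.functional.l1_loss(input=tenFirst, target=backwarp(tenInput=tenSecond, tenFlow=tenFlow), reduction='none')

# average the photometric error over non-occluded pixels only
tenLoss = (tenDiff * (1.0 - tenOcc)).sum() / ((1.0 - tenOcc).sum() * tenDiff.shape[1] + 1e-8)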

Hello, I saw that your location is Shanghai, so I wrote in Chinese. I have the same confusion as you: why do most works use the optical flow to warp the second frame rather than the first one? And since the ground truth is the forward flow, why does warping the second frame in this way yield an image close to the first frame? Do you have a new understanding of this now?

I have the same question as in this discussion. Have you figured this problem out yet? Namely, why is the optical flow in MPI-Sintel called "forward" when the only correct result is obtained by warping the next frame back to the previous one?


The forward flow is stored in frame1 coordinates when the flow is computed from frame1 to frame2, so with it you can only directly backward warp from frame2 to frame1. If you want to forward warp, you can first resample the negated flow into frame2 coordinates according to the "forward" flow, then use this resampled flow to warp frame1 to frame2. You may try the code below. However, the result is not that good, because this kind of forward warp performs the warping operation twice.

import torch
import torch.nn as nn

def warp_flow_backward(x, flo):
    """
    Warp an image/tensor (im2) back to im1 according to the optical flow.
    x: [B, C, H, W] (im2)
    flo: [B, 2, H, W] flow
    """
    B, C, H, W = x.size()
    # mesh grid of pixel coordinates
    xx = torch.arange(0, W).view(1, -1).repeat(H, 1)
    yy = torch.arange(0, H).view(-1, 1).repeat(1, W)
    xx = xx.view(1, 1, H, W).repeat(B, 1, 1, 1)
    yy = yy.view(1, 1, H, W).repeat(B, 1, 1, 1)
    grid = torch.cat((xx, yy), 1).float().to(x.device)

    vgrid = grid + flo

    # scale grid to [-1, 1] for grid_sample
    vgrid[:, 0, :, :] = 2.0 * vgrid[:, 0, :, :].clone() / max(W - 1, 1) - 1.0
    vgrid[:, 1, :, :] = 2.0 * vgrid[:, 1, :, :].clone() / max(H - 1, 1) - 1.0

    vgrid = vgrid.permute(0, 2, 3, 1)
    output = nn.functional.grid_sample(x, vgrid, align_corners=True)

    # sample a tensor of ones to mark pixels that sampled from outside the image
    mask = torch.ones_like(x)
    mask = nn.functional.grid_sample(mask, vgrid, align_corners=True)

    mask[mask < 0.9999] = 0
    mask[mask > 0] = 1

    return output, mask

def warp_flow_forward(x, flo):
    """
    Warp an image/tensor (im1) forward to im2 according to the forward optical flow.
    x: [B, C, H, W] (im1)
    flo: [B, 2, H, W] flow from im1 to im2
    """
    B, C, H, W = x.size()
    # mesh grid of pixel coordinates
    xx = torch.arange(0, W).view(1, -1).repeat(H, 1)
    yy = torch.arange(0, H).view(-1, 1).repeat(1, W)
    xx = xx.view(1, 1, H, W).repeat(B, 1, 1, 1)
    yy = yy.view(1, 1, H, W).repeat(B, 1, 1, 1)
    grid = torch.cat((xx, yy), 1).float().to(x.device)

    vgrid = grid + flo

    # scale grid to [-1, 1] for grid_sample
    vgrid[:, 0, :, :] = 2.0 * vgrid[:, 0, :, :].clone() / max(W - 1, 1) - 1.0
    vgrid[:, 1, :, :] = 2.0 * vgrid[:, 1, :, :].clone() / max(H - 1, 1) - 1.0

    vgrid = vgrid.permute(0, 2, 3, 1)
    # resample the negated flow into im2 coordinates ...
    flow_f = nn.functional.grid_sample(-flo, vgrid, align_corners=True)

    mask = torch.ones_like(x)
    mask_f = nn.functional.grid_sample(mask, vgrid, align_corners=True)

    # ... then backward warp im1 with the resampled flow
    output, mask_b = warp_flow_backward(x, flow_f)

    mask_b[mask_b < 0.9999] = 0
    mask_b[mask_b > 0] = 1

    return output, mask_f * mask_b
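For completeness, a hypothetical call with Sintel ground truth flow (which goes from frame n to frame n + 1) would be:

# forward warp frame n toward frame n + 1; the mask marks pixels with a valid source
warped2, mask2 = warp_flow_forward(img1, flo)

# backward warp frame n + 1 toward frame n
warped1, mask1 = warp_flow_backward(img2, flo)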