ishank-juneja/L1-optimal-paths-Stabilization

L1 norm Algorithm Problem

Closed this issue · 7 comments

cmfcm commented

Hi! First of all, thanks for the great work you've done! I'm currently working with this stabilization algorithm and I have a question about your code. In the paper, F_transform appears to describe the relationship between the current frame and the previous frame, which to my understanding means points_in_previous_frame = F_transform * points_in_current_frame (just as Figure 2 in the original paper shows). However, in your code, when you estimate the affine transformation, you pass the previous points as the source and the current points as the destination (https://github.com/ishank-juneja/Video-Stabilization/blob/797f55d236614bd21bedbcead7a3d0a3ddc0193d/L1-optimal-paths/L1optimal.py#L83). Should it be the inverse of that?
Also, I notice that when writing out the video, you apply the optimized transform matrix B directly to the image, which to my understanding is also wrong: according to the paper, what we have computed is the optimized path of the crop box, and the image's path should be the inverse of that. I think the reason the result still looks good may be that you inverted F_transform, as mentioned above. Sometimes, when I run the code on violently shaking videos, there are black borders in the optimized result video, and that is what prompted all the questions above.
Thanks, and I hope we can discuss this!

Thank you for your interest and your question. I am currently in the middle of another project; I will get back to you on this by Friday, Indian time.

cmfcm commented

Thank you for your interest and your question. I am currently in the middle of another project; I will get back to you on this by Friday, Indian time.

No worries, and thanks for the quick reply here.

I'll take the two issues you have raised one at a time (they might be linked, as you have hinted, but I am not sure yet, so let's take them up separately).
Referring to this line:
https://github.com/ishank-juneja/Video-Stabilization/blob/797f55d236614bd21bedbcead7a3d0a3ddc0193d/L1-optimal-paths/L1optimal.py#L83
I completely agree that Figure 2 shows that F_n, when applied to C_n, is supposed to result in C_{n-1}. Also, the first paragraph of Section 2.1 states that

a linear motion model F_t (x) modeling the motion of feature points x from I_t to I_{t−1}

This means that, as you have pointed out, there is a serious bug in the implementation. The reason I got it wrong was that I took the definition of F_t,

C_{t+1} = C_t * F_{t+1}

to mean that it is the transform taking frame n to frame n+1, when in fact it takes the camera path at n to the camera path at n+1. The definitions of the camera path and the inter-frame transforms are "circular" in a way: there is not a one-to-one map between the frame at time step n and the camera path at n. This will need to be fixed. Thanks for pointing this out; it must have taken a careful reading.
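To make the distinction concrete, here is a minimal sketch (the names are my own, not from the repo) of how the camera path is accumulated from the inter-frame transforms via the recurrence above, using 3x3 homogeneous matrices:

```python
import numpy as np

def accumulate_camera_path(F_list):
    """Build camera path matrices C_t from inter-frame transforms F_t
    via the paper's recurrence C_{t+1} = C_t * F_{t+1}, with C_0 = I.
    Each F is a 3x3 homogeneous affine matrix."""
    C = np.eye(3)
    path = [C]
    for F in F_list:
        C = C @ F
        path.append(C)
    return path
```

So path[t] depends on all of F_1 ... F_t, which is exactly why the frame at time step t and the camera path at t are only linked through this accumulation.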

About the second issue, https://github.com/ishank-juneja/Video-Stabilization/blob/797f55d236614bd21bedbcead7a3d0a3ddc0193d/L1-optimal-paths/L1optimal.py#L111

I agree that how I have applied B_t is not consistent with the usage in the paper. I went ahead with the current approach since I couldn't quite see how to apply it to an oriented crop box, as hinted at in Figure 2, and the current choice seemed to work.
Do you have any ideas about how it can be fixed? My current understanding is that the skewed, properly oriented crop box arises when the stabilization transform is applied to a crop window: a centered crop rectangle taken out of the frame, of dimensions (r*crop_ratio, c*crop_ratio), where (r, c) are the original frame dimensions.
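For reference, the centered crop window described above could be computed like this (a hypothetical helper of my own, not code from the repo; it assumes width/height rather than rows/cols ordering):

```python
import numpy as np

def centered_crop_corners(frame_w, frame_h, crop_ratio=0.8):
    """Corners of the centered crop rectangle of size
    (frame_w * crop_ratio, frame_h * crop_ratio)."""
    w, h = frame_w * crop_ratio, frame_h * crop_ratio
    # Top-left offset that centers the crop box in the frame
    x0, y0 = (frame_w - w) / 2.0, (frame_h - h) / 2.0
    # Corners in clockwise order starting at the top-left
    return np.array([[x0, y0],
                     [x0 + w, y0],
                     [x0 + w, y0 + h],
                     [x0, y0 + h]], dtype=np.float64)
```

These four corners are what the stabilization transform would then be applied to.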

cmfcm commented


I think for the second issue, you can try applying B_t to the crop box. That is: first transform the four corner points of the crop box, then use these four new points together with the four corner points of the original frame ((0, 0), (0, frame_height), (frame_width, 0), (frame_width, frame_height)) to estimate a new affine/homography transform (e.g. using a function like cv::estimateAffine2D). Finally, as you have done here, apply warpAffine to the original frame with the transform you just estimated.
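If it helps, here is a rough numpy sketch of that pipeline (the names are mine and purely illustrative; I use a least-squares fit in place of cv::estimateAffine2D so the snippet has no OpenCV dependency):

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2x3 affine mapping src points to dst points
    (stands in for cv.estimateAffine2D in this sketch)."""
    n = src.shape[0]
    A = np.zeros((2 * n, 6))
    b = dst.reshape(-1)
    A[0::2, 0:2] = src
    A[0::2, 2] = 1.0
    A[1::2, 3:5] = src
    A[1::2, 5] = 1.0
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x.reshape(2, 3)

def stabilizing_transform(B, crop_corners, frame_w, frame_h):
    """B is a 2x3 affine (the optimized crop-box transform B_t);
    crop_corners is a (4, 2) array of crop-box corners."""
    # Move the crop box along the optimized path
    moved = crop_corners @ B[:, :2].T + B[:, 2]
    frame_corners = np.array([[0.0, 0.0],
                              [frame_w, 0.0],
                              [frame_w, frame_h],
                              [0.0, frame_h]])
    # Affine taking the moved crop box onto the full frame;
    # warpAffine with this matrix would give the stabilized frame
    return fit_affine(moved, frame_corners)
```

The corner ordering of `crop_corners` and `frame_corners` must of course match for the fit to make sense.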

Thanks for the suggestions @cmfcm, I'll try to fix these soon, after debugging and testing. In the meantime, feel free to create a pull request if you have already made the changes locally, or if you do so in the future.

cmfcm commented

Sorry for closing the issue incorrectly; I just came back to see if there were any new ideas. I can share my current solution here (it's different from what I mentioned before) and maybe you can review whether it's correct.
Before that, I noticed what may be a bug in this line:
https://github.com/ishank-juneja/Video-Stabilization/blob/797f55d236614bd21bedbcead7a3d0a3ddc0193d/L1-optimal-paths/L1optimal.py#L159
It seems that you do not pass the crop ratio when doing the stabilization, which will give incorrect crop box corners, since the crop ratio will always be the default of 0.8.
Then, regarding my two previous issues: for the first one, I think it can be fixed simply by changing https://github.com/ishank-juneja/Video-Stabilization/blob/797f55d236614bd21bedbcead7a3d0a3ddc0193d/L1-optimal-paths/L1optimal.py#L83 to m, _ = cv.estimateAffine2D(curr_pts, prev_pts)
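To illustrate the argument order: cv.estimateAffine2D(src, dst) estimates the transform taking src to dst, so passing (curr_pts, prev_pts) yields the paper's F_t mapping I_t to I_{t-1}. A small numpy stand-in demonstrating the same direction:

```python
import numpy as np

# A known affine F (rotation + translation), playing the role of the
# paper's F_t that maps points in frame I_t to frame I_{t-1}
theta = 0.1
F = np.array([[np.cos(theta), -np.sin(theta),  2.0],
              [np.sin(theta),  np.cos(theta), -1.0]])

curr_pts = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [7.0, 3.0]])
prev_pts = curr_pts @ F[:, :2].T + F[:, 2]

# Fitting curr_pts -> prev_pts recovers F itself; fitting the points
# the other way round would recover F's inverse instead
A = np.hstack([curr_pts, np.ones((len(curr_pts), 1))])
M, *_ = np.linalg.lstsq(A, prev_pts, rcond=None)
assert np.allclose(M.T, F)
```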
For the second issue, my suggestion is to revise the write_output function https://github.com/ishank-juneja/Video-Stabilization/blob/797f55d236614bd21bedbcead7a3d0a3ddc0193d/L1-optimal-paths/L1optimal.py#L96 as follows:

def write_output(cap, out, B_transforms, shape, crop_ratio):
    # Reset the stream to the first frame
    cap.set(cv.CAP_PROP_POS_FRAMES, 0)
    n_frames = B_transforms.shape[0]
    # Write n_frames transformed frames
    for i in range(n_frames):
        # Read the first/next frame
        success, frame = cap.read()
        # If there is no next frame to read, exit the loop
        if not success:
            break
        # Scale about the frame center by 1 / crop_ratio, so that the
        # centered crop box expands to fill the full frame
        scale_x = 1 / crop_ratio
        scale_y = 1 / crop_ratio

        scaling_matrix = np.eye(3, dtype=float)
        scaling_matrix[0, 0] = scale_x
        scaling_matrix[1, 1] = scale_y

        shifting_to_center_matrix = np.eye(3, dtype=float)
        shifting_to_center_matrix[0, 2] = -shape[0] / 2.0
        shifting_to_center_matrix[1, 2] = -shape[1] / 2.0

        shifting_back_matrix = np.eye(3, dtype=float)
        shifting_back_matrix[0, 2] = shape[0] / 2.0
        shifting_back_matrix[1, 2] = shape[1] / 2.0

        # Lift B_t to a 3x3 homogeneous matrix so it can be inverted
        B_matrix = np.eye(3, dtype=float)
        B_matrix[:2, :] = B_transforms[i, :, :2].T

        # The image moves along the inverse of the optimized crop-box
        # path, followed by the centered "cropping" (scale about center)
        final_matrix = shifting_back_matrix @ scaling_matrix @ shifting_to_center_matrix @ np.linalg.inv(B_matrix)

        # Apply the affine warp to the given frame
        frame_stabilized = cv.warpAffine(frame, final_matrix[:2, :], shape)
        # Write the original and stabilized frames side by side
        frame_out = cv.hconcat([frame, frame_stabilized])
        # Display the result in a window before writing it to file
        cv.imshow("Before and After", frame_out)
        cv.waitKey(10)
        out.write(frame_out)
    return

That is, we first pre-multiply by the inverse of B_t, since B_t now describes the optimized path of the crop box, and its inverse is how we should actually "move" the images. Besides that, we also need to do the "cropping": I first shift the center of the "moved" image to the origin, then scale it by 1/crop_ratio (this is the "cropping" operation, to my understanding), and finally shift it back. Also, frame_limits is no longer required here; just fixing all the other places that use this variable will be fine.
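As a sanity check on the matrix composition (with illustrative values, not taken from the repo), the centered scaling does map the crop box onto the full frame while leaving the frame center fixed:

```python
import numpy as np

w, h, crop_ratio = 640.0, 480.0, 0.8
s = 1.0 / crop_ratio

shifting_to_center = np.eye(3)
shifting_to_center[0, 2] = -w / 2.0
shifting_to_center[1, 2] = -h / 2.0

scaling = np.diag([s, s, 1.0])

shifting_back = np.eye(3)
shifting_back[0, 2] = w / 2.0
shifting_back[1, 2] = h / 2.0

M = shifting_back @ scaling @ shifting_to_center

# The top-left corner of the centered crop box lands on the frame origin
crop_tl = np.array([w * (1 - crop_ratio) / 2.0, h * (1 - crop_ratio) / 2.0, 1.0])
assert np.allclose(M @ crop_tl, [0.0, 0.0, 1.0])
# The frame center is a fixed point of the composition
assert np.allclose(M @ [w / 2.0, h / 2.0, 1.0], [w / 2.0, h / 2.0, 1.0])
```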
You can take this just as a reference; maybe it's not the best solution, but it works for me. @ishank-juneja

@cmfcm Thanks a lot for the fixes, changes incorporated in commit 10256a6