Tangshitao/MVDiffusion

Question about Panorama Homography Matrix Computation

kkaiwwana opened this issue · 2 comments

Hi there, I'm confused about the homography matrix computation in the function pano/utils/get_correspondence, which looks like this:

# ...
R_left = R_left.reshape(-1, 3, 3)
R_right = R_right.reshape(-1, 3, 3)
K_left = K_left.reshape(-1, 3, 3)
K_right = K_right.reshape(-1, 3, 3)

homo_l = (K_right @ torch.inverse(R_right) @
          R_left @ torch.inverse(K_left))

xyz_l = torch.tensor(get_x_2d(img_h, img_w),
                    device=R.device)
xyz_l = (
    xyz_l.reshape(-1, 3).T)[None].repeat(homo_l.shape[0], 1, 1)
# ...

As far as I understand, this part of the code computes a homography matrix from the camera intrinsics K and rotations R, since you name the result 'homo_l' and use it as a homography in the code that follows. But I don't understand why it is computed this way, because I found something similar yet actually different on Stack Overflow - Compute Homography Matrix based on intrinsic and extrinsic camera parameters.

According to that, the homography mapping a pixel of Cam_1 into Cam_2 is H = K_2 * R_2_1 * inv(K_1), where R_2_1 = R_2_0 * R_1_0.transposed.
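For concreteness, here is a quick side-by-side sketch of the two expressions (my own toy comparison, not code from the repo), using a made-up shared K and random rotations; in general the two products are different matrices:

import torch

def random_rotation():
    # Orthonormalize a random matrix; flip one column if needed so det = +1.
    q, _ = torch.linalg.qr(torch.randn(3, 3))
    if torch.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q

K_left = K_right = torch.tensor([[256.0, 0.0, 256.0],
                                 [0.0, 256.0, 256.0],
                                 [0.0, 0.0, 1.0]])
R_left, R_right = random_rotation(), random_rotation()

# Expression currently in get_correspondence:
homo_repo = K_right @ torch.inverse(R_right) @ R_left @ torch.inverse(K_left)
# Expression following the Stack Overflow derivation (R_2_1 = R_2 @ R_1^T):
homo_so = K_right @ R_right @ R_left.T @ torch.inverse(K_left)

print(torch.allclose(homo_repo, homo_so))  # False for generic rotations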

That's different from your version. Could you provide some references for your formulation, or simply explain it?

I don't see any difference between my implementation and the one you refer to.

I think there is a minor issue: it should be homo_l = (K_right @ R_right @ torch.inverse(R_left) @ torch.inverse(K_left)) instead of the current implementation.

For example:
Take a pixel in camera 1, p_1 = (x, y, 1), in homogeneous coordinates.
Back-project it into a ray in 3D space: P_1 = inv(K_1) * p_1.
Express the ray in the coordinate frame of camera 2: P_2 = R_2_1 * P_1.
Project the ray onto a pixel of camera 2: p_2 = K_2 * P_2.
Putting the equations together: p_2 = [K_2 * R_2_1 * inv(K_1)] * p_1, where R_2_1 should be R_2 * R_1.transposed (a quick numerical sketch of these steps is below).
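Here is a minimal numerical sketch of those four steps (again a toy example of mine with made-up K and R; it assumes both cameras share the same center, as in the panorama case, and that R_1, R_2 are world-to-camera rotations, so R_2_1 = R_2 @ R_1.T):

import math
import torch

def rot_y(theta):
    # Rotation about the y-axis by theta radians.
    c, s = math.cos(theta), math.sin(theta)
    return torch.tensor([[c, 0.0, s],
                         [0.0, 1.0, 0.0],
                         [-s, 0.0, c]])

K_1 = K_2 = torch.tensor([[256.0, 0.0, 256.0],
                          [0.0, 256.0, 256.0],
                          [0.0, 0.0, 1.0]])
R_1, R_2 = rot_y(0.3), rot_y(0.3 + math.pi / 4)  # two made-up viewing directions

p_1 = torch.tensor([100.0, 200.0, 1.0])  # a pixel of camera 1, homogeneous

# Ground truth: lift the pixel onto its 3D ray and reproject into camera 2.
d_cam1 = torch.inverse(K_1) @ p_1   # ray direction in camera-1 coordinates
X_world = 5.0 * (R_1.T @ d_cam1)    # an arbitrary point on that ray, in world coordinates
p_2_reproj = K_2 @ (R_2 @ X_world)
p_2_reproj = p_2_reproj / p_2_reproj[2]

# Homography composed from the four steps: p_2 = K_2 @ R_2_1 @ inv(K_1) @ p_1.
H = K_2 @ R_2 @ R_1.T @ torch.inverse(K_1)
p_2_homog = H @ p_1
p_2_homog = p_2_homog / p_2_homog[2]

print(torch.allclose(p_2_reproj, p_2_homog))  # True under these assumptions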

Do you agree?