zhenpeiyang/MVS2D

The code is inconsistent with the paper

Junda24 opened this issue · 7 comments

Thank you for your excellent work. However, I have a doubt. According to the paper, the depth hypotheses are made for the src image, and the similarity score is obtained by projecting onto the ref image. In the code, however, the depth hypotheses are set in the ref image's coordinate system to obtain 3D points, which are then transformed into the src image's coordinate system and used to sample the src image; the similarity score is finally computed by a dot product with the ref features. This actually yields the depth of the ref image, not the src image. I don't understand why you make the depth hypotheses in the ref image's coordinate system and then project.
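For reference, the flow described above can be sketched as follows. This is an illustrative toy example (hypothetical tensor shapes, toy intrinsics `K` and pose `R`, `t`; not the repository's actual `homo_warping`): hypothesize depths in the ref frame, lift ref pixels to 3D, transform them into the src frame, project, and sample src features at the resulting locations.

```python
import torch
import torch.nn.functional as F

B, C, H, W = 1, 8, 4, 4                   # batch, channels, height, width
K = torch.eye(3)                          # toy intrinsics (shared by both views)
R = torch.eye(3)                          # toy src<-ref rotation
t = torch.tensor([0.1, 0.0, 0.0])         # toy src<-ref translation
depths = torch.tensor([1.0, 2.0, 4.0])    # depth hypotheses in the ref frame

# Pixel grid of the ref image, homogeneous coordinates, shape (3, H*W).
ys, xs = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                        torch.arange(W, dtype=torch.float32), indexing="ij")
pix = torch.stack([xs.reshape(-1), ys.reshape(-1), torch.ones(H * W)])

src_feat = torch.randn(B, C, H, W)
warped = []
for d in depths:
    # Back-project ref pixels at hypothesis depth d, move into the src frame.
    pts_src = R @ (torch.inverse(K) @ pix * d) + t[:, None]
    uv = K @ pts_src                        # project into the src image
    uv = uv[:2] / uv[2:].clamp(min=1e-6)    # perspective divide
    # Normalize pixel coordinates to [-1, 1] for grid_sample.
    gx = uv[0] / (W - 1) * 2 - 1
    gy = uv[1] / (H - 1) * 2 - 1
    grid = torch.stack([gx, gy], dim=-1).view(1, H, W, 2)
    warped.append(F.grid_sample(src_feat, grid, align_corners=True))

cost = torch.stack(warped, dim=2)           # (B, C, D, H, W) warped src features
# The similarity score would then be a dot product with the ref features over C.
print(cost.shape)  # torch.Size([1, 8, 3, 4, 4])
```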

Hi @cjd24-coder, sorry for the confusion. I realize the variable naming (ref, src) in the code is flipped relative to the paper. We indeed make the depth hypotheses in the coordinate system of the image whose depth we want to predict.

But if the naming is flipped relative to the paper, shouldn't the proj matrix in homo_warping be proj = torch.matmul(ref_proj, torch.inverse(src_proj))? If I understand the homography warping formula correctly, proj should correspond to that transformation.

Hello author, I would like to know how you obtained this accuracy, given the projection-transformation error. Also, after I fixed the projection bug, the Abs Rel error on the Dense Depth for Autonomous Driving (DDAD) dataset dropped from 20% to 12%. However, monocular single-view networks can also reach 13% on this dataset. Do you have any tuning tips for MVS2D? Thank you.

I think MVS2D should be more than one percentage point more accurate than single-view networks.

Hi @cjd24-coder, I don't think the proj matrix calculation in homo_warping is wrong. It should be just proj = torch.matmul(src_proj, torch.inverse(ref_proj)), given that ref_proj is the world-to-camera projection matrix of the ref frame (the frame whose depth we want to predict).
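A minimal sketch of that convention, using toy 4x4 world-to-camera matrices (hypothetical values, for illustration only): proj = src_proj @ ref_proj^-1 maps a point expressed in the ref camera frame into the src camera frame, which is exactly what is needed when the depth hypotheses live in the ref frame.

```python
import torch

def relative_proj(src_proj, ref_proj):
    # Maps a 3D point from the ref camera frame into the src camera frame:
    # x_src = src_proj @ ref_proj^-1 @ x_ref
    return torch.matmul(src_proj, torch.inverse(ref_proj))

# Toy poses: identity ref pose; src camera shifted so that the world
# point (1, 0, 0) lands at the src camera origin.
ref_proj = torch.eye(4)
src_proj = torch.eye(4)
src_proj[0, 3] = -1.0

proj = relative_proj(src_proj, ref_proj)

# A point 2 units in front of the ref camera (a depth hypothesis made
# in the ref frame), in homogeneous coordinates.
x_ref = torch.tensor([0.0, 0.0, 2.0, 1.0])
x_src = torch.matmul(proj, x_ref)
print(x_src)  # tensor([-1., 0., 2., 1.])
```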

Sorry, I misunderstood.