Are the 'Camera Intrinsic Parameters' predicted by the network?

Question

tlok666 opened this issue 4 years ago · 4 comments

Hi,
I have a question about the camera intrinsic parameters 'Ks'.

From your paper, I understood that the camera intrinsic parameters are predicted by the network.
I also found what looks like supporting evidence at line 210 of 'https://github.com/TerenceCYJ/S2HAND/blob/main/examples/utils/freihandnet.py'

However, in the loss function you use the camera intrinsic parameters 'Ks' from the dataset to project the 3D coordinates into 2D,
at line 48 of 'https://github.com/TerenceCYJ/S2HAND/blob/main/examples/train.py'

I am confused about the projection function. Why do you use the 'Ks' from the dataset, and how are these two operations related?

Best!

Answer 1 · 2021-09-02T11:42:38.000Z

Hi.

The camera parameters (s, R, T) describe the position of the hand mesh in camera coordinates. The intrinsic parameters, by contrast, comprise the focal length and the optical center, and they are used to project 3D coordinates into 2D image space.

In our work, we use the intrinsics provided along with the input RGB image.
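
For readers following along, the projection step described above can be sketched as a plain pinhole model. This is only an illustration under stated assumptions: the function name `project_perspective`, the matrix values in `K`, and the sample joints are hypothetical and are not taken from the S2HAND code or from any dataset.

```python
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]
Point2D = Tuple[float, float]


def project_perspective(points: Sequence[Point3D],
                        K: Sequence[Sequence[float]]) -> List[Point2D]:
    """Project camera-space 3D points to pixels with a 3x3 pinhole intrinsic matrix."""
    fx, fy = K[0][0], K[1][1]  # focal lengths in pixels
    cx, cy = K[0][2], K[1][2]  # optical center in pixels
    pixels = []
    for x, y, z in points:
        if z <= 0:
            raise ValueError("point must lie in front of the camera (z > 0)")
        # Perspective divide by depth, then shift to the optical center.
        pixels.append((fx * x / z + cx, fy * y / z + cy))
    return pixels


# Hypothetical intrinsics and joints, for illustration only.
K = [[500.0, 0.0, 112.0],
     [0.0, 500.0, 112.0],
     [0.0, 0.0, 1.0]]
joints_cam = [(0.0, 0.0, 0.5), (0.05, -0.02, 0.5)]
joints_2d = project_perspective(joints_cam, K)
```

A point on the optical axis lands on the optical center `(cx, cy)`; other points are offset in proportion to the focal length and in inverse proportion to their depth.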

Answer 2 · 2021-09-03T12:27:52.000Z

> The camera parameters (s, R, T) shows the position of the hand mesh in camera coordinates. While the intrinsic parameters include the focal length and the optical center, and the intrinsic is used for projecting 3D coordinates into 2D space.

Thanks for your reply.

To my understanding, the camera parameters (s, R, T) you mention only include scale and translation.

Does it matter whether this scale and translation are applied in camera space?

Answer 3 · 2021-09-03T12:58:16.000Z

When the camera intrinsics are used to project the 3D joints into 2D and the learning is supervised in 2D, I think the scale and translation do matter (although we evaluate aligned results on FreiHAND and HO3D). The rotation is used in rot_pose_beta_to_mesh.
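
To see why scale and translation affect a 2D loss, here is a minimal sketch. It assumes, purely for illustration, that the predicted values are applied to root-relative joints as `s * X + T` before a pinhole projection; the repository may place the mesh differently. The names `place_and_project` and `bone_length_px` and all numbers are hypothetical.

```python
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]
Point2D = Tuple[float, float]

# Hypothetical intrinsics, for illustration only.
K = [[500.0, 0.0, 112.0],
     [0.0, 500.0, 112.0],
     [0.0, 0.0, 1.0]]


def place_and_project(joints: Sequence[Point3D], s: float, T: Point3D,
                      K: Sequence[Sequence[float]]) -> List[Point2D]:
    """Apply an assumed s * X + T placement, then a pinhole projection."""
    placed = [(s * x + T[0], s * y + T[1], s * z + T[2]) for x, y, z in joints]
    return [(K[0][0] * x / z + K[0][2], K[1][1] * y / z + K[1][2])
            for x, y, z in placed]


def bone_length_px(pixels: Sequence[Point2D]) -> float:
    """Pixel distance between the first two projected joints."""
    (u0, v0), (u1, v1) = pixels[0], pixels[1]
    return ((u1 - u0) ** 2 + (v1 - v0) ** 2) ** 0.5


# Two root-relative joints 5 cm apart along x.
joints_rel = [(0.0, 0.0, 0.0), (0.05, 0.0, 0.0)]

near = bone_length_px(place_and_project(joints_rel, 1.0, (0.0, 0.0, 0.5), K))
far = bone_length_px(place_and_project(joints_rel, 1.0, (0.0, 0.0, 1.0), K))
scaled = bone_length_px(place_and_project(joints_rel, 2.0, (0.0, 0.0, 0.5), K))
```

With these numbers the same bone spans a different number of pixels depending on depth and scale, so a 2D keypoint loss would penalize a wrong `s` or `T` even when the articulated pose is correct.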

BTW, another option that avoids the camera intrinsics altogether is the orthographic (orthogonal) projection used in [1], but in that case you do not get the real 3D position in camera space.

[1] Learning Category-Specific Mesh Reconstruction from Image Collections.
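
The trade-off mentioned above can be sketched with a scaled orthographic (weak-perspective) projection. This is an illustration only: the exact camera parameterization in [1] may differ, and the function name `project_weak_perspective` and all values are hypothetical.

```python
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]
Point2D = Tuple[float, float]


def project_weak_perspective(points: Sequence[Point3D], scale: float,
                             trans: Point2D) -> List[Point2D]:
    """Scaled orthographic projection: depth is dropped, no intrinsics needed."""
    return [(scale * x + trans[0], scale * y + trans[1]) for x, y, _z in points]


# The same joint placed at two very different depths (hypothetical values).
joint_near = [(0.05, -0.02, 0.5)]
joint_far = [(0.05, -0.02, 5.0)]

px_near = project_weak_perspective(joint_near, 1000.0, (112.0, 112.0))
px_far = project_weak_perspective(joint_far, 1000.0, (112.0, 112.0))
same_pixels = px_near == px_far
```

Both points map to the same pixel because the depth coordinate never enters the projection, which is why this option cannot recover the absolute 3D position in camera space.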

Answer 4 · 2021-09-03T13:32:23.000Z

It makes sense. Thank you very much!