apple/ml-hypersim

Getting opencv-style camera intrinsics

hwjiang1510 opened this issue · 7 comments

Thanks for the great work!

Could you give me some instructions on how to get the opencv-style camera intrinsics from this file?
By opencv-style camera intrinsics, I mean a 3x3 matrix of the form
[[fx, 0, cx], [0, fy, cy], [0, 0, 1]],
where fx and fy are the focal lengths for the x-axis and y-axis respectively (in pixels), and cx, cy define the principal point of the camera (in pixels). The intrinsic matrix is used for 3D-to-2D projection.
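Concretely, in numpy the matrix I have in mind would look something like this (the numbers below are made-up placeholders; they are exactly the values I am asking how to obtain):

import numpy as np

# made-up placeholder values, just to show the layout of the matrix I am after
fx, fy = 800.0, 800.0    # focal lengths in pixels
cx, cy = 512.0, 384.0    # principal point in pixels

intrinsic_matrix = np.array([[fx, 0.0, cx],
                             [0.0, fy, cy],
                             [0.0, 0.0, 1.0]])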

Best,
Hanwen

Hi Hanwen, thanks for your note. What exactly do you want your 3x3 matrix to do? Can you describe the exact mathematical constraint that your matrix is supposed to enforce?

Sure. Say I have mesh vertices in 3D, denoted as V with shape [N,3], where N is the number of points. The points are defined in the camera coordinate system. I then want to project the 3D points to 2D using the intrinsics, as follows:

import numpy as np

# V is an (N, 3) array of points in camera coordinates; intrinsic_matrix is the 3x3 matrix above

# Project the points using the intrinsic matrix
V_2D_homogeneous = intrinsic_matrix @ V.T    # shape (3, N)

# Convert back to Cartesian coordinates by dividing by the third (homogeneous) coordinate
V_2D = V_2D_homogeneous[:2, :] / V_2D_homogeneous[2, :]

# Transpose to get a shape of (N, 2)
V_2D = V_2D.T

I would like to know how to get intrinsic_matrix for performing the 3D-to-2D projection.

And what exactly is V_2D supposed to be?

V_2D contains the projected mesh vertex locations in 2D on screen.

In normalized floating-point coordinates in [-1.0, 1.0], or in integer coordinates in [0,w] and [0,h]?

The reason I'm asking you all of these questions is because I want you to write out the complete image formation model you have in your head. Otherwise we will be going around in circles trying to establish a common set of conventions, and we won't know where to start when things aren't behaving as expected.

Yes, I appreciate this style!

The points are in integer coordinates in [0,w] and [0,h].

Great, thank you.

  • Have a look at how we project world-space points into image-space here. This source file implements the basic rasterizer we used to generate Figure 6 in the paper; in particular, it includes code that takes 3D points in world-space and projects them to integer screen pixels. We use OpenGL conventions in our code, not OpenCV conventions, but the two sets of conventions should be mathematically equivalent. (A rough sketch of this projection pipeline is included after this list.)

  • As a sanity check, start with a scene that doesn't have any weird tilt-shift camera parameters, like ai_001_001.

  • Take one of our position images, and treat it as a collection of world-space points, where the 2D integer pixel coordinates of each world-space point in the collection are known. Use our existing code to project these world-space points to their 2D integer pixel coordinates. The 2D integer pixel coordinates you get from this approach should match the known pixel coordinates of each point with very low error.

  • Now see if you can re-express our OpenGL conventions in the OpenCV conventions you're expecting. If you've done this correctly, you should still be able to project world-space points to their known 2D integer pixel coordinates with very low error. Please post here when you have done so, please include a self-contained code snippet so other readers can benefit from your efforts, and please include evidence that your code snippet does what you think it does. (A possible starting point for this conversion is sketched after this list.)

  • Once you've done this, you are ready to move on to one of our scenes with weird tilt-shift camera parameters. Each of these scenes has a modified projection matrix, which can be used in place of M_proj in scene_generate_images_bounding_box.py. You should still be able to re-express this modified projection matrix in terms of your OpenCV conventions. However, due to the tilt-shift camera model, I think some of the entries that you have set to 0 in your 3x3 matrix will end up being non-zero. Please post here when you have done so.
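For anyone following along, here is the kind of pipeline the first and third bullets describe, written out as an untested sketch. The variable names are placeholders rather than the names used in our code, and the half-pixel convention (pixel centers vs. corners) may need adjusting against our rasterizer:

import numpy as np

def project_opengl(P_world, M_cam_from_world, M_proj, width, height):
    # P_world is an (N,3) array of world-space points;
    # M_cam_from_world and M_proj are 4x4 matrices (OpenGL conventions, camera looks down -z)
    N = P_world.shape[0]
    P_world_h = np.concatenate([P_world, np.ones((N, 1))], axis=1)   # (N,4) homogeneous points

    # world -> camera -> clip space
    P_clip = (M_proj @ M_cam_from_world @ P_world_h.T).T             # (N,4)

    # perspective divide -> normalized device coordinates in [-1,1]
    P_ndc = P_clip[:, :3] / P_clip[:, 3:4]

    # NDC -> pixel coordinates, with (0,0) at the top-left corner of the image
    u = (P_ndc[:, 0] + 1.0) * 0.5 * width
    v = (1.0 - P_ndc[:, 1]) * 0.5 * height
    return np.stack([u, v], axis=1)                                  # (N,2) floating-point pixels

Rounding these floating-point coordinates should reproduce the known integer pixel coordinates of a position image to within a pixel or so, if the conventions match.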
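And here is one possible starting point for the fourth bullet, again as an untested sketch that assumes a standard (non-tilt-shift) OpenGL perspective matrix and the pixel convention above; the sign and half-pixel conventions are the parts most likely to need adjusting:

import numpy as np

def opencv_intrinsics_from_opengl(M_proj, width, height):
    # Assumes a standard OpenGL perspective matrix; does not handle the tilt-shift case.
    fx = 0.5 * width  * M_proj[0, 0]
    fy = 0.5 * height * M_proj[1, 1]
    cx = 0.5 * width  * (1.0 - M_proj[0, 2])
    cy = 0.5 * height * (1.0 + M_proj[1, 2])
    return np.array([[fx, 0.0, cx],
                     [0.0, fy,  cy],
                     [0.0, 0.0, 1.0]])

To use this K with OpenCV conventions (camera looking down +z, y pointing down in the image), points in OpenGL camera coordinates would first be flipped as X_cv = X, Y_cv = -Y, Z_cv = -Z, and then projected exactly as in the snippet earlier in this thread. Checking the result against a position image, as described above, is the way to confirm whether these assumptions hold.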