facebookresearch/projectaria_tools

How to convert ARIA's camera coordinate system to COLMAP's camera coordinate system?


We want to use the camera poses (extrinsic matrices) provided with Aria data to run sparse reconstruction in COLMAP. However, simply converting the rotation matrix and translation vector into a quaternion and translation and feeding them to COLMAP doesn't work. Do the two use different coordinate systems, and if so, how do we convert between them?
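Separately from any axis-convention question, one detail is easy to miss when writing poses for COLMAP: its `images.txt` stores the world-to-camera transform (rotation as a quaternion `QW QX QY QZ`, then `TX TY TZ`), not the camera's pose in the world. The sketch below assumes the input pose is a 4x4 camera-to-world NumPy matrix; `rotmat_to_qvec` and `colmap_image_line` are hypothetical helper names, not part of COLMAP or projectaria_tools.

```python
import numpy as np


def rotmat_to_qvec(R: np.ndarray) -> np.ndarray:
    """Convert a 3x3 rotation matrix to a unit quaternion (w, x, y, z).

    Uses the standard trace-based branches for numerical stability.
    """
    m = R
    trace = m[0, 0] + m[1, 1] + m[2, 2]
    if trace > 0:
        s = 2.0 * np.sqrt(trace + 1.0)
        w, x = 0.25 * s, (m[2, 1] - m[1, 2]) / s
        y, z = (m[0, 2] - m[2, 0]) / s, (m[1, 0] - m[0, 1]) / s
    elif m[0, 0] > m[1, 1] and m[0, 0] > m[2, 2]:
        s = 2.0 * np.sqrt(1.0 + m[0, 0] - m[1, 1] - m[2, 2])
        w, x = (m[2, 1] - m[1, 2]) / s, 0.25 * s
        y, z = (m[0, 1] + m[1, 0]) / s, (m[0, 2] + m[2, 0]) / s
    elif m[1, 1] > m[2, 2]:
        s = 2.0 * np.sqrt(1.0 + m[1, 1] - m[0, 0] - m[2, 2])
        w, x = (m[0, 2] - m[2, 0]) / s, (m[0, 1] + m[1, 0]) / s
        y, z = 0.25 * s, (m[1, 2] + m[2, 1]) / s
    else:
        s = 2.0 * np.sqrt(1.0 + m[2, 2] - m[0, 0] - m[1, 1])
        w, x = (m[1, 0] - m[0, 1]) / s, (m[0, 2] + m[2, 0]) / s
        y, z = (m[1, 2] + m[2, 1]) / s, 0.25 * s
    q = np.array([w, x, y, z])
    # q and -q encode the same rotation; keep w >= 0 as a canonical form.
    return q if q[0] >= 0 else -q


def colmap_image_line(image_id: int, T_world_camera: np.ndarray,
                      camera_id: int, name: str) -> str:
    """Format one pose line of COLMAP's images.txt from a camera-to-world pose.

    COLMAP expects world-to-camera, so the pose is inverted first.
    """
    T_camera_world = np.linalg.inv(T_world_camera)
    qw, qx, qy, qz = rotmat_to_qvec(T_camera_world[:3, :3])
    tx, ty, tz = T_camera_world[:3, 3]
    return (f"{image_id} {qw:.9f} {qx:.9f} {qy:.9f} {qz:.9f} "
            f"{tx:.9f} {ty:.9f} {tz:.9f} {camera_id} {name}")
```

In `images.txt` each such line is followed by a second line listing the image's 2D points, which can be left empty when the known poses are only used for triangulation.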

Hello, I'm trying to do the same thing. Have you solved the problem? Any help would be appreciated.

ARIA's camera coordinate system seems to follow the OpenGL convention. I applied the OpenGL conversion method and that solved it.
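For reference, the OpenGL and OpenCV/COLMAP camera conventions differ only in the signs of the Y and Z axes, a 180-degree rotation about X. A minimal sketch, assuming (as this comment suggests) that the poses are in the OpenGL camera convention; `GL_TO_CV` and `gl_camera_point_to_cv` are hypothetical names.

```python
import numpy as np

# OpenGL/Blender camera: X right, Y up,   Z backward (looks along -Z).
# OpenCV/COLMAP camera:  X right, Y down, Z forward  (looks along +Z).
# Switching between them negates Y and Z, i.e. a 180-degree rotation about X.
GL_TO_CV = np.diag([1.0, -1.0, -1.0])


def gl_camera_point_to_cv(p_gl: np.ndarray) -> np.ndarray:
    """Re-express a 3D point from OpenGL camera coordinates in OpenCV/COLMAP ones."""
    return GL_TO_CV @ p_gl
```

For example, a point 2 m in front of the camera is `(0, 0, -2)` in OpenGL camera coordinates and `(0, 0, 2)` in COLMAP's. The flip is its own inverse, so the same matrix converts in both directions.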

Thanks for your quick reply. I'll give it a try. But do you know why that's the case? According to the description here, Aria's local coordinate system should follow the same convention as COLMAP's.

Sorry, I may have misled you. I reversed the second and third rows of the transformation matrix, converted it to a quaternion, and used that to initialize the extrinsics in COLMAP. I based this on the conversion code in NeRF and 3DGS, and it works for the DTC dataset. Other datasets might differ, and the document you referred to confused me as well.
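A sketch of what "reversing" rows likely amounts to, assuming it means negating their sign (as in the OpenGL-to-COLMAP flips used by NeRF/3DGS loaders) and that the matrix is the 4x4 world-to-camera transform. Negating rows 2 and 3 of world-to-camera is the same operation as negating columns 2 and 3 of camera-to-world, which to our understanding is the `c2w[:3, 1:3] *= -1` flip in 3DGS's Blender/NeRF-synthetic loader. The helper names are hypothetical.

```python
import numpy as np

# Negates the Y and Z axes (a 180-degree rotation about X), homogeneous form.
FLIP_YZ = np.diag([1.0, -1.0, -1.0, 1.0])


def negate_rows_2_3(T: np.ndarray) -> np.ndarray:
    """Negate the 2nd and 3rd rows of a 4x4 world-to-camera matrix.

    Equivalent to FLIP_YZ @ T: the camera's Y and Z axes are flipped.
    """
    out = T.copy()
    out[1:3, :] *= -1.0
    return out


def negate_cols_2_3(T: np.ndarray) -> np.ndarray:
    """Negate the 2nd and 3rd columns (top 3 rows) of a 4x4 camera-to-world matrix."""
    out = T.copy()
    out[:3, 1:3] *= -1.0
    return out
```

Both sides of the inverse agree: `inv(negate_rows_2_3(T_cam_world))` equals `negate_cols_2_3(inv(T_cam_world))`, and the flipped rotation still has determinant +1, so it remains a valid rotation for COLMAP.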

Really appreciate your quick response. I was referring to this documentation page, which shows that the camera coordinate system convention on Project Aria glasses is the same as COLMAP's: Z forward, Y downward, X rightward. Therefore, the RGB camera extrinsics obtained with the following pseudo-code should be compatible with COLMAP's extrinsics (apart from the representation, e.g. rotation matrix vs. quaternion):

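from projectaria_tools.core.stream_id import StreamId

# gt_provider: an AriaDigitalTwinDataProvider instance for the sequence
# (method names as in the ADT Python tutorial; check them against your projectaria_tools version)
aria_pose = gt_provider.get_aria_3d_pose_by_timestamp_ns(some_timestamp_ns).data()
T_World_Device = aria_pose.transform_scene_device
T_Device_RgbCamera = gt_provider.get_aria_camera_calibration(StreamId("214-1")).get_transform_device_camera()
T_World_RgbCamera = T_World_Device @ T_Device_RgbCamera  # camera-to-world (SE3 composition)
T_RgbCamera_World = T_World_RgbCamera.inverse()  # world-to-camera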
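The same composition with plain 4x4 NumPy matrices, assuming the SE3 poses have already been converted to arrays (for example with the SE3 object's `to_matrix()`, if your version provides it). `se3_inverse` and `camera_extrinsic` are hypothetical helpers; the closed-form inverse avoids a general matrix inversion.

```python
import numpy as np


def se3_inverse(T: np.ndarray) -> np.ndarray:
    """Invert a rigid transform [R t; 0 1] as [R^T -R^T t; 0 1]."""
    R, t = T[:3, :3], T[:3, 3]
    T_inv = np.eye(4)
    T_inv[:3, :3] = R.T
    T_inv[:3, 3] = -R.T @ t
    return T_inv


def camera_extrinsic(T_world_device: np.ndarray, T_device_camera: np.ndarray) -> np.ndarray:
    """Return T_camera_world, the world-to-camera transform COLMAP stores."""
    T_world_camera = T_world_device @ T_device_camera  # camera-to-world
    return se3_inverse(T_world_camera)
```

A quick sanity check: applying the result to the camera center expressed in world coordinates should give the origin.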

I'm also not sure whether T_RgbCamera_World is the same transformation matrix you referred to in your previous reply, and I don't understand why you would need to reverse the second and third rows.

I think I found where the problem is! The camera coordinate system, as you mentioned, follows Z forward, Y downward, X rightward. But the world coordinate system follows X rightward, Y upward, Z backward; you can check this in the section "3D Coordinate frame conventions", which has a picture of the glasses with the coordinate frames drawn. The world coordinate frame is shown in the lower-right corner of the picture. So reversing the second and third rows seems to be the correct transformation.
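A sketch of the world-frame flip described above, assuming "reversing" means negating the sign and that it is applied to the camera-to-world pose (T_World_RgbCamera). Left-multiplying by `diag(1, -1, -1, 1)` negates the 2nd and 3rd rows, turning a Y-up/Z-backward world into a Y-down/Z-forward one. The helper names are hypothetical.

```python
import numpy as np

# Negates the world Y and Z axes (a 180-degree rotation about X), homogeneous form.
FLIP_WORLD_YZ = np.diag([1.0, -1.0, -1.0, 1.0])


def flip_world_yz_pose(T_world_camera: np.ndarray) -> np.ndarray:
    """Re-express a camera-to-world pose in the flipped world frame.

    Equivalent to negating the 2nd and 3rd rows of T_world_camera.
    """
    return FLIP_WORLD_YZ @ T_world_camera


def flip_world_yz_point(p_world: np.ndarray) -> np.ndarray:
    """Re-express a 3D world point in the flipped world frame."""
    return FLIP_WORLD_YZ[:3, :3] @ p_world
```

A point's camera coordinates are unchanged when both the pose and the point are flipped, so the flip has to be applied consistently to every pose (and to any world-frame points, e.g. from a point cloud) used in the same COLMAP model.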