neu-vi/PlanarRecon

Query regarding use of *world_to_aligned_camera* transformation

Closed this issue · 2 comments

Hi @ymingxie

Thanks for sharing the great work!

Could you please help me understand the use of the world_to_aligned_camera transformation:

data['world_to_aligned_camera'] = torch.from_numpy(rotation_matrix4x4).float() @ middle_pose.inverse()

  • What's the motivation for using the middle camera pose? And why is the xy-plane alignment defined in rotate_view_to_align_xyplane needed?

    def rotate_view_to_align_xyplane(self, Tr_camera_to_world):

  • As per the paper, I thought all the fusion is done in the world coordinate system. Why, then, is the sparse conv 3D backbone built in the aligned camera (middle camera pose) coordinate system?

    # ----sparse conv 3d backbone----

Looking forward to hearing from you soon.

Thanks & Best Regards
Shivam

Hi Shivam,

Thanks for your interest in our work!

  1. It is better to predict planes/geometry in a local coordinate frame for each fragment. I chose the middle camera's coordinate frame as the local frame.
  2. rotate_view_to_align_xyplane creates a gravity-aligned coordinate frame from the local (middle camera) frame. Most planes are parallel or perpendicular to the gravity direction. I leverage this prior and predict the planes/geometry in this gravity-aligned frame.
  3. The GRU fusion (following NeuralRecon) and the sparse convolution are done in gravity-aligned coordinates. Before the GRU fusion, the global hidden state is transformed to the local gravity-aligned coordinates. After the GRU fusion, the updated global hidden state is transformed back to world coordinates.
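
For readers following along, the three points above can be sketched roughly as follows. This is a minimal NumPy illustration, not the repository's code: it assumes the world +z axis is the up (anti-gravity) direction, and the helpers `rotation_aligning` and `world_to_aligned_camera` are hypothetical names written for this example. The actual `rotate_view_to_align_xyplane` may use a different axis convention.

```python
import numpy as np


def rotation_aligning(a, b):
    """3x3 rotation that maps direction a onto direction b (Rodrigues' formula)."""
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    v = np.cross(a, b)           # rotation axis scaled by sin(angle)
    c = float(np.dot(a, b))      # cos(angle)
    s = float(np.linalg.norm(v)) # sin(angle)
    if s < 1e-8:
        if c > 0:
            return np.eye(3)     # already aligned
        # Antiparallel: rotate by pi about any axis perpendicular to a.
        helper = np.array([1.0, 0.0, 0.0]) if abs(a[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
        axis = np.cross(a, helper)
        axis /= np.linalg.norm(axis)
        return 2.0 * np.outer(axis, axis) - np.eye(3)
    K = np.array([[0.0, -v[2], v[1]],
                  [v[2], 0.0, -v[0]],
                  [-v[1], v[0], 0.0]])
    return np.eye(3) + K + (K @ K) * ((1.0 - c) / s**2)


def world_to_aligned_camera(middle_cam_to_world):
    """World -> gravity-aligned local frame centred on the middle camera."""
    world_to_cam = np.linalg.inv(middle_cam_to_world)
    # Assumption: world +z is "up". Express it in middle-camera coordinates.
    up_in_cam = world_to_cam[:3, :3] @ np.array([0.0, 0.0, 1.0])
    align = np.eye(4)
    align[:3, :3] = rotation_aligning(up_in_cam, np.array([0.0, 0.0, 1.0]))
    return align @ world_to_cam


# Hypothetical middle-camera pose: pitched 30 degrees about x, offset in the world.
angle = np.deg2rad(30.0)
pose = np.eye(4)
pose[:3, :3] = [[1.0, 0.0, 0.0],
                [0.0, np.cos(angle), -np.sin(angle)],
                [0.0, np.sin(angle), np.cos(angle)]]
pose[:3, 3] = [1.0, 2.0, 1.5]

T = world_to_aligned_camera(pose)

# Points (homogeneous) go world -> aligned before fusion, and back afterwards.
pts_world = np.array([[0.0, 0.0, 0.0, 1.0],
                      [1.0, 2.0, 1.5, 1.0],
                      [3.0, -1.0, 0.5, 1.0]])
pts_aligned = (T @ pts_world.T).T
back = (np.linalg.inv(T) @ pts_aligned.T).T
```

In this sketch the origin of the aligned frame stays at the middle camera centre, and the world up direction maps onto the local z axis, so horizontal planes get normals along z and vertical planes get normals in the xy plane. The inverse of the same matrix takes the fused result back to world coordinates.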

Thanks a lot @ymingxie, this helped. It's clear now!