nv-nguyen/gigapose

Questions about the paper

Closed this issue · 3 comments

Thanks for your great work!

I am reading the paper. Section 3.3 says: "To recover the remaining 2 DoFs, scale s and in-plane rotation α, we train deep networks to directly regress these values from a single 2D-2D correspondence. Since the feature extractor Fae is invariant to in-plane rotation and scaling, the corresponding features cannot be used to regress those values, hence we have to train another feature extractor we call Fist".
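For context, a minimal sketch (not the authors' code) of what "directly regress these values from a single 2D-2D correspondence" could look like: a small head that takes one matched feature pair from Fist and outputs scale and in-plane rotation. The feature dimension, the `ScaleRotationHead` name, and the log-scale / unit-circle parameterization are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaleRotationHead(nn.Module):
    """Hypothetical head: one matched feature pair -> (scale s, angle alpha)."""
    def __init__(self, feat_dim: int = 256):  # feat_dim is an assumption
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * feat_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 3),  # predicts [log s, cos alpha, sin alpha]
        )

    def forward(self, feat_query, feat_template):
        out = self.mlp(torch.cat([feat_query, feat_template], dim=-1))
        log_s = out[..., 0]
        cos_sin = F.normalize(out[..., 1:], dim=-1)  # project onto unit circle
        s = log_s.exp()                              # keeps the scale positive
        alpha = torch.atan2(cos_sin[..., 1], cos_sin[..., 0])
        return s, alpha

# Usage: a batch of 4 matched feature pairs -> (s, alpha) per pair
head = ScaleRotationHead()
s, alpha = head(torch.randn(4, 256), torch.randn(4, 256))
```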

Why? A simple image-registration method based on SIFT descriptors can recover these values, even though SIFT descriptors are themselves invariant to in-plane rotation and scaling.

I see. You use features to regress the scale s and in-plane rotation α.

Another question: why not use the correspondences and RANSAC to estimate the scale s, in-plane rotation α, and translation t, like a simple SIFT-based image-registration method, since you have already obtained the correspondences from Fae?
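For reference, a minimal sketch of the alternative being asked about: fitting a 4-DoF similarity transform (scale, in-plane rotation, translation) to 2D-2D correspondences with RANSAC, here via OpenCV's `estimateAffinePartial2D`. The point sets are dummy data for illustration only.

```python
import cv2
import numpy as np

src = np.random.rand(50, 2).astype(np.float32)   # template keypoints
angle, s, t = 0.3, 1.2, np.array([5.0, -2.0])    # ground-truth similarity
R = np.array([[np.cos(angle), -np.sin(angle)],
              [np.sin(angle),  np.cos(angle)]])
dst = (s * src @ R.T + t).astype(np.float32)     # matched query keypoints

# RANSAC fit of a 2x3 similarity matrix M = [sR | t]
M, inliers = cv2.estimateAffinePartial2D(
    src, dst, method=cv2.RANSAC, ransacReprojThreshold=3.0)

scale = np.hypot(M[0, 0], M[1, 0])               # recovers s
alpha = np.arctan2(M[1, 0], M[0, 0])             # recovers alpha
print(scale, np.degrees(alpha), M[:, 2])         # translation t in M[:, 2]
```

Note that this route needs several correspondences plus an inlier search, which is the contrast with the single-correspondence regression above.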

Thanks for your interest!

We show in Table 4 an ablation study with different ways to predict the scale and in-plane rotation from multiple correspondences, as you suggest (n=2 or n=4). Our method predicts the full 6D pose from a single correspondence (n=1), which is different from and outperforms those approaches.
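For intuition on the n=2 baseline in that ablation: with two correspondences, scale and in-plane rotation have a closed-form solution from the displacement vectors between the matched points. A quick sketch, assuming an exact 2D similarity transform relates the two point pairs:

```python
import numpy as np

def scale_rotation_from_two_matches(p1, p2, q1, q2):
    """p1, p2: points in the template; q1, q2: their matches in the query."""
    dp = np.asarray(p2, float) - np.asarray(p1, float)
    dq = np.asarray(q2, float) - np.asarray(q1, float)
    s = np.linalg.norm(dq) / np.linalg.norm(dp)               # relative scale
    alpha = np.arctan2(dq[1], dq[0]) - np.arctan2(dp[1], dp[0])  # relative angle
    return s, alpha

s, alpha = scale_rotation_from_two_matches((0, 0), (1, 0), (2, 3), (2, 5))
print(s, np.degrees(alpha))  # -> 2.0, 90.0
```

With n=1 this geometric route is unavailable, which is why regressing s and α directly from a single correspondence is the distinguishing step.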

I closed the issue, but feel free to re-open it if you have additional questions!