aim-uofa/Matcher

Why not use the SAM encoder to extract features?

ruizhaoz opened this issue · 2 comments

Have you tried using the SAM encoder directly to extract features, instead of another pretrained model?

With features extracted by the SAM encoder, we get only around 20 mIoU on fold 0 of COCO-20i. The SAM encoder has weak semantics, so it performs poorly in complex scenes. There are two reasons for this:

  1. Poor feature matching: SAM's features fail to match multiple instances with similar semantics in complex scenes.
  2. Poor semantic guidance: SAM cannot provide effective semantic guidance for ILM (Instance-Level Matching) to select high-quality mask proposals.

DINOv2 is very strong at instance retrieval and dense matching. SAM's backbone is pretrained with MAE, whose features are not as discriminative.
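
To make the "feature matching" point concrete, here is a minimal toy sketch of patch-level matching as nearest-neighbor search under cosine similarity. It is not Matcher's actual code: `cosine`, `match_patches`, and the 2-D vectors are hypothetical stand-ins for real encoder patch features. The idea is only that when an encoder's features for semantically different patches are close together, the nearest-neighbor picks become unreliable.

```python
import math


def cosine(a, b):
    """Cosine similarity between two feature vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def match_patches(ref_feats, tgt_feats):
    """For each reference patch, return the index of the most similar target patch."""
    matches = []
    for r in ref_feats:
        sims = [cosine(r, t) for t in tgt_feats]
        # Index of the target patch with the highest similarity to r.
        matches.append(max(range(len(sims)), key=sims.__getitem__))
    return matches


# Hypothetical toy features: two reference patches, three target patches.
ref = [[1.0, 0.0], [0.0, 1.0]]
tgt = [[0.1, 0.9], [0.9, 0.2], [0.8, 0.1]]
matches = match_patches(ref, tgt)
print(matches)
```

In a real pipeline the vectors would be per-patch embeddings from the image encoder, and the quality of these matches depends directly on how discriminative those embeddings are.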