Why not use the SAM encoder to extract features?
ruizhaoz opened this issue · 2 comments
ruizhaoz commented
Have you tried using the SAM encoder directly to extract features, instead of using another pretrained model?
yangliu96 commented
Features extracted with the SAM encoder reach only around 20 mIoU on fold 0 of COCO-20i. The SAM encoder's features carry weak semantics, so it performs poorly in complex scenes. There are two reasons for this:
- Poor feature matching: SAM's features fail to match multiple instances with similar semantics in complex scenes.
- Poor semantic guidance: SAM cannot provide effective semantic guidance for ILM (Instance-Level Matching) to select high-quality mask proposals.
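
To make "feature matching" concrete, here is a minimal sketch of one common way to match a reference object against a target image: average the reference features inside the mask into a prototype, then take the cosine similarity between that prototype and every target location. This is an illustration under assumptions, not the code from this repository; the function names and the `(H, W, C)` feature layout are hypothetical, and the features could come from any encoder.

```python
import numpy as np


def masked_average_prototype(ref_feats, ref_mask):
    """Average the reference features inside the mask.

    ref_feats: (H, W, C) float array of encoder features (assumed layout).
    ref_mask:  (H, W) boolean array, True on the reference object.
    Returns a (C,) prototype vector.
    """
    return ref_feats[ref_mask].mean(axis=0)


def cosine_similarity_map(target_feats, prototype, eps=1e-8):
    """Cosine similarity between the prototype and every target location.

    target_feats: (H, W, C) float array of encoder features.
    prototype:    (C,) vector from masked_average_prototype.
    Returns an (H, W) map with values in [-1, 1].
    """
    # L2-normalise along the channel axis so the dot product is a cosine.
    t = target_feats / (np.linalg.norm(target_feats, axis=-1, keepdims=True) + eps)
    p = prototype / (np.linalg.norm(prototype) + eps)
    return t @ p


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref = rng.normal(size=(8, 8, 16))      # stand-in for real encoder features
    mask = np.zeros((8, 8), dtype=bool)
    mask[2:5, 2:5] = True                  # hypothetical reference mask
    tgt = rng.normal(size=(8, 8, 16))
    sim = cosine_similarity_map(tgt, masked_average_prototype(ref, mask))
    print(sim.shape)
```

With weakly semantic features, a map like this tends not to separate locations on same-class instances from the background, which is the failure described in the first bullet.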
fjchange commented
DINOv2 is strong at instance retrieval and dense matching. The SAM backbone is pretrained with MAE, whose features are not as discriminative.
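
One rough way to check how discriminative a set of features is: compare the mean cosine similarity between features that share a class label with the mean similarity between features that do not. The sketch below is only an illustration; `similarity_gap` is a hypothetical helper, and the `(N, C)` features and integer labels are assumed inputs from whichever backbone is being compared.

```python
import numpy as np


def similarity_gap(feats, labels, eps=1e-8):
    """Mean same-class cosine similarity minus mean cross-class similarity.

    feats:  (N, C) float array, one feature vector per sample or patch.
    labels: (N,) integer class labels.
    A larger gap suggests more discriminative features. The input needs at
    least two classes and one class with two or more samples, otherwise one
    of the means is taken over an empty set and the result is NaN.
    """
    f = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + eps)
    sim = f @ f.T                                    # (N, N) pairwise cosine
    same = labels[:, None] == labels[None, :]        # True where labels agree
    off_diag = ~np.eye(len(labels), dtype=bool)      # drop self-similarity
    intra = sim[same & off_diag].mean()
    inter = sim[~same].mean()
    return intra - inter
```

Running this on features from two backbones over the same labelled patches gives a quick, if coarse, comparison of how well each one separates classes.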