Why not use the SAM encoder to extract features?

Question

ruizhaoz opened this issue a year ago · 2 comments

Have you tried using the SAM encoder directly to extract features, instead of using another pretrained model?

Answer 1 · 2023-08-11T07:30:54.000Z

Features extracted with the SAM encoder reach only about 20 mIoU on fold 0 of COCO-20i. The encoder's features carry weak semantics, so it performs poorly in complex scenes, for two reasons:

Poor feature matching: SAM's features fail to match multiple instances with similar semantics in complex scenes.
Poor semantic guidance: SAM cannot provide effective semantic guidance for ILM (Instance-Level Matching) to select high-quality mask proposals.
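
To make the feature-matching point concrete, here is a minimal sketch of prototype-based matching with cosine similarity. It is not this repository's implementation: the function names (`cosine`, `masked_prototype`, `match_patches`), the 0.8 threshold, and the toy 2-D vectors standing in for encoder patch features are all assumptions chosen for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def masked_prototype(features, mask):
    """Average the reference patch features that fall inside the mask."""
    selected = [f for f, m in zip(features, mask) if m]
    if not selected:
        raise ValueError("mask selects no patches")
    dim = len(selected[0])
    return [sum(f[d] for f in selected) / len(selected) for d in range(dim)]


def match_patches(prototype, target_features, threshold=0.8):
    """Return indices of target patches whose similarity to the prototype
    is at least `threshold` (a hypothetical value)."""
    return [
        i for i, f in enumerate(target_features)
        if cosine(prototype, f) >= threshold
    ]


# Toy data: two reference patches belong to the object, one to background.
ref_features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
ref_mask = [1, 1, 0]

# Target image with two object-like patches (0 and 2) and two background ones.
target_features = [[0.8, 0.2], [0.1, 0.9], [1.0, 0.1], [0.0, 1.0]]

prototype = masked_prototype(ref_features, ref_mask)
matched = match_patches(prototype, target_features)
print(matched)
```

In this toy setup the object and background features are well separated, so both object-like patches are matched. With weakly semantic features, the similarities of object and background patches sit much closer together, and no single threshold recovers all instances cleanly.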

Answer 2 · 2024-02-20T10:12:22.000Z

DINOv2 is strong at instance retrieval and dense matching. SAM's backbone is pretrained with MAE, whose features are less discriminative.