Inference code for tracking as in Figure 3(b) of the paper
Closed this issue · 3 comments
Hi Siyuan,
Thanks for your great work and for releasing the code!
I'm wondering how to track by combining the detection head (of the MASA Adapter) and SAM directly, just like Figure 3(b) in the paper. Will this be released later?
Hi, thanks! SAM's predictions are not consistent across video frames, which leads to heavy flickering due to missing detections, so we have not provided such a demo yet. We will try to reduce the flickering effect first. However, it is straightforward and simple to test it yourself: replace the detection bounding boxes with the output of the MASA-trained detection head and run.
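To illustrate the suggested swap, here is a minimal sketch of the per-frame loop: detect boxes, then use those boxes as prompts for a SAM-style segmenter instead of Grounding-DINO detections. Note this is not the repository's actual API; `detect` and `segment` are hypothetical stand-ins for the MASA-trained detection head and a box-prompted SAM predictor, which you would plug in yourself.

```python
def track_with_sam(frames, detect, segment):
    """Run box-prompted segmentation on each frame.

    detect(frame)        -> list of (x1, y1, x2, y2) boxes
                            (stand-in for the MASA-trained detection head)
    segment(frame, box)  -> mask for that box
                            (stand-in for a SAM-style box-prompted predictor)
    """
    results = []
    for frame in frames:
        # Replace Grounding-DINO/YOLOX boxes with the MASA detector's output...
        boxes = detect(frame)
        # ...and prompt SAM with those boxes to get per-object masks.
        masks = [segment(frame, box) for box in boxes]
        results.append({"boxes": boxes, "masks": masks})
    return results
```

Missing detections in a frame simply yield no mask for that object, which is exactly the flickering effect mentioned above.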
Thanks for your reply, looking forward to your new achievements.
Hi Siyuan,
Congrats on this amazing work! It is very good to use for tracking!
I am interested in associating SAM masks between frames and still have questions about it.
Regarding the suggestion you gave above ("replace the detection bounding boxes with the output of the MASA-trained detection head and run"), could you give some quick guidance on how to use the MASA-trained detection head?
Currently, the demos use Grounding-DINO/YOLOX/Co-DETR to find objects. What changes are needed to call the MASA-trained detector instead?