IDEA-Research/Grounded-SAM-2

Support image prompt

wentao-uw opened this issue · 2 comments

Use case: replace a logo or text in a video. Input: the old logo, a video containing the old logo, and the new logo. Output: the video with the new logo.

Hi @wentao-uw, it's a good idea to support referring detection or segmentation based on image prompts, but Grounding DINO currently supports only text prompts. For referring detection, or detection based on visual prompts, you can try combining SAM 2 with our T-Rex2 model.

You can also extend this pipeline with a video-editing model to make further edits to the video.
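As an illustration of the final editing step only, here is a minimal sketch. It assumes the per-frame masks of the old logo have already been produced upstream (for example, by a visual-prompt detector such as T-Rex2 feeding boxes to SAM 2; that upstream step is not shown and no real API of either model is used). The helper names `mask_to_box`, `resize_nearest`, and `replace_logo` are hypothetical, and plain NumPy rectangle pasting stands in for a real video-editing model.

```python
from typing import List, Optional, Tuple

import numpy as np


def mask_to_box(mask: np.ndarray) -> Optional[Tuple[int, int, int, int]]:
    """Return (y0, x0, y1, x1) with exclusive ends covering the True pixels, or None if empty."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    return int(ys.min()), int(xs.min()), int(ys.max()) + 1, int(xs.max()) + 1


def resize_nearest(image: np.ndarray, height: int, width: int) -> np.ndarray:
    """Nearest-neighbour resize of an (H, W, C) array to (height, width, C)."""
    rows = np.arange(height) * image.shape[0] // height
    cols = np.arange(width) * image.shape[1] // width
    return image[rows[:, None], cols[None, :]]


def replace_logo(
    frames: List[np.ndarray], masks: List[np.ndarray], new_logo: np.ndarray
) -> List[np.ndarray]:
    """Paste new_logo over the bounding box of each frame's old-logo mask.

    frames: (H, W, 3) uint8 arrays; masks: (H, W) bool arrays, assumed to come
    from a segmenter such as SAM 2 (hypothetical upstream step, not shown).
    The input frames are left unmodified.
    """
    edited_frames = []
    for frame, mask in zip(frames, masks):
        edited = frame.copy()
        box = mask_to_box(mask)
        if box is not None:  # the old logo may be absent from some frames
            y0, x0, y1, x1 = box
            edited[y0:y1, x0:x1] = resize_nearest(new_logo, y1 - y0, x1 - x0)
        edited_frames.append(edited)
    return edited_frames
```

Pasting over the whole bounding box keeps the sketch short, but a non-rectangular logo would leave background artifacts; a real pipeline would blend or inpaint within the mask instead.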

Hi @wentao-uw, for image-prompt detection and segmentation you can also try DINOv, which can detect or track anything given a visual prompt.