Support image prompt

Question

Support image prompt

wentao-uw opened this issue 5 months ago · 2 comments

Use case: replace a logo or text in a video. Input: the old logo, a video containing the old logo, and the new logo. Output: the video with the new logo.

Answer 1 · 2024-08-23T02:27:02.000Z

Hi @wentao-uw, supporting referring detection or segmentation based on an image prompt is a good idea, but Grounding DINO currently supports only text prompts. For referring detection, or detection based on a visual prompt, you can try combining SAM 2 with our T-Rex2 model.

You can then pair this pipeline with a video-editing model for additional editing of the videos.
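To make the last step concrete, here is a minimal sketch of replacing the logo in a single frame once a mask for the old logo is available. It assumes the per-frame masks come from an upstream detector and tracker (for example, masks produced by SAM 2 from a visual-prompt detection); that part is not shown. The function names `mask_bbox`, `resize_nearest`, and `replace_logo` are hypothetical and not part of any of the libraries mentioned. The naive paste below ignores perspective, lighting, and occlusion, so it is a stand-in for a real video-editing model, not a substitute for one.

```python
import numpy as np


def mask_bbox(mask):
    """Return (y0, y1, x0, x1) of the True region, or None if the mask is empty."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    return ys.min(), ys.max() + 1, xs.min(), xs.max() + 1


def resize_nearest(image, height, width):
    """Nearest-neighbour resize of an (H, W, C) array using only NumPy."""
    rows = np.arange(height) * image.shape[0] // height
    cols = np.arange(width) * image.shape[1] // width
    return image[rows][:, cols]


def replace_logo(frame, mask, new_logo):
    """Paste new_logo over the masked region of one frame.

    frame:    (H, W, 3) uint8 array
    mask:     (H, W) bool array, True where the old logo is (assumed input)
    new_logo: (h, w, 3) uint8 array
    """
    box = mask_bbox(mask)
    if box is None:
        return frame.copy()  # old logo not visible in this frame
    y0, y1, x0, x1 = box
    # Stretch the new logo to the bounding box of the old one.
    patch = resize_nearest(new_logo, y1 - y0, x1 - x0)
    out = frame.copy()
    region = mask[y0:y1, x0:x1]
    # Overwrite only the pixels that belong to the old logo.
    out[y0:y1, x0:x1][region] = patch[region]
    return out
```

Applying `replace_logo` to every frame with its corresponding mask gives a rough version of the requested output video; frames where the mask is empty are returned unchanged.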

Answer 2 · 2024-08-26T05:47:50.000Z

Hi @wentao-uw, for image-prompt detection and segmentation, you can also try DINOv. It can track or detect anything from a visual prompt.