Support image prompt
wentao-uw opened this issue · 2 comments
wentao-uw commented
Use case: replacing a logo or text in a video. Input: the old logo, a video containing the old logo, and the new logo. Output: the video with the new logo.
rentainhe commented
Hi @wentao-uw, it's a good idea to support referring detection or segmentation based on an image prompt, but Grounding DINO currently supports only text prompts. For referring detection, or detection based on a visual prompt, you can try combining SAM 2 with our T-Rex2 model.
You can also pair this pipeline with a video-editing model for additional editing of the videos.
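To make the last step a bit more concrete, here is a minimal sketch of the replacement stage only. It assumes you already have a boolean mask of the old logo for each frame (for example from SAM 2, prompted with a box from a visual-prompt detector). The helper names `mask_bbox`, `resize_nearest`, and `replace_logo` are hypothetical and are not part of Grounding DINO, SAM 2, or T-Rex2. The naive paste below is only a stand-in for a real video-editing model, which would handle perspective, lighting, and blending.

```python
import numpy as np


def mask_bbox(mask):
    """Return (x0, y0, x1, y1) around the True pixels, or None if the mask is empty."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()) + 1, int(ys.max()) + 1


def resize_nearest(image, height, width):
    """Nearest-neighbour resize of an (H, W, C) image using plain NumPy indexing."""
    rows = np.arange(height) * image.shape[0] // height
    cols = np.arange(width) * image.shape[1] // width
    return image[rows][:, cols]


def replace_logo(frame, mask, new_logo):
    """Paste new_logo over the masked region of one frame and return a new frame.

    frame:    (H, W, 3) uint8 image
    mask:     (H, W) boolean mask of the old logo (assumed to come from a segmenter)
    new_logo: (h, w, 3) uint8 image
    """
    box = mask_bbox(mask)
    if box is None:
        # Old logo is not visible in this frame, so leave it unchanged.
        return frame.copy()
    x0, y0, x1, y1 = box
    # Stretch the new logo to the bounding box of the old one.
    patch = resize_nearest(new_logo, y1 - y0, x1 - x0)
    out = frame.copy()
    region = mask[y0:y1, x0:x1]
    # Overwrite only the pixels that the mask marks as belonging to the old logo.
    out[y0:y1, x0:x1][region] = patch[region]
    return out
```

For a video, you would call `replace_logo` once per frame with that frame's mask. Frames where the mask is empty are returned unchanged.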
rentainhe commented
Hi @wentao-uw, for image-prompt detection and segmentation, you can also try DINOv. It can track or detect anything from a visual prompt.