Use Grounding DINO, Segment Anything, and CLIP to label objects in images.
The image below shows segmentation masks for every McDonalds logo it contains.
This demo was created by sending the prompt "logo" to Grounding DINO and SAM, then classifying each prediction using CLIP with two prompts: "McDonalds" and "Burger King".
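The classification step can be illustrated with a minimal sketch. It is not this project's implementation: the function names (cosine_similarity, classify_crops) are hypothetical, and the short vectors are toy stand-ins for the embeddings that CLIP's image and text encoders would produce for each masked region and each prompt. The sketch only shows the idea of assigning each prediction the prompt whose embedding is most similar to it.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify_crops(
    crop_embeddings: List[Sequence[float]],
    prompt_embeddings: Dict[str, Sequence[float]],
) -> List[str]:
    """Label each crop with the prompt whose embedding is most similar to it."""
    labels = []
    for crop in crop_embeddings:
        best_prompt = max(
            prompt_embeddings,
            key=lambda name: cosine_similarity(crop, prompt_embeddings[name]),
        )
        labels.append(best_prompt)
    return labels


# Assumption: toy 3-dimensional vectors standing in for real CLIP embeddings.
prompt_embeddings = {
    "McDonalds": [1.0, 0.1, 0.0],
    "Burger King": [0.0, 0.2, 1.0],
}

# One embedding per region returned for the "logo" prompt (also toy values).
crop_embeddings = [
    [0.9, 0.0, 0.1],
    [0.1, 0.1, 0.8],
]

labels = classify_crops(crop_embeddings, prompt_embeddings)

# Keep only the regions classified as McDonalds, as in the demo image.
mcdonalds_indices = [i for i, label in enumerate(labels) if label == "McDonalds"]
print(labels, mcdonalds_indices)
```

Including a contrasting prompt such as "Burger King" gives the classifier something to reject: a generic "logo" detection is kept only when it is closer to "McDonalds" than to the alternative.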
This project is licensed under an MIT license.