octimot/StoryToolkitAI

Additional Zero-Shot Models

Is your feature request related to a problem? Please describe.
No.

Describe the solution you'd like
Additional zero-shot models, such as Grounding DINO, or maybe Detectron2 or Segment Anything. Grounding DINO, which is promptable, would be especially great.

Describe alternatives you've considered
n/a

Additional context
The Grounding DINO model is promptable and apparently scores higher than CLIP.

Hey there!

I think Segment Anything and Grounding DINO produce more restrictive embeddings because of their promptable nature (more focused training data). In other words, CLIP on its own lets you search using more "obscure" language, while the others might be restricted to more common words (car, sky, bird, face, etc.).

We're preparing an update that also supports GPT-Vision and LLaVA-like models, which would let you ingest and prompt directly too.

That said, I'll take a look at these too ASAP!

Cheers