How to give MM Grounding DINO few-shot image examples, like OWL-ViT?

Question

zappy586 opened this issue 22 days ago · 0 comments

MM Grounding DINO seems really promising for few-shot object detection, but its early fusion of the text and image modalities makes the architecture hard to follow. Has anyone tried converting this model into a few-shot learner that takes example images as queries, the way OWL-ViT does, or does anyone have ideas on how to do it?