gaopengcuhk/Tip-Adapter

Are CLIP/TIP-Adapter only designed for the few-shot setting?


Sorry, I've got another question.
I did not find experiments under the base-to-new/domain generalization setting or the cross-dataset transfer setting, which are conducted in CoCoOp.
Are CLIP/Tip-Adapter only designed for the few-shot setting? I wonder how their generalization abilities are. Maybe you can give me some intuition?

It's okay!
We have conducted base-to-new domain transfer experiments for Tip-Adapter and CLIP-Adapter in the revised paper, which will be on arXiv in a few days.

Sure thing, big brother.
To be honest, I am still confused about why the image adapter can help zero/few-shot classification. What is the motivation of the CLIP/Tip-Adapter, please?

The motivation partly comes from adapters in NLP. Theirs are inserted into every encoder block of the pre-trained transformer, while ours is only appended after the encoder.
Also, since the pre-training domain has semantic gaps with downstream domains, it is intuitive that an additional learnable module can help the pre-trained network perform better.
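
For intuition, here is a minimal PyTorch sketch of that "append after the encoder" idea; the class name, feature dimension, bottleneck reduction, and residual ratio below are illustrative assumptions, not the exact values from the paper:

```python
import torch.nn as nn

class Adapter(nn.Module):
    """Small bottleneck MLP appended after the frozen CLIP encoder.

    Illustrative sketch only: dim, reduction, and ratio are assumed
    values, not the paper's exact hyperparameters.
    """
    def __init__(self, dim=1024, reduction=4, ratio=0.2):
        super().__init__()
        self.ratio = ratio
        self.fc = nn.Sequential(
            nn.Linear(dim, dim // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(dim // reduction, dim, bias=False),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        # Residual blend: mostly keep the pre-trained feature and add
        # a small learned correction toward the downstream domain.
        return self.ratio * self.fc(x) + (1 - self.ratio) * x
```

Only the adapter is trained while the CLIP encoder stays frozen, so the few-shot data only has to learn a lightweight correction on top of the pre-trained features.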

I see. Appreciated~