gaopengcuhk/Tip-Adapter

Are CLIP/TIP-Adapter only designed for the few-shot setting?


Sorry, I've got another question.
I did not find experiments under the base-to-new/domain generalization setting or the cross-dataset transfer setting, which are conducted in CoCoOp.
Are CLIP/Tip-Adapter only designed for the few-shot setting? I wonder how their generalization abilities are. Maybe you can give me some intuition?

It's okay!
We have conducted base-to-new domain transfer experiments for Tip-Adapter and CLIP-Adapter in the revised paper, which will be on arXiv in a few days.

Sure thing, big brother.
To be honest, I am still confused about why the image adapter can help zero/few-shot classification. What is the motivation of the CLIP/Tip-Adapter, please?

The motivation partly comes from adapters in NLP. Theirs are inserted into every encoder block of the pre-trained transformer, while ours is only appended after the encoder.
Also, since the pre-training domain has semantic gaps with downstream domains, it is intuitive that an additional learnable module can help the pre-trained network perform better.
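
For intuition, here is a minimal PyTorch sketch of that "append after the encoder" idea; the class name, feature dimension, bottleneck reduction, and residual ratio below are illustrative assumptions, not the exact values from the paper:

```python
import torch.nn as nn

class Adapter(nn.Module):
    """Small bottleneck MLP appended after the frozen CLIP encoder.

    Illustrative sketch only: dim, reduction, and ratio are assumed
    values, not the paper's exact hyperparameters.
    """
    def __init__(self, dim=1024, reduction=4, ratio=0.2):
        super().__init__()
        self.ratio = ratio
        self.fc = nn.Sequential(
            nn.Linear(dim, dim // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(dim // reduction, dim, bias=False),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        # Residual blend: mostly keep the pre-trained feature and add
        # a small learned correction toward the downstream domain.
        return self.ratio * self.fc(x) + (1 - self.ratio) * x
```

Only the adapter is trained while the CLIP encoder stays frozen, so the few-shot data only has to learn a lightweight correction on top of the pre-trained features.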

I see. Appreciated~