zero-shot model
LindgeW opened this issue · 2 comments
Hi,
I would like to apply the WiSE-FT method to other tasks or pretrained models (e.g., BERT, GPT). In this context, the so-called zero-shot model
is actually the original model without fine-tuning, right? And the zero-shot model's parameters are simply the pretrained parameters loaded directly?
Thank you!
Yes, that's right! A couple of notes: if you're fine-tuning encoder-only models like BERT, it is common to create a new classification head before fine-tuning, which doesn't exist on the pre-trained model. If you do that, you can interpolate only the backbone parameters and always use the trained classification head, as in this paper. In this case, you might want to try fine-tuning only the new layer (freezing the backbone) for a few steps first, then fine-tune end-to-end. Alternatively, you could also fine-tune without introducing new parameters, as in this paper (see Appendix J.5).
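For reference, the interpolation itself is straightforward. Below is a minimal sketch of WiSE-FT-style weight-space averaging over state dicts; the function name and the rule for handling a newly added classification head follow the suggestion above (keep the trained head as-is), but are illustrative rather than taken from the WiSE-FT codebase.

```python
import torch

def wise_ft_interpolate(zero_shot_state, finetuned_state, alpha=0.5):
    """Interpolate parameters: theta = (1 - alpha) * theta_zeroshot + alpha * theta_finetuned.

    Keys present only in the fine-tuned model (e.g. a newly created
    classification head) are copied from it unchanged, so the trained
    head is always used.
    """
    merged = {}
    for key, ft_param in finetuned_state.items():
        if key in zero_shot_state:
            # Backbone parameter shared with the pre-trained model: interpolate.
            merged[key] = (1 - alpha) * zero_shot_state[key] + alpha * ft_param
        else:
            # Parameter introduced at fine-tuning time (e.g. new head): keep as trained.
            merged[key] = ft_param.clone()
    return merged
```

With `alpha=0` this recovers the pre-trained backbone (plus the trained head), and with `alpha=1` the fully fine-tuned model; intermediate values trade off in-distribution accuracy against robustness.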
good, thank you so much!