zero-shot model
LindgeW opened this issue · 2 comments
Hi,
I would like to apply the WiSE-FT method to other tasks or pretrained models (e.g., BERT, GPT). In this context, the so-called zero-shot model
is actually the original model without fine-tuning, right? And the zero-shot model's parameters are simply the pretrained parameters loaded directly?
Thank you!
Yes, that's right! A couple of notes: if you're fine-tuning encoder-only models like BERT, it is common to create a new classification head before fine-tuning, which doesn't exist on the pre-trained model. If you do that, you can interpolate only the backbone parameters and always use the trained classification head, as in this paper. In this case, you might want to try fine-tuning only the new layer (freezing the backbone) for a few steps first, then fine-tune end-to-end. Alternatively, you could also fine-tune without introducing new parameters, as in this paper (see Appendix J.5).
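For reference, the interpolation itself is straightforward. Below is a minimal sketch of WiSE-FT-style weight-space averaging over state dicts; the function name and the rule for handling a newly added classification head follow the suggestion above (keep the trained head as-is), but are illustrative rather than taken from the WiSE-FT codebase.

```python
import torch

def wise_ft_interpolate(zero_shot_state, finetuned_state, alpha=0.5):
    """Interpolate parameters: theta = (1 - alpha) * theta_zeroshot + alpha * theta_finetuned.

    Keys present only in the fine-tuned model (e.g. a newly created
    classification head) are copied from it unchanged, so the trained
    head is always used.
    """
    merged = {}
    for key, ft_param in finetuned_state.items():
        if key in zero_shot_state:
            # Backbone parameter shared with the pre-trained model: interpolate.
            merged[key] = (1 - alpha) * zero_shot_state[key] + alpha * ft_param
        else:
            # Parameter introduced at fine-tuning time (e.g. new head): keep as trained.
            merged[key] = ft_param.clone()
    return merged
```

With `alpha=0` this recovers the pre-trained backbone (plus the trained head), and with `alpha=1` the fully fine-tuned model; intermediate values trade off in-distribution accuracy against robustness.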
good, thank you so much!