BitFit for GPT-2 Models
Closed this issue · 1 comment
siddk commented
Are there results on what the best finetuning scheme is for GPT-style (autoregressive) models? I couldn't find it in the Delta Tuning paper. Does BitFit perform well for GPT-2, and are there any public benchmarks showing its performance?
ningding97 commented
The conclusion may be similar to that for enc-dec models, but we haven't systematically tested it yet (we do know that prefix-style methods work well on autoregressive models). Thanks for the reminder; we will test this comprehensively soon.
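For anyone who wants to try BitFit on an autoregressive model in the meantime: the method just freezes every parameter except the bias terms before finetuning. A minimal PyTorch sketch of that freezing step is below, using a toy `nn.Sequential` stand-in; for a real experiment you would apply the same loop to a pretrained GPT-2 (e.g. `transformers.GPT2LMHeadModel`), whose bias parameters also have names ending in `"bias"`.

```python
import torch
import torch.nn as nn

# Toy stand-in for a transformer; swap in a pretrained GPT-2 in practice.
model = nn.Sequential(
    nn.Linear(16, 32),
    nn.GELU(),
    nn.Linear(32, 16),
)

# BitFit: train only the bias terms, freeze everything else.
for name, param in model.named_parameters():
    param.requires_grad = name.endswith("bias")

trainable = [n for n, p in model.named_parameters() if p.requires_grad]
tuned = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(trainable)     # ['0.bias', '2.bias']
print(tuned, total)  # 48 1072 -- only a tiny fraction of parameters is tuned

# Hand the optimizer only the unfrozen (bias) parameters.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
```

Note that on a real GPT-2 you may also want to keep the layer-norm parameters or the task head trainable, depending on which BitFit variant you follow.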