thunlp/OpenDelta

BitFit for GPT-2 Models

Closed this issue · 1 comment

siddk commented

Are there results on what the best fine-tuning scheme is for GPT-style (autoregressive) models? I couldn't find it in the Delta Tuning paper... does BitFit fine-tuning perform well for GPT-2, and are there any public benchmarks that show its performance?

The conclusion may be similar to that for encoder-decoder models, but we haven't systematically tested it yet (we do know that prefix-style methods work well on autoregressive models). Thanks for the reminder; we will test this comprehensively soon.
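
For anyone who wants to benchmark this themselves while waiting for official numbers: BitFit amounts to freezing every parameter except the bias terms. Below is a minimal sketch applying it to GPT-2 with plain `transformers`/PyTorch (the `gpt2` checkpoint and the learning rate are illustrative choices, not taken from this thread):

```python
import torch
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("gpt2")

# BitFit: freeze all weights, keep only bias terms trainable.
for name, param in model.named_parameters():
    param.requires_grad = name.endswith("bias")

# Report the trainable-parameter fraction (a small fraction of a percent for GPT-2).
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable} / {total} ({100 * trainable / total:.3f}%)")

# Pass only the bias parameters to the optimizer.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
```

OpenDelta's `BitFitModel` delta model applies the same bias-only modification to a backbone model through the library's unified interface; see the repo README for usage.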