Why no task-specific fine-tuning // any plans?
If I understand correctly, the weights were used directly from BERT, with the only free parameters being the LSTM+MLP layers:
"For simplicity, experiments are performed without any hyperparameter tuning and with fixed BERT weights"
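For reference, here's a rough sketch of what I understand that setup to be: frozen BERT used as a fixed feature extractor, with only a BiLSTM + MLP head receiving gradient updates. This is just a toy PyTorch/HuggingFace illustration, not your actual code; the model name, hidden sizes, and head shape are placeholders I made up.

```python
# Minimal sketch (not the paper's code): frozen BERT encoder with a
# trainable BiLSTM + MLP head, i.e. the "fixed BERT weights" setup.
import torch
import torch.nn as nn
from transformers import BertModel

class FrozenBertTagger(nn.Module):
    def __init__(self, num_labels, bert_name="bert-base-uncased"):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)
        # Freeze BERT: its weights serve as fixed features, not fine-tuned.
        for param in self.bert.parameters():
            param.requires_grad = False
        # Only these layers are trained.
        self.lstm = nn.LSTM(self.bert.config.hidden_size, 256,
                            batch_first=True, bidirectional=True)
        self.mlp = nn.Sequential(nn.Linear(512, 256), nn.ReLU(),
                                 nn.Linear(256, num_labels))

    def forward(self, input_ids, attention_mask):
        with torch.no_grad():  # no gradients flow into BERT
            hidden = self.bert(input_ids,
                               attention_mask=attention_mask).last_hidden_state
        out, _ = self.lstm(hidden)
        return self.mlp(out)

# The optimizer only sees the trainable (LSTM+MLP) parameters.
model = FrozenBertTagger(num_labels=5)
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3)
```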
-
Why this choice? I see the footnote that this is 2.5x slower, but that doesn't seem that prohibitive (it's all relative, of course!), given that the original BERT paper (https://arxiv.org/pdf/1810.04805.pdf, section 5.4) observed a 1.5 point bump from fine-tuning on the downstream tasks.
-
Any plans to issue a second iteration of the paper with fine-tuning? Seems like there might be a lot of upside left on the floor here.
Not a criticism, just 1) trying to understand how far SciBERT pushes things in-domain, and 2) I think this is neat and would love to see you push things all the way. :)
Thanks for the interest. We're currently in the process of running the fine-tuning experiments :) Look forward to the updated results when they finish.