About the BERT finetune setting
bcmi220 opened this issue · 4 comments
Hello,
When I was using this repo, I found that setting "is_traing=True" in BertVocab still does not actually fine-tune BERT, because only the GraphParserNetwork's parameters are optimized during training. Also, the performance with "is_traing=True" is currently lower than with "is_traing=False". Could you take a look at it? Thank you!
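For illustration, the problem can be reproduced with a minimal PyTorch-style sketch (this is not the repo's actual code, which uses a different framework; the module names below are just stand-ins): if the optimizer is only built from the parser's parameters, the encoder never receives updates regardless of the flag.

```python
import torch
from torch import nn

# Hypothetical stand-ins: in the real model the encoder would be BERT and the
# head would be the GraphParserNetwork. The point is the diagnostic pattern,
# not the architecture.
encoder = nn.Linear(768, 768)      # pretend this is BERT
parser_head = nn.Linear(768, 100)  # pretend this is the parser network

# If the optimizer is constructed only from the parser's parameters,
# the encoder stays frozen no matter what the training flag says.
optimizer = torch.optim.Adam(parser_head.parameters(), lr=0.01)

optimized = {id(p) for group in optimizer.param_groups for p in group['params']}
frozen = [name for name, p in encoder.named_parameters() if id(p) not in optimized]
print('Encoder parameters missing from the optimizer:', frozen)
```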
Yes, I also found that fine-tuning BERT did not improve performance. I currently have no time to investigate this issue, but one possible reason is that the current learning rate is not suitable for fine-tuning BERT. In other work, the learning rate for fine-tuning BERT is typically on the order of 1e-5, while I set the learning rate to 0.01 in the config file. However, the learning rate for the other parts of the model, such as the LSTM and the biaffine layers, should be larger. Therefore, you may try using different learning rates for BERT (1e-5) and the other parts (0.01).
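As a rough illustration of what such a two-learning-rate setup looks like (a PyTorch-style sketch only, with invented module names; it does not reflect this repo's actual code):

```python
import torch
from torch import nn

# Toy stand-ins: in the real model `encoder` would be BERT and `parser_head`
# the LSTM/biaffine layers.
encoder = nn.Linear(768, 768)      # pretend this is BERT
parser_head = nn.Linear(768, 100)  # pretend this is the biaffine scorer

# Two parameter groups with different learning rates: a small one (1e-5) for
# the pretrained encoder and a larger one (0.01) for the randomly initialized
# parser layers, as suggested above.
optimizer = torch.optim.Adam([
    {'params': encoder.parameters(), 'lr': 1e-5},
    {'params': parser_head.parameters(), 'lr': 1e-2},
])

# One dummy training step to show both groups are updated.
x = torch.randn(8, 768)
loss = parser_head(encoder(x)).pow(2).mean()
loss.backward()
optimizer.step()
```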
Thank you, I will try this setting. If I have further results, I will update here.
Fine-tuning BERT for the parsing models may not further improve performance. We ran some experiments on fine-tuning BERT but found worse performance than the models without fine-tuning. For results with fine-tuned BERT, you may refer to this paper. For a reference using BERT without fine-tuning on the SDP datasets, you may compare with Fernández-González & Gómez-Rodríguez, 2020, though the two papers use different parsing approaches and Fernández-González & Gómez-Rodríguez, 2020 additionally use word, lemma, and character embeddings. For now, I believe fine-tuning does not improve parsing performance as significantly as it does for sequence labeling tasks such as NER.
Good points, I will close this issue. Thank you!