naver/sqlova

Replicating dev results on BERT base

Hello! Hope you're keeping safe : )

I have a question about replicating the dev set results with BERT base. I ran the following command:

python3 train.py --seed 1 --bS 16 --accumulate_gradients 2 --bert_type_abb uS --fine_tune --lr 0.001 --lr_bert 0.00001 --max_seq_leng 222

I changed the imports from pytorch_pretrained_bert to transformers and modified the BERT loading code as follows:

 from transformers import BertConfig, BertModel, BertTokenizer

 bert_config = BertConfig.from_pretrained("bert-base-uncased", output_hidden_states=True)
 model_bert = BertModel.from_pretrained("bert-base-uncased", config=bert_config)
 tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
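
One thing I am not fully sure about is whether I am consuming the model outputs correctly after the switch, since the two libraries return the hidden states in different formats. My understanding of the difference is roughly the following (the variable names input_ids, segment_ids and input_mask are my own, and the indexing assumes a pre-4.0 transformers version that returns plain tuples):

 # pytorch_pretrained_bert: forward returns (all_encoder_layers, pooled_output),
 # where all_encoder_layers is a list with one tensor per layer (12 for BERT base).
 # all_encoder_layer, pooled_output = model_bert(input_ids, segment_ids,
 #                                               attention_mask=input_mask)

 # transformers: with output_hidden_states=True the model returns
 # (last_hidden_state, pooled_output, hidden_states), where hidden_states is a
 # tuple of 13 tensors: the embedding output followed by one tensor per layer.
 outputs = model_bert(input_ids, attention_mask=input_mask, token_type_ids=segment_ids)
 last_hidden_state, pooled_output = outputs[0], outputs[1]
 all_encoder_layer = list(outputs[2][1:])  # drop the embedding output to mirror the old list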

The model trains without errors, but it reaches a dev accuracy of only 72.4 after 16 epochs. Could you help me understand where I might be going wrong in replicating the results? The README mentions the model should reach about 79 within 12 epochs.

Please let me know if you need any further details from my end.