prdwb/bert_hae

Can't reproduce paper results

LarryLee-BD opened this issue · 2 comments

Hi, I ran the script and got evaluation results like this:

epoch finished!
evaluation: 24000, total_loss: 1.491316556930542, f1: 59.33490221751538, followup: 0.0, yesno: 19.3518941122775, heq: 55.84968811805872, dheq: 4.5

Model saved in path OUTPUT_DIR/model_24000.ckpt

The paper reports F1 = 63.1/62.4.

prdwb commented

Hi Larry, could you provide your detailed hyper-parameter settings?

Could you try to follow the settings in Section 4.2.3 of our paper https://arxiv.org/pdf/1905.05412.pdf ? I would also encourage you to set the max sequence length to 512 and max answer length to 50 for better performance. Thanks.
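For reference, the suggested overrides might look like this (a sketch only: the flag names --max_seq_length and --max_answer_length are assumed from the standard BERT fine-tuning scripts this repo builds on, so please check cqa_flags.py for the exact names):

```shell
# Hypothetical invocation; verify flag names against cqa_flags.py.
python hae.py \
  --output_dir=OUTPUT_DIR \
  --max_seq_length=512 \
  --max_answer_length=50
```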

Section 4.2.3:
Models are implemented with TensorFlow. We use the BERT-Base (Uncased) model with the max sequence length set to 384. The batch size is set to 12. The number of history turns to incorporate is tuned as presented in Section 4.4. We train the ConvQA model with an Adam weight decay optimizer with an initial learning rate of 3e-5. We set the stride in the sliding window for passages to 128, the max question length to 64, and the max answer length to 30. We save checkpoints every 1,000 steps and test on the validation set. We use QuAC v0.2.

I used the default hyper-parameters:

python hae.py \
  --output_dir=OUTPUT_DIR \
  --history=6 \
  --num_train_epochs=3.0 \
  --train_steps=24000 \
  --max_considered_history_turns=11 \
  --learning_rate=3e-05 \
  --warmup_proportion=0.1 \
  --evaluation_steps=1000 \
  --evaluate_after=18000 \
  --load_small_portion=False

The defaults in cqa_flags.py basically match Section 4.2.3, except that the batch size is 6.
OK, thanks for the suggestions.