doc-doc/HQGA

Doubts about extracting BERT features


Hi @doc-doc, thank you for sharing such great work!

Following the NExT-QA setup, I fine-tuned pytorch-pretrained-BERT on NExT-QA with:

--max_seq_length 37
--train_batch_size 64
--learning_rate 5e-5
--num_train_epochs 3
--warmup_proportion 0.1
--gradient_accumulation_steps 1 
--loss_scale 0
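
For context, here is a minimal sketch (not the NExT-QA repo's actual script) of how these flags map onto a pytorch-pretrained-BERT fine-tuning loop. The multiple-choice head over 5 candidate answers and the dummy tensors standing in for the real tokenized QA pairs are assumptions:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from pytorch_pretrained_bert import BertForMultipleChoice, BertAdam

MAX_SEQ_LENGTH = 37      # --max_seq_length
TRAIN_BATCH_SIZE = 64    # --train_batch_size
LEARNING_RATE = 5e-5     # --learning_rate
NUM_TRAIN_EPOCHS = 3     # --num_train_epochs
WARMUP_PROPORTION = 0.1  # --warmup_proportion

# Dummy tensors stand in for the tokenized NExT-QA question/answer pairs
# (shape: examples x 5 candidate answers x sequence length).
num_examples = 256
input_ids = torch.randint(0, 30522, (num_examples, 5, MAX_SEQ_LENGTH))
segment_ids = torch.zeros_like(input_ids)
input_mask = torch.ones_like(input_ids)
labels = torch.randint(0, 5, (num_examples,))
train_loader = DataLoader(TensorDataset(input_ids, segment_ids, input_mask, labels),
                          batch_size=TRAIN_BATCH_SIZE, shuffle=True)

model = BertForMultipleChoice.from_pretrained('bert-base-uncased', num_choices=5)
model.train()

# BertAdam implements the linear warmup schedule driven by --warmup_proportion.
num_train_steps = (num_examples // TRAIN_BATCH_SIZE) * NUM_TRAIN_EPOCHS
no_decay = ['bias', 'LayerNorm.bias', 'LayerNorm.weight']
grouped = [
    {'params': [p for n, p in model.named_parameters() if not any(nd in n for nd in no_decay)],
     'weight_decay': 0.01},
    {'params': [p for n, p in model.named_parameters() if any(nd in n for nd in no_decay)],
     'weight_decay': 0.0},
]
optimizer = BertAdam(grouped, lr=LEARNING_RATE, warmup=WARMUP_PROPORTION, t_total=num_train_steps)

for epoch in range(NUM_TRAIN_EPOCHS):
    for b_input_ids, b_segment_ids, b_input_mask, b_labels in train_loader:
        loss = model(b_input_ids, b_segment_ids, b_input_mask, b_labels)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```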

Finally, I chose the model from the 2nd epoch (train acc 79%, val acc 45%, test acc 47%) and tested its features on NExT-QA.
Unfortunately, I failed to match the results reported for HQGA.
However, if I use the BERT features provided by NExT-QA, I get results consistent with HQGA.

I'm wondering if there are any details I'm missing here?
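
For reference, a minimal sketch of one way to pull per-token features for a question/answer pair out of the fine-tuned checkpoint. The checkpoint directory and the choice of last-layer token features are assumptions; the NExT-QA extraction script defines the exact features HQGA expects:

```python
import torch
from pytorch_pretrained_bert import BertTokenizer, BertForMultipleChoice

MAX_SEQ_LENGTH = 37
FINETUNED_DIR = 'output/bert_nextqa/'  # hypothetical dir with pytorch_model.bin + bert_config.json

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased', do_lower_case=True)
# Load the fine-tuned multiple-choice model and reuse its BERT encoder for extraction.
bert = BertForMultipleChoice.from_pretrained(FINETUNED_DIR, num_choices=5).bert
bert.eval()

def extract(question, answer):
    """Return last-layer token features for one '[CLS] question [SEP] answer [SEP]' pair."""
    q_tokens = tokenizer.tokenize(question)
    a_tokens = tokenizer.tokenize(answer)
    tokens = (['[CLS]'] + q_tokens + ['[SEP]'] + a_tokens + ['[SEP]'])[:MAX_SEQ_LENGTH]
    segment_ids = ([0] * (len(q_tokens) + 2) + [1] * (len(a_tokens) + 1))[:MAX_SEQ_LENGTH]
    ids = tokenizer.convert_tokens_to_ids(tokens)
    mask = [1] * len(ids)
    pad = MAX_SEQ_LENGTH - len(ids)
    ids, segment_ids, mask = ids + [0] * pad, segment_ids + [0] * pad, mask + [0] * pad
    with torch.no_grad():
        seq_out, pooled = bert(torch.tensor([ids]), torch.tensor([segment_ids]),
                               torch.tensor([mask]), output_all_encoded_layers=False)
    return seq_out.squeeze(0)  # (MAX_SEQ_LENGTH, 768) token features

print(extract('why did the boy pick up the toy', 'to play with it').shape)
```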

Hi, we directly use the extracted BERT features provided by the NExT-QA repo for our experiments on NExT-QA. Your BERT fine-tuning setup seems correct. You can try different checkpoints, e.g., fine-tune for only 1 epoch to prevent over-fitting, or for more epochs.
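
A small sketch of that suggestion, reusing model, train_loader and optimizer from the fine-tuning sketch above plus a hypothetical val_loader: evaluate after every epoch and keep whichever checkpoint scores best on validation rather than committing to a fixed epoch.

```python
import copy
import torch

def evaluate(mc_model, loader):
    """Multiple-choice accuracy on a held-out loader."""
    mc_model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for b_input_ids, b_segment_ids, b_input_mask, b_labels in loader:
            logits = mc_model(b_input_ids, b_segment_ids, b_input_mask)  # no labels -> logits
            correct += (logits.argmax(dim=1) == b_labels).sum().item()
            total += b_labels.size(0)
    mc_model.train()
    return correct / total

best_acc, best_state = 0.0, None
for epoch in range(NUM_TRAIN_EPOCHS):
    for b_input_ids, b_segment_ids, b_input_mask, b_labels in train_loader:
        loss = model(b_input_ids, b_segment_ids, b_input_mask, b_labels)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
    acc = evaluate(model, val_loader)  # val_loader is hypothetical
    if acc > best_acc:
        best_acc, best_state = acc, copy.deepcopy(model.state_dict())
torch.save(best_state, 'best_bert.bin')
```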

Thank you for your reply. I have achieved the desired result.