Hyperparameters mismatch
e-bug opened this issue · 1 comment
e-bug commented
Hi @jackroos and thanks for the great repo!
I was looking at the cfgs file for VQA and noticed different hyperparameters than in the appendix of the paper.
For instance, 5 epochs instead of 20, 500 warmup steps instead of 2000, a smaller learning rate, and so on.
Should we follow the values in the repository or the ones in the paper, for this and other tasks?
Also, are inputs not truncated to a maximum length during fine-tuning?
Thanks!
jackroos commented
- You can fine-tune it with 20 epochs, but we found that 5 epochs are enough for pre-trained VL-BERT; the 20-epoch setting is for comparison against the model without pre-training. As for the learning rate, it is consistent with the paper: you need to multiply the value in the config yaml by the batch size, since it is normalized by batch size (see the sketch below).
- Since VQA inputs are usually not very long, we do not truncate them (see the second sketch below).
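To make the batch-size normalization concrete, here is a minimal sketch; both numbers are illustrative placeholders, not the actual values from the VQA config:

```python
# Minimal sketch: recovering the paper-style learning rate from the config value.
# Both numbers below are hypothetical, not the actual VL-BERT VQA settings.
base_lr_per_sample = 6.25e-7   # LR as stored in the config yaml, normalized by batch size
effective_batch_size = 256     # total batch size across all GPUs

paper_lr = base_lr_per_sample * effective_batch_size
print(f"effective learning rate: {paper_lr:.2e}")  # prints 1.60e-04
```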
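And a hypothetical way to verify that truncation is unnecessary for VQA questions (this uses the HuggingFace `transformers` tokenizer for illustration; VL-BERT ships its own BERT tokenizer):

```python
# Hypothetical check that VQA questions fit well within a typical max sequence length.
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
questions = [
    "What color is the cat?",
    "How many people are riding bicycles in this picture?",
]
max_len = max(len(tokenizer.tokenize(q)) for q in questions)
print(max_len)  # VQA questions rarely run longer than a few dozen tokens
```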
Thank you!