Training the BERT large extractive model
Shashi456 opened this issue · 0 comments
Shashi456 commented
Hello,
Are the batch sizes and accum count for BERT large exactly the same as for the base model? I have been trying to reproduce the results, but my BERT large model has consistently performed worse than the base model (by about 3-4 ROUGE points), and I have no idea why.