Training the BERT large extractive model

Question

Shashi456 opened this issue 2 years ago · 0 comments

Hello,

Are the batch size and accum count for BERT large exactly the same as for the base model? I have been trying to reproduce the results, but my BERT large model has consistently performed worse than the base model (by about 3-4 ROUGE points), and I have no idea why.