google-research/electra

About the Electra paper

lgdgodv opened this issue

On page 13 of the paper, in the fine-tuning details section, it says:

"we searched for the best number of train epochs out of [10, 3] for each task. For SQuAD, we decreased the number of train epochs to 2 to be consistent with BERT and RoBERTa"

My question: did ELECTRA use an early-stopping scheme similar to RoBERTa's (train once for up to 10 epochs and stop early when the validation loss stops decreasing), or did you run separate fine-tuning runs with the epoch count fixed at 3, 4, 5, 6, 7, 8, 9, and 10? The sketch below illustrates the two strategies I mean.
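To make the question concrete, here is a minimal, self-contained sketch of the two strategies. `train_one_epoch` and `evaluate` are dummy stand-ins for illustration only, not functions from this repository, and the numbers are random placeholders rather than real metrics:

```python
import random

def train_one_epoch(model):
    # Placeholder for one fine-tuning epoch (not ELECTRA code).
    model["epochs"] += 1

def evaluate(model):
    # Placeholder validation loss; just a random stand-in value.
    return random.random() / model["epochs"]

def strategy_a_early_stopping(max_epochs=10, patience=1):
    """Strategy A (RoBERTa-style): one run, stop when val loss stops decreasing."""
    model = {"epochs": 0}
    best_loss, bad_epochs = float("inf"), 0
    for _ in range(max_epochs):
        train_one_epoch(model)
        loss = evaluate(model)
        if loss < best_loss:
            best_loss, bad_epochs = loss, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break  # validation loss stopped decreasing
    return best_loss

def strategy_b_separate_runs(epoch_choices=(3, 10)):
    """Strategy B: a fresh fine-tuning run per epoch setting; keep the best."""
    results = {}
    for n_epochs in epoch_choices:
        model = {"epochs": 0}  # fresh model for each setting
        for _ in range(n_epochs):
            train_one_epoch(model)
        results[n_epochs] = evaluate(model)
    return min(results, key=results.get)  # epoch count with lowest val loss

print(strategy_a_early_stopping())
print(strategy_b_separate_runs())
```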

Thanks.