Question about pre-training process
natedingyifeng opened this issue · 1 comment
natedingyifeng commented
Hi, I am curious which validation task you use during pre-training. Could you please share some details?
yuewang-cuhk commented
Hi, for each stage of pretraining (either the MSP or the NTP task), we set aside a small proportion of the training data as a held-out validation set and monitor the corresponding loss (MSP or NTP) on this subset. We stop pretraining when it converges, in other words when the validation loss stops decreasing.
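
For readers who want to reproduce this kind of stopping rule, here is a minimal sketch of the idea described above, not the authors' actual code. The held-out fraction, the `patience` and `min_delta` parameters, and the names `split_held_out`, `EarlyStopper`, and `run_stage` are all assumptions made for illustration; the reply does not state how large the held-out subset is or how many non-improving evaluations are tolerated before stopping.

```python
import random


def split_held_out(examples, held_out_fraction=0.01, seed=0):
    """Shuffle examples and return (train, validation).

    held_out_fraction is an assumed value; the thread only says
    "a small proportion" of the training data.
    """
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * held_out_fraction))
    return shuffled[n_val:], shuffled[:n_val]


class EarlyStopper:
    """Signal a stop once validation loss has not improved for `patience` evaluations.

    patience and min_delta are hypothetical knobs, not taken from the thread.
    """

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.bad_evals = 0

    def should_stop(self, val_loss):
        if val_loss < self.best_loss - self.min_delta:
            # Validation loss improved: remember it and reset the counter.
            self.best_loss = val_loss
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience


def run_stage(val_losses, patience=3):
    """Return the index of the evaluation at which a stage would stop.

    val_losses stands in for the MSP or NTP loss measured on the
    held-out subset at successive evaluation points.
    """
    stopper = EarlyStopper(patience=patience)
    for step, loss in enumerate(val_losses):
        if stopper.should_stop(loss):
            return step
    return len(val_losses) - 1
```

In a real run, each stage (MSP, then NTP) would get its own `EarlyStopper`, and `should_stop` would be called with the stage's own loss computed on the validation split after each evaluation interval.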