What is the proper loss during training?
Closed this issue · 3 comments
fyting commented
For me, the training loss is about 11.5 at the beginning and about 9.5 at the end. Is that reasonable?
funnyzhou commented
In our experiments, the loss value does not mean much on its own. You have to finetune the checkpoint to check the effect of pretraining.
fyting commented
In the case of using 700k samples, how many epochs of training are needed to achieve the best performance?
funnyzhou commented
I used 150 epochs with a batch size of 256.
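For anyone translating that schedule into an optimizer-step budget, here is a rough back-of-the-envelope sketch, assuming the 700k sample count from the question above, the stated batch size of 256 and 150 epochs, and that the last partial batch of each epoch is dropped:

```python
# Rough optimization budget implied by the reply above.
# Assumptions: 700k pretraining samples, batch size 256, 150 epochs,
# last partial batch dropped (as with drop_last=True in a dataloader).
num_samples = 700_000
batch_size = 256
epochs = 150

steps_per_epoch = num_samples // batch_size  # 2734 full batches per epoch
total_steps = steps_per_epoch * epochs       # 410100 optimizer steps overall

print(steps_per_epoch)  # 2734
print(total_steps)      # 410100
```

So the reported recipe corresponds to roughly 410k pretraining steps; scaling the batch size up or down would change the step count proportionally if the epoch count is kept fixed.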