Training reproduce

Question

ChaoGaoUCR opened this issue a year ago · 2 comments

Dear Authors,

Thanks again for the great work.
I have a quick question.
I tried to train for 4 epochs by setting the trainer's epoch count to 1 and repeating it four times in a for loop.
I can't get the same result this way.
Any hint as to what I did wrong?

Thanks

Answer 1 · 2023-09-22T06:21:40.000Z

Hi,

Did you resume from the previous epoch's checkpoint? If so, please make sure every epoch trains from scratch. If not, you can set a random seed to improve reproducibility. Could you report the results you got? I'd like to see how they vary. Thanks!
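
As a rough illustration of the random-seed suggestion, here is a minimal sketch using only Python's standard library. The helper `shuffled_batches` is hypothetical and not part of this repository. In a real training script you would probably also need to seed NumPy and the deep learning framework in use, which this sketch leaves out.

```python
import random

def shuffled_batches(data, batch_size, seed):
    """Hypothetical helper: shuffle data with a fixed seed, then split into batches."""
    rng = random.Random(seed)          # private generator, unaffected by other code
    order = list(data)
    rng.shuffle(order)                 # same seed -> same order on every run
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]

# Two "runs" with the same seed see the batches in the same order.
run_a = shuffled_batches(range(10), batch_size=4, seed=0)
run_b = shuffled_batches(range(10), batch_size=4, seed=0)
print(run_a == run_b)
```

If the seed is left unset, each run may shuffle the data differently, which alone can make results vary between otherwise identical runs.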

Answer 2 · 2023-09-22T06:51:30.000Z

Dear,

Thanks, it was my fault: I set the batch size too large (512).
I changed it to 16, and now it works perfectly.

Best
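
The thread does not say why the batch size mattered. One plausible factor is the number of parameter updates: for a fixed number of epochs, a larger batch means far fewer optimizer steps. The sketch below only does that arithmetic. The dataset size `NUM_SAMPLES` is a made-up value for illustration, not the size of the actual dataset.

```python
import math

def updates_per_run(num_samples, batch_size, epochs):
    """Count optimizer steps, assuming one step per batch and no gradient accumulation."""
    steps_per_epoch = math.ceil(num_samples / batch_size)
    return steps_per_epoch * epochs

NUM_SAMPLES = 2048  # hypothetical dataset size, for illustration only

large = updates_per_run(NUM_SAMPLES, batch_size=512, epochs=4)
small = updates_per_run(NUM_SAMPLES, batch_size=16, epochs=4)
print(large, small)  # 16 512
```

With the same learning rate, the batch-512 run in this example gets 32 times fewer updates than the batch-16 run, so the two would not be expected to reach the same result after 4 epochs.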