byeonghu-na/MATRN

Question about reproducing.


Thanks for your great work.

Could you tell me how long the model needs to be trained on 4 NVIDIA GeForce RTX 3090 GPUs to converge to the results reported in the paper? If it is convenient, could you also share your training logs?

I'm in the process of reproducing it now, but I found that the loss becomes jittery after a period of training. I'm not sure whether I configured something wrong or whether this is inherent, with the loss slowly converging through a long period of jitter. So a training log from the authors would be very helpful, if possible (thanks a lot).

Thank you very much!

Thank you for your interest.

We trained the model for around a week on 4 NVIDIA GeForce RTX 3090 GPUs for 10 epochs.
We have attached a plot of test accuracy over training iterations.
The performance increases significantly at around 250,000 steps, where the learning rate is decayed, so the learning-rate decay schedule clearly improves performance.

[training_cur: plot of test accuracy vs. training iterations]
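For anyone reproducing this, a step-based learning-rate decay like the one described above can be expressed with a standard PyTorch scheduler. This is only a minimal sketch: the model, optimizer, initial learning rate, milestone of 250,000 iterations, and decay factor are illustrative assumptions, not values taken from this repository's config, so please check the released configs for the actual settings.

```python
import torch

# Placeholder model and optimizer; the real training setup comes from the repo's config.
model = torch.nn.Linear(512, 512)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# Decay the learning rate by 10x once training reaches ~250,000 iterations,
# roughly where the test-accuracy jump appears in the plot above.
# Milestone and gamma here are assumptions for illustration only.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[250_000], gamma=0.1
)

for step in range(300_000):
    # ... forward pass, loss computation, loss.backward() ...
    optimizer.step()
    scheduler.step()  # stepped per iteration (not per epoch) in this sketch
```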

Thank you.

Thank you for your reply; it will be very helpful for my reproduction.