training time for similar performance with pre-trained checkpoint

Question

hwopark opened this issue 6 months ago · 1 comments

In the paper, the model was optimized using Adam on 8 NVIDIA 3090 GPUs with a batch size of 16.
How many epochs are needed to reach performance comparable to the released pre-trained checkpoint? Also, could you share roughly how long that training takes?

Answer 1 · 2024-07-12T17:08:24.000Z

Hi @hwopark, we trained for about 15 epochs on all the data, which took about 15 days to reach the desired results.
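
For anyone budgeting compute, the figures above can be turned into a rough estimate. This is only a back-of-the-envelope sketch: it assumes the 15-day run used the 8-GPU setup quoted from the paper (the answer does not say so explicitly), and that throughput scales roughly linearly with GPU count, which is rarely exact in practice. The function names are made up for illustration.

```python
def estimate_training_cost(epochs=15, days=15, num_gpus=8):
    """Rough cost figures from the numbers in this thread.

    Assumption: the 15-day run used all 8 GPUs mentioned in the paper.
    """
    wall_hours = days * 24
    return {
        "hours_per_epoch": wall_hours / epochs,
        "gpu_hours_total": wall_hours * num_gpus,
        "gpu_hours_per_epoch": wall_hours * num_gpus / epochs,
    }


def wall_clock_days(gpu_hours_total, num_gpus):
    """Wall-clock days on a different GPU count.

    Assumption: perfectly linear scaling and comparable GPUs.
    """
    return gpu_hours_total / num_gpus / 24


cost = estimate_training_cost()
print(cost)
print(wall_clock_days(cost["gpu_hours_total"], num_gpus=4))
```

Under these assumptions that works out to about one day per epoch and roughly 2880 GPU-hours in total, so a 4-GPU machine would need on the order of 30 days. Treat these as ballpark numbers only; data loading, GPU model, and batch size will all shift them.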