noahzn/Lite-Mono

Some question about the result

Zora137 opened this issue · 4 comments

Hello, sorry to bother you again. This is the result I got running your code with this command, without pretraining:

python train.py --data_path path/to/your/data --model_name mytrain --num_epochs 30 --batch_size 12 --lr 0.0001 5e-6 31 0.0001 1e-5 31

[screenshot: my training results]

It cannot reach the result reported in your paper without pretraining:
[screenshot: results from the paper]

Is this normal, or did I do something wrong?
Looking forward to your answer, thank you so much.

Hi, the results are too bad. As stated in our paper:

For models trained from scratch an initial learning rate of 5e−4 with a cosine learning rate schedule [26] is adopted, and the training epoch is set to 35.

Could you please try using a larger initial learning rate?
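The cosine schedule mentioned above can be sketched as follows. This is a minimal illustration, not the project's actual scheduler: the minimum learning rate `lr_min` and the exact decay formula are assumptions; the paper only states an initial learning rate of 5e-4 with a cosine schedule over 35 epochs.

```python
import math

def cosine_lr(epoch, lr_init=5e-4, lr_min=1e-5, total_epochs=35):
    """Cosine learning-rate schedule: decays lr_init toward lr_min
    over total_epochs. lr_min=1e-5 is an assumed floor for illustration."""
    t = min(epoch, total_epochs) / total_epochs
    return lr_min + 0.5 * (lr_init - lr_min) * (1 + math.cos(math.pi * t))

# The rate starts at 5e-4 and decays smoothly to lr_min by the last epoch.
print(cosine_lr(0))   # → 0.0005
print(cosine_lr(35))  # → 1e-05
```

In practice you would get the same shape from PyTorch's built-in `torch.optim.lr_scheduler.CosineAnnealingLR` with `T_max=35` and `eta_min` set to the floor.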


Hi, nice work! Can you show me your args file? I have tried many times and the result is always like this:

[screenshot: QQ图片20240418140654, trained by Lite-Mono]



Hi, you can try setting the learning rate to --lr 0.0001 5e-6 16 0.0001 1e-5 16. drop_path can be set to 0.3. But this might cause your training not to converge. Please make sure you are using the same dependencies as we used. #58

Also, please check the results of each epoch, not only the last epoch. The best result should be achieved at an earlier epoch.
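Selecting the best epoch rather than the last one can be done with a one-liner over per-epoch metrics. A minimal sketch, assuming you have collected abs_rel (where lower is better) for each saved checkpoint; the metric values below are made up for illustration:

```python
# Hypothetical per-epoch evaluation results; substitute the numbers your
# own evaluation script reports for each checkpoint.
per_epoch = [
    {"epoch": 10, "abs_rel": 0.125},
    {"epoch": 20, "abs_rel": 0.118},
    {"epoch": 29, "abs_rel": 0.131},  # the final epoch is not always best
]

# Pick the checkpoint with the lowest abs_rel error.
best = min(per_epoch, key=lambda m: m["abs_rel"])
print(best["epoch"])  # → 20
```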

OK, thanks! I will try it.