Larger models learn slowly?
chenwydj opened this issue · 2 comments
chenwydj commented
Dear authors,
Thank you very much for this great repo!
I am training larger models (T2T-ViT-19/24, etc.), and I find that during training their accuracy increases more slowly than that of smaller models like T2T-ViT-7. Is this expected behavior?
Thank you!
yuanli2333 commented
Hi, larger models converge more slowly during the first 10 to 20 epochs, but their accuracy increases faster after that initial stage.
chenwydj commented
Thank you!