Longer training schedule?
JUGGHM opened this issue · 2 comments
Hi Assran, thank you for your great work! I wonder: if we use a longer pre-training schedule (800/1200/1600 epochs), how much of a performance advantage can we get over previous methods like MAE?
This is especially of interest for full fine-tuning settings.
Hi @JUGGHM, with a small image resolution (e.g., 224x224) we didn't see a large enough advantage from longer training on IN1k to justify the additional compute, but I think longer training at a higher resolution (e.g., 448x448) or on bigger datasets should lead to non-trivial performance improvements.
I'll close this issue for now, but let me know if you get the chance to try this out!