arxyzan/data2vec-pytorch

Which Hyperparameter tuning method to use?

anirudh2019 opened this issue · 2 comments

First of all, thank you for the great work @AryanShekarlaban and @kabouzeid!

quick question: I have little experience with training big models like Transformers. I see that there are many frameworks and algorithms for hyperparameter tuning on the internet. Could you suggest a hyperparameter tuning framework and algorithm for data2vec?

Thank you!

Hello @anirudh2019, glad to see this repo has been helpful to you.
Generally, hyperparameter tuning is a trial-and-error process, and making it actually work requires a lot of effort. It gets even harder when training large models like Transformers, so tuning hyperparams for pretraining is really not recommended unless you have access to very powerful resources. The recommended approach is to tune params for downstream tasks and distillation. In that case, you can refer to the paper of the encoder you use (RoBERTa, BEiT, Wav2Vec2, etc.) to see their best practices regarding hyperparameter tuning.
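To make the downstream-tuning suggestion concrete, here is a minimal random-search sketch in plain Python. The search space values are illustrative (loosely in the range of common Transformer fine-tuning recipes), and `train_and_evaluate` is a hypothetical placeholder you would replace with your actual fine-tuning loop; nothing here is from the data2vec codebase.

```python
import random

# Illustrative search space for fine-tuning a pretrained encoder on a
# downstream task. Values are assumptions; adjust them for your setup.
SEARCH_SPACE = {
    "learning_rate": [1e-5, 2e-5, 3e-5],
    "batch_size": [16, 32],
    "warmup_ratio": [0.06, 0.1],
}

def sample_config(rng):
    """Draw one random configuration from the search space."""
    return {name: rng.choice(choices) for name, choices in SEARCH_SPACE.items()}

def train_and_evaluate(config):
    """Hypothetical placeholder: run one fine-tuning job with `config`
    and return a validation score. Replace with your real training loop."""
    # Dummy score so the sketch runs end to end (peaks at lr = 2e-5).
    return 1.0 / (1.0 + abs(config["learning_rate"] - 2e-5) * 1e5)

def random_search(n_trials=10, seed=0):
    """Try n_trials random configs and keep the best-scoring one."""
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(n_trials):
        config = sample_config(rng)
        score = train_and_evaluate(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score

if __name__ == "__main__":
    best_config, best_score = random_search()
    print(best_config, best_score)
```

Frameworks like Optuna or Ray Tune follow the same loop (sample a config, train, score, keep the best) but add smarter samplers and early stopping, which matters when each trial is an expensive fine-tuning run.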

Best,
Aryan

Thank you for your suggestion!