arxyzan/data2vec-pytorch

Any plan for supporting distributed training?

ruizewang opened this issue · 3 comments

Hi arxyzan~
Thanks for your great contribution! I was wondering whether there is any plan to support distributed training, since pre-training is too slow on a single GPU.

Hi Ray
I strongly recommend using the official repo if you want stable results for pretraining. But in case you want to use other encoder backends (which might be a little complicated to do in fairseq), just let me know and I'll put it on my to-do list.

Thanks for your quick response. Your implementation is neat and easy to follow, while the official one provided by fairseq is somewhat complicated.
Anyway, thanks for your contribution again.

Glad to hear that, as it's been my primary goal. I'm definitely going to provide code for distributed training and mixed precision. Stay tuned!
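In the meantime, for anyone landing here before that code exists, below is a minimal sketch of how DDP plus mixed precision is typically wired up in plain PyTorch. The model, dataset, and loss here are placeholders, not actual classes from data2vec-pytorch, and this is not the repo's planned implementation.

```python
# Minimal DDP + AMP sketch (placeholders only, not data2vec-pytorch code).
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset


def main():
    # torchrun sets LOCAL_RANK; one process per GPU.
    local_rank = int(os.environ["LOCAL_RANK"])
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(local_rank)

    # Placeholder model/data; swap in the data2vec model and your dataset.
    model = torch.nn.Linear(128, 128).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])

    dataset = TensorDataset(torch.randn(1024, 128))
    sampler = DistributedSampler(dataset)  # shards the data across ranks
    loader = DataLoader(dataset, batch_size=32, sampler=sampler)

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    scaler = torch.cuda.amp.GradScaler()  # fp16 loss scaling

    for epoch in range(2):
        sampler.set_epoch(epoch)  # reshuffle shards each epoch
        for (x,) in loader:
            x = x.cuda(local_rank, non_blocking=True)
            optimizer.zero_grad(set_to_none=True)
            with torch.cuda.amp.autocast():
                out = model(x)
                loss = out.float().pow(2).mean()  # stand-in for the real loss
            scaler.scale(loss).backward()  # DDP all-reduces gradients here
            scaler.step(optimizer)
            scaler.update()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

Launched with something like `torchrun --nproc_per_node=NUM_GPUS train_ddp.py`, each process trains on its own GPU and gradients are averaged across ranks automatically by DDP.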