About the pretrained model and training time
byminji opened this issue · 6 comments
Hi, thanks for releasing the code for your great work!
Could you please share the pretrained PolyFormer weights? Also, how long did it take to pretrain and finetune the model in your environment?
Thanks :)
Hi @byminji,
Thanks for your interest in our work! We just released the model weights (both pre-trained and fine-tuned).
On 8 A100 GPUs, pretraining PolyFormer-L takes about a week, and fine-tuning takes 2-3 days.
@huidingai Thanks for sharing the model weights! I have one more question about the fine-tuning process. Based on the fine-tuning run script, it seems that you use the combined training set of refcoco/+/g, but there are separate model weights for each dataset. What is the difference between these models (e.g., polyformer_b_refcoco.pt vs. polyformer_b_refcoco+.pt)?
@huidingai I see. Thank you for your answers :)
Hello, I would like to express my sincere appreciation for your hard work and contributions to the project. It has been incredibly valuable and informative.
If I don't have the same number of GPUs, does that mean I can't directly use the pre-trained models you provided? It seems that this would result in a mismatch between the loaded pre-trained model and the model I want to use.
Hi, you should still be able to use the pretrained models, but you will need to modify the training/evaluation scripts to match the number of GPUs you have.
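For context on what "matching the number of GPUs" typically involves: in standard data-parallel training, the saved model weights do not depend on how many GPUs produced them, so the checkpoint itself should load the same way. What usually changes is the launch configuration (for example, the `--nproc_per_node` value if the scripts use PyTorch's distributed launcher) and, as a consequence, the effective batch size. The sketch below assumes a fairseq-style setup where effective batch = per-GPU batch size × number of GPUs × gradient-accumulation steps (`update_freq`). The function name and all batch numbers are hypothetical illustrations, not values taken from the PolyFormer scripts.

```python
import math


def update_freq_for(target_effective_batch: int, per_gpu_batch: int, num_gpus: int) -> int:
    """Return the gradient-accumulation steps (hypothetical helper) needed so that
    per_gpu_batch * num_gpus * update_freq >= target_effective_batch."""
    if min(target_effective_batch, per_gpu_batch, num_gpus) <= 0:
        raise ValueError("all arguments must be positive")
    # Round up so the effective batch is never smaller than the target.
    return math.ceil(target_effective_batch / (per_gpu_batch * num_gpus))


# Hypothetical reference run: 8 GPUs, per-GPU batch 4, no accumulation.
PER_GPU_BATCH = 4
reference_effective_batch = PER_GPU_BATCH * 8 * 1  # 32

for gpus in (8, 4, 2, 1):
    freq = update_freq_for(reference_effective_batch, PER_GPU_BATCH, gpus)
    print(f"{gpus} GPU(s): update_freq={freq}, effective batch={PER_GPU_BATCH * gpus * freq}")
```

With fewer GPUs, increasing `update_freq` keeps the effective batch size (and therefore the learning-rate schedule tuned for it) roughly comparable, at the cost of longer wall-clock time per update.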