kamalkraj/ALBERT-TF2.0

Pretraining from scratch

hairzooc opened this issue · 5 comments

Hi,
Thanks for your code :) It's been very helpful for studying ALBERT.
As far as I know, the ALBERT paper uses a batch size of 4096 for pretraining.
Have you ever tried pretraining from scratch on GPUs?
I've seen your guide for SQuAD fine-tuning, but I couldn't find any information about pretraining from scratch.
Please let me know if you have any info on that.

Thanks for your reply. :)
Taking a week is not a problem for me, and I have 8 x TITAN RTX (24 GB) cards for now.
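
For anyone landing here: a 4096-sequence batch won't fit directly in memory on 8 x 24 GB cards, so the usual workaround is gradient accumulation: run several micro-batches, sum the gradients, and apply one optimizer step. Below is a minimal TF2 sketch of that idea with a toy stand-in model. It is not this repo's pretraining code; the per-GPU micro-batch of 64 is an assumption, and the multi-GPU `tf.distribute` plumbing is omitted for brevity.

```python
import tensorflow as tf

# Assumed numbers: a per-GPU micro-batch of 64 sequences fits in 24 GB,
# so 8 GPUs x 64 x 8 accumulation steps = the paper's 4096.
PER_GPU_BATCH = 64
NUM_GPUS = 8
ACCUM_STEPS = 4096 // (PER_GPU_BATCH * NUM_GPUS)  # 8 micro-batches per update

# Toy stand-in model so the sketch runs end to end; swap in the real
# ALBERT pretraining model and its masked-LM/SOP loss here.
model = tf.keras.Sequential([tf.keras.layers.Dense(64, activation="relu"),
                             tf.keras.layers.Dense(1)])
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4)
loss_fn = tf.keras.losses.MeanSquaredError()

model.build(input_shape=(None, 16))  # create variables before the accumulators
accum_grads = [tf.Variable(tf.zeros_like(v), trainable=False)
               for v in model.trainable_variables]

@tf.function
def micro_step(x, y):
    """Accumulate gradients for one micro-batch without updating weights."""
    with tf.GradientTape() as tape:
        # Divide by ACCUM_STEPS so the summed gradient equals the mean
        # gradient of one large 4096-sequence batch.
        loss = loss_fn(y, model(x, training=True)) / ACCUM_STEPS
    for acc, g in zip(accum_grads,
                      tape.gradient(loss, model.trainable_variables)):
        acc.assign_add(g)
    return loss

@tf.function
def apply_step():
    """Apply the accumulated gradient, then reset the accumulators."""
    optimizer.apply_gradients(zip(accum_grads, model.trainable_variables))
    for acc in accum_grads:
        acc.assign(tf.zeros_like(acc))

# One full "large batch" step: ACCUM_STEPS micro-batches, then one update.
for _ in range(ACCUM_STEPS):
    x = tf.random.normal([PER_GPU_BATCH, 16])
    y = tf.random.normal([PER_GPU_BATCH, 1])
    micro_step(x, y)
apply_step()
```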

@hairzooc
Hi, were you able to train your model? How much time did it take, and how was its performance?