kamalkraj/ALBERT-TF2.0

Pretraining from scratch

hairzooc opened this issue · 5 comments

Hi,
Thanks for your code :) It's been very helpful for studying ALBERT.
As far as I know, the ALBERT paper uses a batch size of 4096 for pretraining.
Have you ever tried pretraining from scratch on GPUs?
I've seen your guide for SQuAD fine-tuning, but I couldn't find any information about pretraining from scratch.
Please let me know if you have any info on that.

Thanks for your reply. :)
Taking a week is not a problem for me, and I have 8 x TITAN RTX (24 GB) cards for now.
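
For anyone landing here: a 4096-sequence batch won't fit directly in memory on 8 x 24 GB cards, so the usual workaround is gradient accumulation: run several micro-batches, sum the gradients, and apply one optimizer step. Below is a minimal TF2 sketch of that idea with a toy stand-in model. It is not this repo's pretraining code; the per-GPU micro-batch of 64 is an assumption, and the multi-GPU `tf.distribute` plumbing is omitted for brevity.

```python
import tensorflow as tf

# Assumed numbers: a per-GPU micro-batch of 64 sequences fits in 24 GB,
# so 8 GPUs x 64 x 8 accumulation steps = the paper's 4096.
PER_GPU_BATCH = 64
NUM_GPUS = 8
ACCUM_STEPS = 4096 // (PER_GPU_BATCH * NUM_GPUS)  # 8 micro-batches per update

# Toy stand-in model so the sketch runs end to end; swap in the real
# ALBERT pretraining model and its masked-LM/SOP loss here.
model = tf.keras.Sequential([tf.keras.layers.Dense(64, activation="relu"),
                             tf.keras.layers.Dense(1)])
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4)
loss_fn = tf.keras.losses.MeanSquaredError()

model.build(input_shape=(None, 16))  # create variables before the accumulators
accum_grads = [tf.Variable(tf.zeros_like(v), trainable=False)
               for v in model.trainable_variables]

@tf.function
def micro_step(x, y):
    """Accumulate gradients for one micro-batch without updating weights."""
    with tf.GradientTape() as tape:
        # Divide by ACCUM_STEPS so the summed gradient equals the mean
        # gradient of one large 4096-sequence batch.
        loss = loss_fn(y, model(x, training=True)) / ACCUM_STEPS
    for acc, g in zip(accum_grads,
                      tape.gradient(loss, model.trainable_variables)):
        acc.assign_add(g)
    return loss

@tf.function
def apply_step():
    """Apply the accumulated gradient, then reset the accumulators."""
    optimizer.apply_gradients(zip(accum_grads, model.trainable_variables))
    for acc in accum_grads:
        acc.assign(tf.zeros_like(acc))

# One full "large batch" step: ACCUM_STEPS micro-batches, then one update.
for _ in range(ACCUM_STEPS):
    x = tf.random.normal([PER_GPU_BATCH, 16])
    y = tf.random.normal([PER_GPU_BATCH, 1])
    micro_step(x, y)
apply_step()
```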

@hairzooc
Hi, were you able to train your model? How much time did it take, and how was its performance?