Pre-training on other BERT models
Muennighoff opened this issue · 2 comments
Muennighoff commented
Thanks for the great repo and your efforts! Two quick questions:
Is there anything that speaks against pre-training VisualBERT with Albert instead of BERT on COCO and then fine-tuning it for downstream tasks?
Also, I haven't found exact details on what resources are needed for pre-training, except that it took less than a day on COCO according to your paper. How many hours did it take, and what GPUs did you use?
liunian-harold-li commented
Hi,
1. I would imagine that using Albert would still work.
2. For most experiments, I run them on four 1080 Tis with 12G of memory. Pre-training on COCO takes less than a day, maybe 18-20 hours? Sorry that I cannot recall the exact amount of time needed. For experiments on VCR, I used four V100s with 16G of memory.
Muennighoff commented
Okay, I see. If I were to implement it with Albert, I would have to retrain on COCO, right? I can't just fine-tune using Albert, since the checkpoint pre-trained on COCO is based on BERT uncased.
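For anyone landing here later, a minimal sketch of why the released COCO checkpoint cannot simply be reused with an Albert backbone, assuming the Hugging Face `transformers` API (the model names and the comparison below are illustrative, not taken from this repo):

```python
# Illustrative sketch only (not from this repo): compare the parameter
# layouts of bert-base-uncased and albert-base-v2 to show why a BERT-based
# VisualBERT COCO checkpoint cannot be loaded into an ALBERT backbone.
from transformers import AlbertModel, BertModel

bert = BertModel.from_pretrained("bert-base-uncased")
albert = AlbertModel.from_pretrained("albert-base-v2")

bert_keys = set(bert.state_dict().keys())
albert_keys = set(albert.state_dict().keys())

# ALBERT shares parameters across layers and factorizes its embeddings,
# so most parameter names (and shapes) differ from BERT's.
print(f"BERT params: {len(bert_keys)}, ALBERT params: {len(albert_keys)}, "
      f"names in common: {len(bert_keys & albert_keys)}")

# Consequence: an ALBERT-based VisualBERT would need to be pre-trained on
# COCO starting from the language-only ALBERT checkpoint before fine-tuning
# on downstream tasks, rather than reusing the released BERT-based weights.
```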