Pre-training on other BERT models
Muennighoff opened this issue · 2 comments
Muennighoff commented
Thanks for the great repo and your efforts! Two quick questions:
Is there anything that speaks against pre-training VisualBERT with Albert instead of BERT on COCO and then fine-tuning it for downstream tasks?
Also, I haven't found exact details on what resources are needed for pre-training, except that it took less than a day on COCO according to your paper. How many hours did it take, and what GPUs did you use?
liunian-harold-li commented
Hi,
1. I would imagine that using Albert would still work.
2. For most experiments, I run them on four 1080 Tis with 12G of memory. Pre-training on COCO takes less than a day, maybe 18-20 hours? Sorry that I cannot recall the exact amount of time needed. For experiments on VCR, I used four V100s with 16G of memory.
Muennighoff commented
Okay, I see. If I were to implement it with Albert, I would have to retrain on COCO, right? I can't just fine-tune using Albert, since the checkpoint pre-trained on COCO is based on BERT uncased.
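For anyone landing here later, a minimal sketch of why the released COCO checkpoint cannot simply be reused with an Albert backbone, assuming the Hugging Face `transformers` API (the model names and the comparison below are illustrative, not taken from this repo):

```python
# Illustrative sketch only (not from this repo): compare the parameter
# layouts of bert-base-uncased and albert-base-v2 to show why a BERT-based
# VisualBERT COCO checkpoint cannot be loaded into an ALBERT backbone.
from transformers import AlbertModel, BertModel

bert = BertModel.from_pretrained("bert-base-uncased")
albert = AlbertModel.from_pretrained("albert-base-v2")

bert_keys = set(bert.state_dict().keys())
albert_keys = set(albert.state_dict().keys())

# ALBERT shares parameters across layers and factorizes its embeddings,
# so most parameter names (and shapes) differ from BERT's.
print(f"BERT params: {len(bert_keys)}, ALBERT params: {len(albert_keys)}, "
      f"names in common: {len(bert_keys & albert_keys)}")

# Consequence: an ALBERT-based VisualBERT would need to be pre-trained on
# COCO starting from the language-only ALBERT checkpoint before fine-tuning
# on downstream tasks, rather than reusing the released BERT-based weights.
```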