uclanlp/visualbert

Pre-training on other BERT models

Muennighoff opened this issue · 2 comments

Thanks for the great repo and your efforts! Two quick questions:

Is there anything that speaks against pre-training VisualBERT with ALBERT instead of BERT on COCO and then fine-tuning it on downstream tasks?
Also, I haven't found exact details on what resources are needed for pre-training, except that it took less than a day on COCO according to your paper. How many hours did it take, and what GPUs did you use?

Okay, I see. If I implemented it with ALBERT, I would have to re-pre-train on COCO, right? I cannot just fine-tune using ALBERT, since the model pre-trained on COCO uses the BERT uncased architecture.
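
For what it's worth, here is a minimal sketch (using Hugging Face transformers, not this repo's code, and assuming the released backbone is bert-base-uncased) of why the COCO-pre-trained weights can't simply be dropped into an ALBERT encoder and you'd need to re-pre-train first:

```python
# Sketch only: illustrates the architecture mismatch, not the repo's actual loading code.
from transformers import AlbertModel, BertModel

# The released VisualBERT checkpoint was pre-trained with a BERT uncased text
# backbone (assumed here to be bert-base-uncased), so its weights only fit a
# BERT-shaped encoder.
bert_backbone = BertModel.from_pretrained("bert-base-uncased")

# ALBERT uses factorized embeddings and cross-layer parameter sharing, so its
# parameter names and shapes differ from BERT's.
albert_backbone = AlbertModel.from_pretrained("albert-base-v2")

# Almost no state-dict keys overlap, so the COCO-pre-trained weights cannot be
# mapped onto an ALBERT backbone; it would have to be pre-trained on COCO again
# before any downstream fine-tuning.
shared_keys = set(bert_backbone.state_dict()) & set(albert_backbone.state_dict())
print(f"Shared parameter keys: {len(shared_keys)}")
```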