kugwzk/DiDE

help

Closed this issue · 1 comment

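Hi, when you pre-train, do you monitor the ITM accuracy of the dual-tower model, and how long does it take before it starts to rise? I have trained on COCO for 10 epochs and it is still oscillating around 50. I would also like to ask what method you used for the final Shallow Fusion; I used an MLP. Is this result normal, or have I trained for too few epochs?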

Sorry for the delayed response.
We use the same MLP as ViLT, and we mainly follow the original pre-training setting: COCO + VG + GCC + SBU, 200k steps with a batch size of 1024.
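To put the two training budgets side by side, here is a rough back-of-the-envelope sketch that counts image-text pairs seen during training. The COCO pair count (`COCO_PAIRS`, roughly 567K captions) is an assumption based on the commonly used training split, and the calculation assumes one epoch iterates over every image-caption pair once; adjust both to your own setup.

```python
# Rough comparison of training budgets, measured in image-text pairs seen.
# ASSUMPTION: about 567K image-caption pairs in the COCO training split.
# This number is not from the thread; replace it with your own dataset size.
COCO_PAIRS = 567_000
coco_epochs = 10  # the run described in the question

# Setting quoted in the reply: 200k steps with a batch size of 1024.
reference_steps = 200_000
reference_batch_size = 1024

coco_pairs_seen = COCO_PAIRS * coco_epochs
reference_pairs_seen = reference_steps * reference_batch_size
ratio = reference_pairs_seen / coco_pairs_seen

print(f"10 COCO epochs:        {coco_pairs_seen:,} pairs")
print(f"200k steps x 1024:     {reference_pairs_seen:,} pairs")
print(f"reference / COCO run:  {ratio:.1f}x")
```

Under these assumptions, the 10-epoch COCO run sees dozens of times fewer pairs than the quoted setting, and it also draws them from a single, much smaller dataset.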
Could you pre-train on more data or for more steps? Alternatively, try using a fusion teacher during pre-training to guide the training.
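As an illustration of what "guiding with a fusion teacher" could look like, below is a minimal sketch of generic soft-label distillation on the ITM logits. It is not necessarily the objective used in this repository: the function names, the `alpha` weighting, and the `temperature` value are all hypothetical choices for illustration, and the teacher logits are assumed to come from an already trained fusion-encoder model.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    """Standard cross-entropy against a hard label (0 = mismatch, 1 = match)."""
    return -math.log(softmax(logits)[label])

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of the same length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distilled_itm_loss(student_logits, teacher_logits, label,
                       alpha=0.5, temperature=2.0):
    """Hypothetical combined loss: hard-label ITM loss plus a soft-label
    term that pulls the student's ITM distribution toward the teacher's."""
    hard = cross_entropy(student_logits, label)
    soft = kl_divergence(softmax(teacher_logits, temperature),
                         softmax(student_logits, temperature))
    # temperature**2 keeps the soft term's gradient scale comparable
    # across temperatures (a common convention in distillation).
    return (1 - alpha) * hard + alpha * temperature ** 2 * soft

# A student stuck at chance (equal logits) next to a confident teacher.
student = [0.0, 0.0]
teacher = [-2.0, 2.0]
print(distilled_itm_loss(student, teacher, label=1))
```

The idea is that the teacher's soft predictions can carry a denser training signal than the binary matched/mismatched label alone, which may help when the dual-tower ITM accuracy stays near chance.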