sukjunhwang/IFC

How does the pre-training process affect the final performance?

DYNreB51Cx opened this issue · 1 comment

Thanks for your wonderful work!
I noticed that in your paper, before training IFC on the VIS dataset, you first add an extra pretraining stage on the COCO dataset by setting T to 1. This implies that the memory tokens and all bus layers are also pretrained during this stage.
So I'm wondering: how does this stage influence the final performance on VIS? If we do not pretrain the memory tokens and bus layers on COCO, what happens to the final performance on the YouTube-VIS dataset?
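For concreteness, here is how I picture the T = 1 pretraining; the model call is hypothetical and only meant to illustrate the single-frame setup:

```python
# Sketch of COCO pretraining as I understand it: each image is treated as a
# one-frame clip (T = 1), so the memory tokens and bus layers are still
# exercised (and receive gradients) during pretraining.
import torch

T = 1                                # clip length used for COCO pretraining
image = torch.randn(3, 640, 640)     # a single COCO image, (C, H, W)
clip = image.unsqueeze(0)            # -> (T, C, H, W): a one-frame "video"

# outputs = ifc_model(clip)          # hypothetical call; the bus layers and
#                                    # memory tokens run even with T = 1
```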
Hoping for your reply and thank you again.

Hi @DYNreB51Cx,

In our experiments, we found that even if the memory tokens and the bus layers are not pretrained on COCO and instead start directly from randomly initialized weights, they converge quickly and often reach comparable final results.
What matters is training instability due to the small training set: the random start shows more fluctuating results and sometimes leads to a drop to around 36 AP.
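If it helps, here is a minimal sketch of what starting those parts from random weights looks like. The parameter-name filters ("memory" / "bus") and the dummy model are assumptions to keep the snippet self-contained, not the actual names in this repo; in practice you would `torch.load` the COCO checkpoint instead:

```python
import torch.nn as nn

# Dummy stand-in for an already-built IFC model, with one "bus" parameter
# that we want to keep at its random initialization.
model = nn.ModuleDict({
    "backbone": nn.Linear(4, 4),
    "bus_layer": nn.Linear(4, 4),
})

# Pretend this is the COCO-pretrained checkpoint
# (really: torch.load("coco_pretrained.pth")["model"]).
state_dict = {k: v.clone() for k, v in model.state_dict().items()}

# Drop the parameters we want to train from scratch.
filtered = {k: v for k, v in state_dict.items()
            if "bus" not in k and "memory" not in k}

# strict=False leaves the missing (bus/memory) parameters at random init.
missing, unexpected = model.load_state_dict(filtered, strict=False)
print("kept random init for:", missing)  # -> bus_layer.weight / bus_layer.bias
```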

Thank you :)