jackroos/VL-BERT

type_vocab_size of pretrained model should be 2

whqwill opened this issue · 1 comment

type_vocab_size of the pretrained model should be 2, right? But it shows 3. To my understanding, there are only two types in the pretraining: one for text and one for images. Am I missing something?

In some tasks the text contains two sentences, for example the question and the answer in VQA and VCR, so we use different segment embeddings for them following BERT. Together with the segment type for image regions, that gives three token types, hence type_vocab_size = 3.
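To make the idea concrete, here is a minimal sketch of how three segment types could be assigned for a VCR-style input of the form [CLS] question [SEP] answer [SEP] image regions. The sizes and the exact ID assignment (0 for the first sentence, 1 for the second, 2 for image regions) are illustrative assumptions, not necessarily the convention used in the VL-BERT code.

```python
import torch
import torch.nn as nn

# Hypothetical sizes for illustration only.
HIDDEN_SIZE = 768
TYPE_VOCAB_SIZE = 3  # assumed mapping: 0 = first sentence, 1 = second sentence, 2 = image regions

# Segment (token-type) embedding table, as in BERT but with an extra type for image regions.
token_type_embeddings = nn.Embedding(TYPE_VOCAB_SIZE, HIDDEN_SIZE)

# Example sequence lengths (made up for the sketch).
num_question_tokens = 5
num_answer_tokens = 4
num_image_regions = 10

token_type_ids = torch.cat([
    torch.zeros(1 + num_question_tokens + 1, dtype=torch.long),  # [CLS] + question + [SEP] -> type 0
    torch.ones(num_answer_tokens + 1, dtype=torch.long),         # answer + [SEP]           -> type 1
    torch.full((num_image_regions,), 2, dtype=torch.long),       # image regions            -> type 2
])

segment_embeds = token_type_embeddings(token_type_ids)  # shape: (seq_len, HIDDEN_SIZE)
print(token_type_ids.tolist())
print(segment_embeds.shape)
```

For single-sentence tasks only types 0 and 2 would be used, but the embedding table still needs all three rows to load the pretrained weights.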