bytedance/lightseq

About ViT encoder output consistency during inference?

xiao2mo opened this issue · 4 comments

Hi, has the ViT implementation been fully tested for output consistency?
I've found that the encoder output is totally different from the original model's.

We have tested the consistency of ViT. You can refer to the inference example to check that your usage is correct.
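
For reference, a minimal sketch of such a consistency check, comparing the LightSeq encoder output against HuggingFace's. The export file name `lightseq_vit.hdf5`, the `lsi.Vit` constructor arguments, and the `infer` method are assumptions based on the Python inference examples; adjust them to your actual export and API:

```python
import numpy as np
import torch
from transformers import ViTModel
import lightseq.inference as lsi

# Reference model from HuggingFace
hf_model = ViTModel.from_pretrained("google/vit-base-patch16-224")
hf_model.eval()

# Hypothetical LightSeq export; positional args are (model_path, max_batch_size)
ls_model = lsi.Vit("lightseq_vit.hdf5", 8)

pixel_values = torch.randn(1, 3, 224, 224)

with torch.no_grad():
    hf_out = hf_model(pixel_values=pixel_values).last_hidden_state.numpy()

ls_out = ls_model.infer(pixel_values.numpy())

# fp16 kernels accumulate rounding error, so compare with a loose tolerance
print("max abs diff:", np.max(np.abs(hf_out - ls_out)))
print("allclose:", np.allclose(hf_out, ls_out, atol=1e-2))
```

If the outputs diverge far beyond fp16 rounding error, the usual culprits are a wrong weight export or mismatched preprocessing, not the kernels themselves.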

Could I have your WeChat? I've run into some problems converting OpenAI's ViT.
Is the ViT you mentioned the modeling_vit in HuggingFace?
It seems that the encoder implementation is the same as that of BERT in LightSeq.
In other words, why are self_attention and ffn_add_norm in vit_encoder.cc.cu and bert_encoder.cc.cu identical?

Yes, it's HuggingFace's modeling_vit.

ViT and BERT have the same structure except for the embedding layer.
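
To illustrate the point: both models turn their input into a (batch, seq_len, hidden) sequence in the embedding layer and then run it through the same kind of self-attention plus feed-forward encoder stack, which is why the encoder kernels can be shared. A minimal PyTorch sketch (not LightSeq code; details such as positional embeddings, the CLS token, and layer-norm placement are omitted):

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """ViT-style embedding: split the image into patches and project each one."""
    def __init__(self, hidden=768, patch=16, channels=3):
        super().__init__()
        self.proj = nn.Conv2d(channels, hidden, kernel_size=patch, stride=patch)

    def forward(self, images):                # (B, 3, H, W)
        x = self.proj(images)                 # (B, hidden, H/16, W/16)
        return x.flatten(2).transpose(1, 2)   # (B, num_patches, hidden)

class TokenEmbedding(nn.Module):
    """BERT-style embedding: look up a vector for each token id."""
    def __init__(self, hidden=768, vocab=30522):
        super().__init__()
        self.emb = nn.Embedding(vocab, hidden)

    def forward(self, ids):                   # (B, seq_len)
        return self.emb(ids)                  # (B, seq_len, hidden)

# The encoder is identical for both: the same self_attention and ffn_add_norm
# blocks operate on (B, seq_len, hidden), regardless of whether the sequence
# came from image patches or token ids.
layer = nn.TransformerEncoderLayer(d_model=768, nhead=12, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=12)

vit_seq = PatchEmbedding()(torch.randn(2, 3, 224, 224))
bert_seq = TokenEmbedding()(torch.randint(0, 30522, (2, 128)))
print(encoder(vit_seq).shape)   # torch.Size([2, 196, 768])
print(encoder(bert_seq).shape)  # torch.Size([2, 128, 768])
```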