bytedance/lynx-llm

How is the dimension of pos_embed handled when the image size is increased to 420x420 in the pre-training phase?

yutaidong opened this issue · 2 comments

How is the dimension of pos_embed handled when the image size is increased to 420x420 in the pre-training phase?

Or is the model simply replaced with a different visual encoder?

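You can refer to the function interpolate_pos_embed in models/vits/eva_vit.py. The same visual encoder is kept, and its position embedding is interpolated to match the larger input.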
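For readers who want the general idea without opening the file, here is a minimal sketch of how a ViT position embedding is commonly resized for a larger input. It is not the repository's code: the helper name resize_pos_embed, the patch size of 14, the 224x224 starting resolution, the single class token, and the bicubic mode are all assumptions made for illustration, so check interpolate_pos_embed itself for the actual behavior.

```python
import torch
import torch.nn.functional as F


def resize_pos_embed(pos_embed: torch.Tensor, new_grid: int,
                     num_extra_tokens: int = 1) -> torch.Tensor:
    """Hypothetical helper: resize a (1, extra + old_grid**2, dim) position embedding."""
    _, num_tokens, dim = pos_embed.shape
    num_patches = num_tokens - num_extra_tokens
    old_grid = int(round(num_patches ** 0.5))
    if old_grid * old_grid != num_patches:
        raise ValueError("patch tokens do not form a square grid")
    if old_grid == new_grid:
        return pos_embed

    # Class (and any other extra) tokens are kept unchanged.
    extra = pos_embed[:, :num_extra_tokens]
    patch = pos_embed[:, num_extra_tokens:]

    # (1, N, dim) -> (1, dim, old_grid, old_grid) so it can be treated as an image.
    patch = patch.reshape(1, old_grid, old_grid, dim).permute(0, 3, 1, 2)
    patch = F.interpolate(patch, size=(new_grid, new_grid),
                          mode="bicubic", align_corners=False)
    # Back to (1, new_grid**2, dim).
    patch = patch.permute(0, 2, 3, 1).reshape(1, new_grid * new_grid, dim)
    return torch.cat([extra, patch], dim=1)


# Assumed setup: patch size 14, pre-trained at 224x224 (16x16 grid),
# then resized for 420x420 inputs (30x30 grid). dim=64 is arbitrary.
old = torch.randn(1, 1 + 16 * 16, 64)
new = resize_pos_embed(old, new_grid=420 // 14)
print(new.shape)  # torch.Size([1, 901, 64])
```

Under these assumptions, only the patch-position part of the embedding changes size (256 to 900 tokens); the encoder weights themselves are reused as they are.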