`image_len` is not used?
VG-GPLMs/src/models/modeling_bart.py
Line 749 in ecd40e8
`image_len` is not used when calculating attn?
`image_len=None` means the default value is None; you can pass a list of ints (one per sample in the batch) to this function.
What I mean is that `image_len` is not used (as a mask) when calculating attn.
Also, is there an error in the attn softmax dim?
VG-GPLMs/src/models/modeling_bart.py
Line 882 in ecd40e8
The shape of attn is [batch_size (0), text_len (1), image_len (2)], so the softmax should be taken over the image_len dim (2).
So I think the softmax dim should be 2, not 1?
Is there something wrong with my thinking?
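To make the question concrete, here is a small check, assuming attn really has the shape [batch_size, text_len, image_len] described above. The tensor sizes and variable names are made up for illustration and are not taken from the repository. It compares which axis sums to 1 after each choice of `dim`.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical sizes, chosen only for this illustration.
batch_size, text_len, image_len = 2, 4, 3
attn = torch.randn(batch_size, text_len, image_len)

# dim=1 normalises over text positions: for each image position,
# the weights across all text tokens sum to 1.
w_dim1 = F.softmax(attn, dim=1)

# dim=2 normalises over image positions: for each text token,
# the weights across all image positions sum to 1.
w_dim2 = F.softmax(attn, dim=2)

# Sum each result over the image axis to see which one is a
# distribution over image positions.
sums_over_image_dim1 = w_dim1.sum(dim=2)
sums_over_image_dim2 = w_dim2.sum(dim=2)

print(sums_over_image_dim1)  # generally not all ones
print(sums_over_image_dim2)  # all ones, up to floating-point error
```

If each text token is supposed to attend over the image features, only the `dim=2` version gives it a proper distribution over image positions.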
I see. `image_len` is not used in the multimodal fusion function. You could apply it as a mask in the cross-attention; that would probably improve performance slightly.
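A minimal sketch of what such a mask might look like, not the repository's actual code. It assumes attn holds raw scores of shape [batch_size, text_len, max_image_len], that `image_len` is a list with one int per sample, and that every length is at least 1 (a length of 0 would make a whole row `-inf` and the softmax would return NaN). The function name `masked_image_attention` is hypothetical.

```python
import torch
import torch.nn.functional as F


def masked_image_attention(attn, image_len):
    """Softmax over image positions, ignoring padded image positions.

    attn:      raw scores, shape [batch_size, text_len, max_image_len]
    image_len: list of ints, number of real image positions per sample
               (assumed >= 1 for every sample)
    """
    max_image_len = attn.size(2)
    lengths = torch.as_tensor(image_len, device=attn.device)

    # valid[b, j] is True when image position j is real for sample b.
    positions = torch.arange(max_image_len, device=attn.device)
    valid = positions[None, :] < lengths[:, None]

    # Broadcast the mask over the text axis and block padded positions.
    attn = attn.masked_fill(~valid[:, None, :], float("-inf"))

    # Normalise over the image axis (dim=2), as discussed in this thread.
    return F.softmax(attn, dim=2)


# Example: the second sample has only 3 real image positions out of 5.
scores = torch.randn(2, 4, 5)
weights = masked_image_attention(scores, [5, 3])
print(weights[1, 0])  # the last two entries are exactly 0
```

Filling with `-inf` before the softmax gives the padded positions zero weight while the remaining weights still sum to 1 for each text token.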
VG-GPLMs/src/models/modeling_bart.py
Line 882 in ecd40e8
I think L882 should be the following (for the reason given above):
attn = F.softmax(attn, dim=2)
Am I wrong?