MILVLG/mcan-vqa

Co-Attention?

dreichCSL opened this issue · 1 comment

The MCAN paper suggests that SGA (i.e. a guided attention module) is only used for question-guided attention over image content, but not the other way around (image-guided attention over question content). Could the authors please explain why they call this "CO-"attention even though there's no image-guided attention over question content? Or did I misunderstand the paper?

Greatly appreciate a response!

We have SA within each modality and GA across modalities. In the paper we needed a simple name for this composite attention structure, so we use "co-attention". As mentioned in the paper, we also tried the symmetric co-attention structure you expected (e.g., SGA-SGA), but it brought no performance improvement.
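
To make the composition concrete, here is a minimal sketch of the encoder-decoder stacking described above: question features pass through SA layers, and image features pass through SGA layers whose guided-attention step uses the question output as keys and values. This is not the repository's actual implementation; the class names (`SA`, `SGA`, `CoAttention`) and hyperparameters are illustrative placeholders built on standard PyTorch multi-head attention.

```python
import torch
import torch.nn as nn


class SA(nn.Module):
    """Self-attention within a single modality."""
    def __init__(self, dim, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):
        out, _ = self.attn(x, x, x)      # queries, keys, values all come from x
        return self.norm(x + out)


class SGA(nn.Module):
    """Self-attention on x, followed by guided attention where y supplies keys/values."""
    def __init__(self, dim, heads=8):
        super().__init__()
        self.sa = SA(dim, heads)
        self.ga = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x, y):
        x = self.sa(x)
        out, _ = self.ga(x, y, y)        # x attends to y (image guided by question)
        return self.norm(x + out)


class CoAttention(nn.Module):
    """Encoder-decoder stacking: SA over the question, SGA over the image."""
    def __init__(self, dim, layers=6):
        super().__init__()
        self.enc = nn.ModuleList(SA(dim) for _ in range(layers))
        self.dec = nn.ModuleList(SGA(dim) for _ in range(layers))

    def forward(self, img, ques):
        for layer in self.enc:
            ques = layer(ques)           # intra-question self-attention
        for layer in self.dec:
            img = layer(img, ques)       # question-guided attention over image regions
        return img, ques


# Toy usage: batch of 2, 100 image regions and 14 question tokens, 512-d features
model = CoAttention(dim=512, layers=2)
img, ques = model(torch.randn(2, 100, 512), torch.randn(2, 14, 512))
```

A symmetric variant (the SGA-SGA structure mentioned above) would additionally let the question features attend to the image features; per the paper, that symmetry did not improve accuracy.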