MILVLG/mcan-vqa

Co-Attention?

dreichCSL opened this issue · 1 comment

The MCAN paper suggests that SGA (i.e. a guided attention module) is only used for question-guided attention over image content, but not the other way around (image-guided attention over question content). Could the authors please explain why they call this "CO-"attention even though there's no image-guided attention over question content? Or did I misunderstand the paper?

Greatly appreciate a response!

We have SA within each modality and GA across modalities. In the paper we needed a simple name for this composite attention structure, so we use "co-attention". As mentioned in the paper, we also tried the symmetric co-attention structure you expected (e.g., SGA-SGA), but it brought no performance improvement.
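
To make the composition concrete, here is a minimal sketch of the encoder-decoder stacking described above: question features pass through SA layers, and image features pass through SGA layers whose guided-attention step uses the question output as keys and values. This is not the repository's actual implementation; the class names (`SA`, `SGA`, `CoAttention`) and hyperparameters are illustrative placeholders built on standard PyTorch multi-head attention.

```python
import torch
import torch.nn as nn


class SA(nn.Module):
    """Self-attention within a single modality."""
    def __init__(self, dim, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):
        out, _ = self.attn(x, x, x)      # queries, keys, values all come from x
        return self.norm(x + out)


class SGA(nn.Module):
    """Self-attention on x, followed by guided attention where y supplies keys/values."""
    def __init__(self, dim, heads=8):
        super().__init__()
        self.sa = SA(dim, heads)
        self.ga = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x, y):
        x = self.sa(x)
        out, _ = self.ga(x, y, y)        # x attends to y (image guided by question)
        return self.norm(x + out)


class CoAttention(nn.Module):
    """Encoder-decoder stacking: SA over the question, SGA over the image."""
    def __init__(self, dim, layers=6):
        super().__init__()
        self.enc = nn.ModuleList(SA(dim) for _ in range(layers))
        self.dec = nn.ModuleList(SGA(dim) for _ in range(layers))

    def forward(self, img, ques):
        for layer in self.enc:
            ques = layer(ques)           # intra-question self-attention
        for layer in self.dec:
            img = layer(img, ques)       # question-guided attention over image regions
        return img, ques


# Toy usage: batch of 2, 100 image regions and 14 question tokens, 512-d features
model = CoAttention(dim=512, layers=2)
img, ques = model(torch.randn(2, 100, 512), torch.randn(2, 14, 512))
```

A symmetric variant (the SGA-SGA structure mentioned above) would additionally let the question features attend to the image features; per the paper, that symmetry did not improve accuracy.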