sinAshish/Multi-Scale-Attention

About the code before softmax in CAM_Module

kaneyxx opened this issue · 2 comments

Hi, thanks for sharing this awesome project :)

Here's a question that came up while reading your source code. In CAM_Module there is one line of code right before the softmax that doesn't exist in PAM_Module. As I understand it, for each channel you take the maximum value of the energy matrix (computed as query dot key) and subtract every value from that maximum. But doesn't that mean a larger result corresponds to a *more irrelevant* channel?
Sorry... I can't quite wrap my head around this. Could you kindly explain it? Thanks a lot!

CAM_Module

Hi @kaneyxx

This is added to prevent loss divergence during training. The CAM module is borrowed from DANet. Our understanding of that line is that it encourages the module to pay more attention to the more dissimilar channels.
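As a quick numerical sketch of what that subtraction does (NumPy, with a made-up affinity row rather than the repo's actual tensors): applying softmax to `max - energy` instead of `energy` flips the attention so the *least* similar channel gets the largest weight, while also keeping the values fed to `exp()` bounded.

```python
import numpy as np

# Hypothetical channel-affinity row (one row of energy = query · key).
# A larger value means a more similar channel.
energy = np.array([4.0, 1.0, 3.0])

def softmax(x):
    e = np.exp(x - x.max())  # numerically stable softmax
    return e / e.sum()

# Plain softmax: the most similar channel gets the largest weight.
plain = softmax(energy)

# The trick under discussion: subtract each value from the row maximum
# first, so attention is flipped toward the least similar channels, and
# the inputs to exp() stay in [0, max - min].
flipped = softmax(energy.max() - energy)

print(plain.argmax())    # → 0, the most similar channel
print(flipped.argmax())  # → 1, the least similar channel
```

So "larger number = more dissimilar channel" is exactly the point: after the softmax, those dissimilar channels receive the larger attention weights.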

Best

That makes sense to me. I'll go through the DANet paper next. Thank you!