MenghaoGuo/PCT

question about the normalization on the attention weight

amiltonwong opened this issue · 1 comments

Hi @MenghaoGuo,

From the code in cls and partseg, the attention weights are already normalized by `self.softmax()`. Why did you add the extra line `attention / (1e-9 + attention.sum(dim=1, keepdims=True))` for weight normalization?

Any particular reason?

Thanks~

Hi,
Good question.
Please pay attention to the dimension of the normalization: the softmax and the extra division are applied along different dimensions, so the second step is not redundant. In our experiments, we found that this makes the training process more stable.
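
For reference, here is a minimal standalone sketch of the two normalization steps (shapes and variable names are illustrative, not taken from the repo). Assuming, as in the cls/partseg code, that the softmax is taken over the last dimension, the extra division then rescales along `dim=1`, so each step acts on a different axis of the attention map:

```python
import torch

# Illustrative shapes (assumptions, not the repo's actual sizes)
b, n, c = 2, 1024, 64
x_q = torch.randn(b, n, c)   # queries: (batch, points, channels)
x_k = torch.randn(b, c, n)   # keys:    (batch, channels, points)

energy = torch.bmm(x_q, x_k)                 # (b, n, n) attention logits

# Step 1: softmax over the last dimension -> each row sums to 1
attention = torch.softmax(energy, dim=-1)

# Step 2: divide by the sum over dim=1 -> each column is rescaled to sum to 1
attention = attention / (1e-9 + attention.sum(dim=1, keepdim=True))

print(attention.sum(dim=-1)[0, :3])  # rows no longer sum exactly to 1 after step 2
print(attention.sum(dim=1)[0, :3])   # columns now sum to ~1
```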