Question about Wi

WilyZhao8 opened this issue · 2 comments

Dear authors,
Thank you for your excellent work. I read the article today.
I have a question about Wi as shown in Equation (5): why does Wi = 1 mean that negative gradients are generated?
I am looking forward to your reply.

For a sample x belonging to category k, training activates the classifier of category k, forcing the network to output a high probability for it, while suppressing the classifiers of every other category i (i != k) so that they output low probabilities. So when we set Wi = 1 for i != k, we keep the negative (suppression) gradients on category i. That is why Equation (6) needs to check whether i equals k: if i = k, Wi = 1 means activating classifier k.
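
To make the two cases concrete, here is a minimal sketch. It assumes each category has an independent sigmoid classifier trained with binary cross-entropy and that Wi simply scales the loss term of category i. Equations (5) and (6) are not reproduced in this thread, so the function name `weighted_bce_grads` and the example logits are hypothetical illustrations, not the authors' code. Under that assumption the gradient with respect to logit z_i is Wi * (p_i - y_i).

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def weighted_bce_grads(logits, k, weights):
    """Gradient of sum_i w_i * BCE(sigmoid(z_i), y_i) w.r.t. each logit z_i.

    Assumption: y is one-hot at index k and w_i scales the loss term of
    category i, so d(loss)/d(z_i) = w_i * (p_i - y_i).
    """
    grads = []
    for i, (z, w) in enumerate(zip(logits, weights)):
        y = 1.0 if i == k else 0.0
        grads.append(w * (sigmoid(z) - y))
    return grads


logits = [0.5, -1.0, 2.0]  # hypothetical logits for three categories
k = 0                      # ground-truth category of the sample

# Wi = 1 everywhere: classifier k is activated, all others are suppressed.
keep_all = weighted_bce_grads(logits, k, [1.0, 1.0, 1.0])

# Wi = 0 for category 2: its suppression gradient is dropped.
drop_2 = weighted_bce_grads(logits, k, [1.0, 1.0, 0.0])

print(keep_all)
print(drop_2)
```

In `keep_all`, the entry for category k is below zero, so a gradient descent step raises z_k (activation), while the entries for i != k are above zero, so the step lowers those logits (suppression). In `drop_2`, the entry for category 2 is exactly zero, so that classifier receives no suppression signal from this sample.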

Thank you, I understand what it means now.