szagoruyko/attention-transfer

Loss function problems

jacky4323 opened this issue · 0 comments

Hi,

Thanks for your great work. I have a question:
why does the implementation compute the loss with a square followed by a mean, instead of the L2-norm you describe in the paper?

[image: equation from the paper]
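For reference, the transfer loss described in the paper is, as I understand it,

$$
\mathcal{L}_{AT} = \mathcal{L}(W_S, x) + \frac{\beta}{2} \sum_{j \in \mathcal{I}} \left\| \frac{Q_S^j}{\|Q_S^j\|_2} - \frac{Q_T^j}{\|Q_T^j\|_2} \right\|_p
$$

where $Q^j = \mathrm{vec}(F(A^j))$ is the flattened attention map of layer $j$, $F(A) = \sum_{i=1}^{C} |A_i|^2$, and $p = 2$, i.e. an L2-norm of the difference of normalized maps.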

import torch.nn.functional as F


def at(x):
    # Square activations, average over channels, flatten, then L2-normalize.
    return F.normalize(x.pow(2).mean(1).view(x.size(0), -1))


def at_loss(x, y):
    # Mean of squared differences between the normalized attention maps.
    return (at(x) - at(y)).pow(2).mean()
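
If I translate the paper's description literally, I would have expected something more like this (just my own sketch, not code from the repo; at_paper / at_loss_paper are names I made up):

import torch.nn.functional as F


def at_paper(x, p=2):
    # Sum |A_i|^p over channels, flatten, then divide by the L2 norm
    # of the flattened map (F.normalize defaults to p=2 along dim=1).
    return F.normalize(x.abs().pow(p).sum(1).view(x.size(0), -1))


def at_loss_paper(x, y, p=2):
    # p-norm of the difference of normalized maps, averaged over the batch.
    return (at_paper(x, p) - at_paper(y, p)).norm(p=p, dim=1).mean()

As far as I can tell, mean(1) vs. sum(1) only changes a constant factor that F.normalize cancels, but the final .pow(2).mean() is a mean of squared differences rather than a norm, which is what confused me.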