Hi,
Thanks for your great work.
I have a question.
Why does the implementation just take the square and then the mean, instead of using the L2 norm described in the paper?
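    import torch.nn.functional as F

    def at(x):
        # Mean of squared activations over channels, flattened per sample, then normalized
        return F.normalize(x.pow(2).mean(1).view(x.size(0), -1))

    def at_loss(x, y):
        # Mean squared difference between the two normalized attention maps
        return (at(x) - at(y)).pow(2).mean()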
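
To make the comparison concrete, here is a minimal sketch of what I mean. It assumes `F` is `torch.nn.functional` and uses random tensors as stand-in activations; `at_loss_l2` is my own hypothetical variant, not something from the repo or the paper. As far as I can tell, `F.normalize` already applies an L2 normalization by default (`p=2`, `dim=1`), and the squared-mean loss is just the squared L2 norm of the difference divided by the number of elements, so the question is really about the final `pow(2).mean()` versus an explicit norm.

```python
import torch
import torch.nn.functional as F

def at(x):
    # Channel-wise mean of squared activations, flattened per sample,
    # then L2-normalized per sample (F.normalize defaults: p=2, dim=1).
    return F.normalize(x.pow(2).mean(1).view(x.size(0), -1))

def at_loss_mean_sq(x, y):
    # Loss as written in the implementation: mean of squared differences.
    return (at(x) - at(y)).pow(2).mean()

def at_loss_l2(x, y):
    # Hypothetical variant: per-sample L2 norm of the difference,
    # averaged over the batch.
    return (at(x) - at(y)).norm(p=2, dim=1).mean()

torch.manual_seed(0)
x = torch.randn(4, 8, 5, 5)  # stand-in activations: (batch, channels, H, W)
y = torch.randn(4, 8, 5, 5)

d = at(x) - at(y)
# Mean of squares == squared L2 (Frobenius) norm / number of elements.
print(torch.allclose(at_loss_mean_sq(x, y), d.norm().pow(2) / d.numel()))
print(at_loss_mean_sq(x, y).item(), at_loss_l2(x, y).item())
```

So the two versions differ by a square and a constant scale factor (plus how the batch is averaged), which mainly changes the effective weight of the loss term.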