[question] Attention scoring equation
eJoey opened this issue · 2 comments
eJoey commented
Hi,
Looking at your code I'm having a hard time understanding where this formula came from:
https://github.com/A-Jacobson/minimal-nmt/blob/master/attention.py#L21
Looking at all the papers you linked, I see they usually do transpose(h) * W(out) (such as the score(ht, hs) function in Luong's paper).
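For reference, the "general" scoring function from Luong et al. (2015) is usually written as

$$\mathrm{score}(h_t, \bar{h}_s) = h_t^{\top} W_a \bar{h}_s$$

where $h_t$ is the current decoder hidden state, $\bar{h}_s$ is an encoder output, and $W_a$ is a learned matrix.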
Looks like you are doing it the other way around. Could you please explain where this is coming from?
Thanks!
trias702 commented
I'm currently stuck on this as well.
@eJoey Did you by chance ever figure out the correct way of doing it?
A-Jacobson commented
Perhaps the confusion is that self.W in the attention layer isn't actually an affine transformation matrix; it's a PyTorch linear layer. What sort of problems are you having with the layer?
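Here is a minimal sketch (not the actual code in attention.py) of why the two orderings amount to the same score. The names h_t, h_s, and W are illustrative, and batch-first tensors are assumed:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch, src_len, hidden = 2, 5, 8

h_t = torch.randn(batch, 1, hidden)        # decoder hidden state at one step
h_s = torch.randn(batch, src_len, hidden)  # encoder outputs for every source position

# A bias-free linear layer stores a learned matrix (its .weight) and computes
# x @ weight.T, so it can play the role of W in Luong's general score h_t^T W h_s.
W = nn.Linear(hidden, hidden, bias=False)

# Variant 1: transform the decoder state, then dot it against the encoder outputs.
score_a = W(h_t).bmm(h_s.transpose(1, 2))            # (batch, 1, src_len)

# Variant 2: transform the encoder outputs, then dot against the decoder state.
# Using W.weight (rather than its transpose) here closes the same bilinear form,
# so both variants produce identical scores.
score_b = h_t.bmm((h_s @ W.weight).transpose(1, 2))  # (batch, 1, src_len)

print(torch.allclose(score_a, score_b, atol=1e-6))   # True
```

Since the matrix is learned from scratch, applying the linear layer to one side or the other only amounts to learning W versus its transpose, so both orderings parameterize the same family of bilinear scoring functions.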