A-Jacobson/minimal-nmt

[question] Attention scoring equation

eJoey opened this issue · 2 comments

eJoey commented

Hi,
Looking at your code, I'm having a hard time understanding where this formula comes from:
https://github.com/A-Jacobson/minimal-nmt/blob/master/attention.py#L21

Looking at all the papers you linked, I see they usually do transpose(h) * W(out) (such as the score(h_t, h_s) function in Luong's paper).
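For reference, the scoring function I'm comparing against is the "general" form from Luong et al. (2015):

```latex
\mathrm{score}(h_t, \bar{h}_s) = h_t^{\top} W_a \bar{h}_s
```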

Looks like you are doing it the other way around. Could you please explain where this is coming from?
Thanks!

I'm currently stuck on this as well.

@eJoey Did you by chance ever figure out the correct way of doing it?

Perhaps the confusion is that self.W in the attention layer isn't actually an affine transformation matrix; it's a PyTorch linear layer. What sort of problems are you having with the layer?
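To make that concrete, here is a minimal sketch (not a copy of attention.py; the class and variable names are hypothetical) of how a Luong-style "general" score can be written with an nn.Linear layer:

```python
import torch
import torch.nn as nn


class LuongGeneralAttention(nn.Module):
    """Sketch of Luong 'general' attention: score(h_t, h_s) = h_t^T W h_s."""

    def __init__(self, hidden_dim):
        super().__init__()
        # nn.Linear stores a weight matrix and computes x @ weight.T (+ bias),
        # so applying it to the decoder state plays the role of h_t^T W.
        self.W = nn.Linear(hidden_dim, hidden_dim, bias=False)

    def forward(self, decoder_state, encoder_outputs):
        # decoder_state:   (batch, 1, hidden_dim)      -- target hidden state h_t
        # encoder_outputs: (batch, src_len, hidden_dim) -- source hidden states h_s
        projected = self.W(decoder_state)                        # (batch, 1, hidden_dim)
        scores = projected.bmm(encoder_outputs.transpose(1, 2))  # (batch, 1, src_len)
        weights = torch.softmax(scores, dim=-1)
        context = weights.bmm(encoder_outputs)                   # (batch, 1, hidden_dim)
        return context, weights
```

Because the linear layer already multiplies by the transpose of its stored weight, it doesn't matter whether you think of W as acting on h_t or on h_s: the product h_t^T W h_s is the same bilinear form either way, with the transpose absorbed into how the layer parameterizes W.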