[question] Attention scoring equation
eJoey opened this issue · 2 comments
eJoey commented
Hi,
Looking at your code I'm having a hard time understanding where this formula came from:
https://github.com/A-Jacobson/minimal-nmt/blob/master/attention.py#L21
Looking at all the papers you linked, I see they usually do transpose(h) * W(out) (such as the score(ht, hs) function in Luong's paper).
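For reference, the "general" scoring function from Luong et al. (2015) is usually written as

$$\mathrm{score}(h_t, \bar{h}_s) = h_t^{\top} W_a \bar{h}_s$$

where $h_t$ is the current decoder hidden state, $\bar{h}_s$ is an encoder output, and $W_a$ is a learned matrix.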
Looks like you are doing it the other way around. Could you please explain where this is coming from?
Thanks!
trias702 commented
I'm currently stuck on this as well.
@eJoey Did you by chance ever figure out the correct way of doing it?
A-Jacobson commented
Perhaps the confusion is that self.W in the attention layer isn't actually an affine transformation matrix; it's a PyTorch linear layer. What sort of problems are you having with the layer?
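Here is a minimal sketch (not the actual code in attention.py) of why the two orderings amount to the same score. The names h_t, h_s, and W are illustrative, and batch-first tensors are assumed:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch, src_len, hidden = 2, 5, 8

h_t = torch.randn(batch, 1, hidden)        # decoder hidden state at one step
h_s = torch.randn(batch, src_len, hidden)  # encoder outputs for every source position

# A bias-free linear layer stores a learned matrix (its .weight) and computes
# x @ weight.T, so it can play the role of W in Luong's general score h_t^T W h_s.
W = nn.Linear(hidden, hidden, bias=False)

# Variant 1: transform the decoder state, then dot it against the encoder outputs.
score_a = W(h_t).bmm(h_s.transpose(1, 2))            # (batch, 1, src_len)

# Variant 2: transform the encoder outputs, then dot against the decoder state.
# Using W.weight (rather than its transpose) here closes the same bilinear form,
# so both variants produce identical scores.
score_b = h_t.bmm((h_s @ W.weight).transpose(1, 2))  # (batch, 1, src_len)

print(torch.allclose(score_a, score_b, atol=1e-6))   # True
```

Since the matrix is learned from scratch, applying the linear layer to one side or the other only amounts to learning W versus its transpose, so both orderings parameterize the same family of bilinear scoring functions.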