lucidrains/performer-pytorch

Triangular matrices?

jeremycochoy opened this issue · 10 comments

Does the current implementation provide triangular matrices (to constrain the attention always on the "left" of the sequence, both for input and encoded values) as described in the last section of the original paper?

@jeremycochoy Hi Jeremy, do you mean in the autoregressive (unidirectional) case? I only see triangular matrices being mentioned in that context.

@jeremycochoy can you point me at this passage in the paper?

Yes, it's on page 17, Appendix B.1. I don't know how complex it would be to implement this, if it isn't already there.

@jeremycochoy ohh I see, yeah, that is for the unidirectional case, and it is already taken care of, through a cumulative sum actually (no masking needed)
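The cumulative-sum trick can be sketched in plain PyTorch. This is a simplified, memory-hungry illustration (it materializes per-position outer products), not the library's actual implementation, and it assumes `q` and `k` are already non-negative kernel features (e.g. the output of FAVOR+):

```python
import torch

def causal_linear_attention(q, k, v):
    # q, k: (batch, seq, dim) non-negative feature maps (assumed already applied)
    # v:    (batch, seq, dim_v)
    # Prefix sums over the sequence axis give each position access only to
    # itself and earlier positions, so no explicit triangular mask is needed.
    k_cumsum = k.cumsum(dim=1)                                    # (b, n, d)
    context = torch.einsum('bnd,bne->bnde', k, v).cumsum(dim=1)   # (b, n, d, dv)
    num = torch.einsum('bnd,bnde->bne', q, context)               # q_i · S_i
    den = torch.einsum('bnd,bnd->bn', q, k_cumsum).unsqueeze(-1)  # q_i · sum k_j
    return num / den.clamp(min=1e-8)
```

Position i here attends over keys j ≤ i only, matching what a lower-triangular mask would produce in the quadratic formulation.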

@jeremycochoy you don't need to worry about that detail, you just need to set causal = True and you are good to go

[Screenshot: excerpt from the paper, Appendix B.1 (posted 2020-12-07)]

just to make sure we are looking at the same thing lol

There are no words for how happy I am to learn that; it's awesome (yes, we are looking at the same thing). I can't wait to test it. :)

@jeremycochoy good timing, since @Sleepychord just caught and fixed a big bug in that part of the code loll

Am I understanding it correctly that, because of the pretty neat cumsum, we could even run the EncDec version without a decoder mask and still wouldn't leak the ground truth to the model?

& so in practice we can construct attn masks the same way for inputs & outputs and they are treated the same way by the model? @lucidrains
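One quick way to check the no-leakage claim empirically (a sketch, not from this thread — the simplified cumsum attention below stands in for the library's implementation): perturb only the final position's key/value and verify that the outputs at all earlier positions are bit-for-bit unchanged.

```python
import torch

def causal_linear_attention(q, k, v):
    # Same prefix-sum formulation as above; duplicated so this snippet is self-contained.
    context = torch.einsum('bnd,bne->bnde', k, v).cumsum(dim=1)
    num = torch.einsum('bnd,bnde->bne', q, context)
    den = torch.einsum('bnd,bnd->bn', q, k.cumsum(dim=1)).unsqueeze(-1)
    return num / den.clamp(min=1e-8)

torch.manual_seed(0)
q = torch.rand(1, 6, 4)
k = torch.rand(1, 6, 4)
v = torch.randn(1, 6, 4)
out = causal_linear_attention(q, k, v)

# Perturb only the last position: if nothing leaks from the future,
# outputs at all earlier positions must be unchanged.
k2, v2 = k.clone(), v.clone()
k2[:, -1] += 1.0
v2[:, -1] += 1.0
out2 = causal_linear_attention(q, k2, v2)

leak_free = torch.allclose(out[:, :-1], out2[:, :-1])
print(leak_free)  # → True
```

Note this only demonstrates causality for the decoder's self-attention; cross-attention from decoder to encoder outputs is a separate (bidirectional) path.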