lucidrains/performer-pytorch

Triangular matrices?

jeremycochoy opened this issue · 10 comments

Does the current implementation provide triangular matrices (to constrain the attention always on the "left" of the sequence, both for input and encoded values) as described in the last section of the original paper?

@jeremycochoy Hi Jeremy, do you mean in the autoregressive (unidirectional) case? I only see triangular matrices being mentioned in that context.

@jeremycochoy can you point me at this passage in the paper?

Yes, it's on page 17, Appendix B.1. I don't know how complex it would be to implement this, if it isn't already there.

@jeremycochoy ohh I see, yeah, that is for the unidirectional case, and it is already taken care of, through a cumulative sum actually (no masking needed)
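The cumulative-sum trick can be sketched in plain PyTorch. This is a simplified, memory-hungry illustration (it materializes per-position outer products), not the library's actual implementation, and it assumes `q` and `k` are already non-negative kernel features (e.g. the output of FAVOR+):

```python
import torch

def causal_linear_attention(q, k, v):
    # q, k: (batch, seq, dim) non-negative feature maps (assumed already applied)
    # v:    (batch, seq, dim_v)
    # Prefix sums over the sequence axis give each position access only to
    # itself and earlier positions, so no explicit triangular mask is needed.
    k_cumsum = k.cumsum(dim=1)                                    # (b, n, d)
    context = torch.einsum('bnd,bne->bnde', k, v).cumsum(dim=1)   # (b, n, d, dv)
    num = torch.einsum('bnd,bnde->bne', q, context)               # q_i · S_i
    den = torch.einsum('bnd,bnd->bn', q, k_cumsum).unsqueeze(-1)  # q_i · sum k_j
    return num / den.clamp(min=1e-8)
```

Position i here attends over keys j ≤ i only, matching what a lower-triangular mask would produce in the quadratic formulation.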

@jeremycochoy you don't need to worry about that detail, you just need to set causal = True and you are good to go

[Screenshot: excerpt from the paper, Appendix B.1 (posted 2020-12-07)]

just to make sure we are looking at the same thing lol

There are no words for how happy I am to learn that; it's awesome (yes, we are looking at the same thing). I can't wait to test it. :)

@jeremycochoy good timing, since @Sleepychord just caught and fixed a big bug in that part of the code loll

Am I understanding it correctly that, because of the pretty neat cumsum, we could even run the EncDec version without a decoder mask and still wouldn't leak the ground truth to the model?

& so in practice we can construct attn masks the same way for inputs & outputs and they are treated the same way by the model? @lucidrains
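One quick way to check the no-leakage claim empirically (a sketch, not from this thread — the simplified cumsum attention below stands in for the library's implementation): perturb only the final position's key/value and verify that the outputs at all earlier positions are bit-for-bit unchanged.

```python
import torch

def causal_linear_attention(q, k, v):
    # Same prefix-sum formulation as above; duplicated so this snippet is self-contained.
    context = torch.einsum('bnd,bne->bnde', k, v).cumsum(dim=1)
    num = torch.einsum('bnd,bnde->bne', q, context)
    den = torch.einsum('bnd,bnd->bn', q, k.cumsum(dim=1)).unsqueeze(-1)
    return num / den.clamp(min=1e-8)

torch.manual_seed(0)
q = torch.rand(1, 6, 4)
k = torch.rand(1, 6, 4)
v = torch.randn(1, 6, 4)
out = causal_linear_attention(q, k, v)

# Perturb only the last position: if nothing leaks from the future,
# outputs at all earlier positions must be unchanged.
k2, v2 = k.clone(), v.clone()
k2[:, -1] += 1.0
v2[:, -1] += 1.0
out2 = causal_linear_attention(q, k2, v2)

leak_free = torch.allclose(out[:, :-1], out2[:, :-1])
print(leak_free)  # → True
```

Note this only demonstrates causality for the decoder's self-attention; cross-attention from decoder to encoder outputs is a separate (bidirectional) path.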