lucidrains/linear-attention-transformer

causal = True

wajihullahbaig opened this issue · 2 comments

Naive question!
causal = True — is this used to create a mask that trims/clips the upper half of the attention matrix above the diagonal?

Thank you!

@wajihullahbaig yup! in linear attention, you do this with a cumulative sum instead of the triangular mask!
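For anyone curious, here is a minimal sketch (not the library's exact implementation) of how causality falls out of a cumulative sum in linear attention: prefix sums over the keys and key-value outer products mean each query position only sees positions up to itself, which is exactly what the triangular mask enforces in softmax attention. The `elu(x) + 1` feature map and the shapes below are just one common choice, used here for illustration.

```python
import torch
import torch.nn.functional as F

def causal_linear_attention(q, k, v, eps=1e-6):
    # q, k: (batch, heads, seq, dim_k), v: (batch, heads, seq, dim_v)
    # Positive feature map (one common choice for linear attention).
    q = F.elu(q) + 1
    k = F.elu(k) + 1

    # Prefix sums over the sequence dimension: position t only accumulates
    # keys/values from positions <= t, so no triangular mask is needed.
    kv = torch.einsum('bhnd,bhne->bhnde', k, v).cumsum(dim=2)  # running sum of k_t v_t^T
    k_sum = k.cumsum(dim=2)                                    # running sum of k_t (normalizer)

    num = torch.einsum('bhnd,bhnde->bhne', q, kv)
    den = torch.einsum('bhnd,bhnd->bhn', q, k_sum).clamp(min=eps)
    return num / den.unsqueeze(-1)
```

Note this naive version materializes the running `k v^T` outer products, so it trades the O(n²) attention matrix for O(n · d_k · d_v) memory; practical implementations compute the cumulative sum in chunks or with a custom kernel.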

Much appreciated, thanks for the reply!

Thanks!