Question about masking

Question

Closed this issue 5 months ago · 2 comments

Hi, I am very new to the Triton code. I am curious how the causal mask is implemented. Is it implicitly assumed in the Triton code because you use the cumulative-sum form? In particular, I wonder how this line and the line below implement the causal masking.

Answer 1 · 2024-01-21T00:14:45.000Z

For inter-chunk ops, no causal mask is needed: two consecutive chunks do not overlap, so every position in an earlier chunk already precedes every position in the current one.

For intra-chunk ops, I have an explicit mask at https://github.com/berlino/gated_linear_attention/blob/main/kernels/intra_chunk_contribution/fn_only_gk.py#L205C1-L206C1
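
To make the inter-chunk / intra-chunk split concrete, here is a minimal pure-Python sketch. It is not the Triton kernel from the repository: it assumes plain (ungated) causal linear attention, o_t = sum over s <= t of (q_t · k_s) v_s, and the names `causal_reference`, `chunked`, and `mask_intra` are hypothetical, chosen only for illustration. The inter-chunk term reads a running state with no mask, while the intra-chunk term needs the explicit `s <= t` check.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def causal_reference(q, k, v):
    """Quadratic form: o_t = sum_{s <= t} (q_t . k_s) * v_s."""
    dv = len(v[0])
    out = []
    for t in range(len(q)):
        o = [0.0] * dv
        for s in range(t + 1):  # causal: only positions s <= t
            w = dot(q[t], k[s])
            for j in range(dv):
                o[j] += w * v[s][j]
        out.append(o)
    return out


def chunked(q, k, v, chunk_size, mask_intra=True):
    """Chunked form (illustrative, ungated). `mask_intra` is a hypothetical
    switch that exists only to show what happens without the mask."""
    d, dv = len(k[0]), len(v[0])
    # Running state: sum of outer products k_s v_s^T over all EARLIER chunks.
    state = [[0.0] * dv for _ in range(d)]
    out = []
    for start in range(0, len(q), chunk_size):
        end = min(start + chunk_size, len(q))
        for t in range(start, end):
            # Inter-chunk: everything in `state` precedes t, so no mask.
            o = [sum(q[t][i] * state[i][j] for i in range(d)) for j in range(dv)]
            # Intra-chunk: positions in the same chunk may come after t,
            # so an explicit causal mask (s <= t) is required.
            for s in range(start, end):
                if s <= t or not mask_intra:
                    w = dot(q[t], k[s])
                    for j in range(dv):
                        o[j] += w * v[s][j]
            out.append(o)
        # Fold the finished chunk into the state for later chunks.
        for s in range(start, end):
            for i in range(d):
                for j in range(dv):
                    state[i][j] += k[s][i] * v[s][j]
    return out
```

With the mask on, `chunked` should agree with `causal_reference` for any chunk size; with `mask_intra=False`, positions inside a chunk see later positions in that same chunk and the outputs generally differ.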

Answer 2 · 2024-01-21T00:20:28.000Z

Thanks a lot for the extremely prompt reply :)