Dao-AILab/flash-attention

Question on FA-2 worker scheme

DianCh opened this issue · 3 comments

Hi authors, may I ask why the figure for the worker scheme in the FA-2 paper has a lower-triangular pattern? I thought the double loop should cover the entire rectangle?

[Screenshot: worker scheme figure from the FA-2 paper]

Thanks!

The figure is for the case with a causal mask. Without the causal mask, it would cover the entire rectangle.
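
To make the difference concrete, here is a minimal sketch (not the actual kernel code) of which tiles of the attention matrix need to be computed. It assumes equal query and key sequence lengths, and the names `visited_tiles`, `render`, `block_m`, and `block_n` are hypothetical, chosen only for illustration. A tile is skipped under the causal mask when every key in it lies after every query in it.

```python
def visited_tiles(seqlen, block_m, block_n, causal):
    """Return grid[i][j] == True if tile (query block i, key block j) is computed.

    Assumes seqlen_q == seqlen_k == seqlen (an assumption of this sketch).
    """
    n_row = -(-seqlen // block_m)  # ceil division: number of query blocks
    n_col = -(-seqlen // block_n)  # ceil division: number of key blocks
    grid = []
    for i in range(n_row):
        # Largest query index covered by query block i.
        last_q = min((i + 1) * block_m, seqlen) - 1
        row = []
        for j in range(n_col):
            # Smallest key index covered by key block j.
            first_k = j * block_n
            # Under a causal mask, the tile matters only if some key is <= some query.
            row.append((not causal) or first_k <= last_q)
        grid.append(row)
    return grid


def render(grid):
    """Draw computed tiles as '#' and skipped tiles as '.'."""
    return "\n".join("".join("#" if c else "." for c in row) for row in grid)


print(render(visited_tiles(512, 128, 128, causal=True)))
print()
print(render(visited_tiles(512, 128, 128, causal=False)))
```

With `causal=True` the rendering is lower-triangular, matching the figure; with `causal=False` every tile is marked.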

Makes sense, thank you!

Hi, thanks for the great work. If I don't want to use the causal mask, how do I apply a padding mask in FA2?
If there's no padding mask, it would cover the entire rectangle. But if I pad the input, the last few columns are blocked.
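
To illustrate the padded case, here is a minimal pure-Python sketch, assuming right-padded sequences. The helper names (`seqlens_from_mask`, `cu_seqlens`, `key_tiles_needed`) are hypothetical and not the library's API. It derives per-sequence lengths from a padding mask, builds the cumulative offsets that a packed (unpadded) layout would use, and counts how many key-column tiles still contain a real token. As far as I know, the repository's variable-length interface works from cumulative sequence lengths of this kind, but please check the README for the exact function and arguments.

```python
from itertools import accumulate


def seqlens_from_mask(padding_mask):
    """padding_mask[b][t] is 1 for a real token and 0 for padding."""
    return [sum(row) for row in padding_mask]


def cu_seqlens(lengths):
    """Cumulative offsets into the packed token stream, starting at 0."""
    return [0] + list(accumulate(lengths))


def key_tiles_needed(length, block_n):
    """Number of key-column tiles holding at least one real token (ceil division)."""
    return -(-length // block_n)


# Two sequences padded to length 8; the real lengths are 6 and 3.
mask = [
    [1, 1, 1, 1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0, 0, 0, 0],
]
lengths = seqlens_from_mask(mask)
offsets = cu_seqlens(lengths)
tiles = [key_tiles_needed(n, block_n=4) for n in lengths]

print(lengths)  # [6, 3]
print(offsets)  # [0, 6, 9]
print(tiles)    # [2, 1]
```

In this toy example the second sequence needs only one of the two key-column tiles, which is the "last few columns are blocked" pattern described above.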