Dao-AILab/flash-attention

Why do we have a third barrier::QueryEmpty arrive?


As in the figure below, I see three arrives on the barrier QueryEmpty. In mma_init we arrive once, which lets the load's sync proceed. But why do we arrive again when we finish the gemm?

image

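It is for the next tile. With persistent kernels, this mma runs in a loop, so the arrive after the gemm tells the load that the query buffer is free again and the next tile's query can be loaded.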
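To make the handshake concrete, here is a minimal host-side sketch in plain C++ threads. It is not the kernel's code: `Signal`, `run_persistent`, `query_full`, and the single-slot `q_buffer` are hypothetical names, and a counting signal stands in for the hardware barrier. It only illustrates why the consumer has to arrive on the "empty" signal once at init and once more after every tile when the work runs in a loop.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Toy counting signal standing in for a hardware barrier (hypothetical helper).
class Signal {
public:
    void arrive() {
        {
            std::lock_guard<std::mutex> lock(m_);
            ++count_;
        }
        cv_.notify_one();
    }
    void wait() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return count_ > 0; });
        --count_;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    int count_ = 0;
};

// Runs a "load" thread and an "mma" thread over num_tiles tiles that share
// one query slot, and returns the tile ids in the order the mma consumed them.
std::vector<int> run_persistent(int num_tiles) {
    Signal query_empty;  // slot is free, load may overwrite it
    Signal query_full;   // slot holds a fresh tile, mma may read it
    int q_buffer = -1;   // assumption: a single-slot query buffer
    std::vector<int> consumed;

    std::thread load([&] {
        for (int tile = 0; tile < num_tiles; ++tile) {
            query_empty.wait();   // blocks until the mma side releases the slot
            q_buffer = tile;      // "load" the query for this tile
            query_full.arrive();
        }
    });

    std::thread mma([&] {
        query_empty.arrive();     // init: the slot starts out free
        for (int tile = 0; tile < num_tiles; ++tile) {
            query_full.wait();
            consumed.push_back(q_buffer);  // stand-in for the gemm
            query_empty.arrive();          // done with the slot: release it for the next tile
        }
    });

    load.join();
    mma.join();
    return consumed;
}
```

Calling `run_persistent(4)` returns the tiles in order. If the `query_empty.arrive()` after the gemm is removed, the load thread blocks forever on the second tile, which is the situation the extra arrive avoids in this model.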