Dao-AILab/flash-attention

Why Doesn't FlashAttention-3 Allow KV and O to Share Memory Space?

ziyuhuang123 opened this issue

I noticed in kernel_traits that in FA3, Q and K each get dedicated shared-memory buffers, while V and O are allowed to reuse the same space. However, isn't Q the only tensor that actually has to stay resident? Since the block keeps moving to the right along the sequence, Q stays fixed for the whole row block, while the K and V tiles are continuously refilled.
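
For reference, here is a simplified sketch of the layout I'm describing (plain arrays instead of the real CuTe layouts; the names and sizes are illustrative, not the actual kernel_traits definitions):

```cpp
// Simplified sketch of the current FA3 shared-memory layout as I read
// kernel_traits (illustrative only):
template <int kBlockM, int kBlockN, int kHeadDim, typename Element>
struct SharedStorageCurrent {
    Element smem_q[kBlockM * kHeadDim];      // Q tile: fixed for the whole row block
    Element smem_k[kBlockN * kHeadDim];      // K tile: refilled each main-loop iteration
    union {
        Element smem_v[kBlockN * kHeadDim];  // V tile: live during the main loop
        Element smem_o[kBlockM * kHeadDim];  // O tile: live only in the epilogue
    };
};
```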

Why not also allow K/V and O to share memory space (using a union)? Is it because O occupies very little space, so it already fits inside V's slot, and widening the union to cover K as well wouldn't reduce the total footprint?
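
A sketch of the alternative I'm asking about (again illustrative only, not a patch): once the last K/V tile has been consumed, both K and V are dead, so O could in principle overlap the entire K+V region rather than just V:

```cpp
// Hypothetical alternative layout (names and sizes are illustrative):
template <int kBlockM, int kBlockN, int kHeadDim, typename Element>
struct SharedStorageProposed {
    Element smem_q[kBlockM * kHeadDim];          // must stay resident across iterations
    union {
        struct {
            Element smem_k[kBlockN * kHeadDim];  // streamed, dead after the main loop
            Element smem_v[kBlockN * kHeadDim];  // streamed, dead after the main loop
        } kv;
        Element smem_o[kBlockM * kHeadDim];      // written only in the epilogue
    };
};
```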