Which version of Flash Attention has been used in this project?
Closed this issue · 2 comments
14H034160212 commented
Hi,
I find this project very interesting. May I ask which version of Flash Attention has been used in this project?
The official flash-attention repository provides both FlashAttention and FlashAttention-2.
https://github.com/Dao-AILab/flash-attention
Kind regards,
Qiming
NormXU commented
@14H034160212 Thank you for your interest. I used PyTorch's scaled dot-product attention (`torch.nn.functional.scaled_dot_product_attention`) to speed up inference instead of Dao's implementations.
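For reference, here is a minimal sketch of how PyTorch's SDPA can be called and pinned to its FlashAttention-style backend (requires PyTorch >= 2.0 and a CUDA GPU). The shapes, dtype, and causal setting are illustrative assumptions, not the exact integration in this repo:

```python
import torch
import torch.nn.functional as F

# Illustrative shapes: (batch, heads, seq_len, head_dim).
# fp16/bf16 on CUDA is required for the flash backend to be eligible.
batch, heads, seq_len, head_dim = 2, 8, 1024, 64
q = torch.randn(batch, heads, seq_len, head_dim, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# By default, PyTorch dispatches to a fused kernel (flash, memory-efficient,
# or plain math) based on dtype, device, and shapes. The context manager below
# restricts dispatch to the flash backend only.
with torch.backends.cuda.sdp_kernel(enable_flash=True,
                                    enable_math=False,
                                    enable_mem_efficient=False):
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)

print(out.shape)  # torch.Size([2, 8, 1024, 64])
```

Without the context manager, `F.scaled_dot_product_attention` still works and simply picks whichever backend is available, so it also runs on CPU or with fp32 inputs.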
14H034160212 commented