flash_attention_inference

Performance of the C++ interface of flash attention and flash attention v2 in large language model (LLM) inference scenarios.
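As a rough illustration of how such a performance comparison is typically measured, the sketch below times repeated GPU work with CUDA events, including warmup iterations as is usual in inference benchmarks. It is not the repository's actual benchmark code: run_attention_stand_in is a hypothetical placeholder (here just an async memset) standing in for a call into a flash attention C++ interface.

// Minimal CUDA-event timing sketch; the attention call is a hypothetical stand-in,
// not the repository's real API. Link against the actual flash attention C++
// interface and replace the stand-in to measure it instead.
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    const size_t bytes = 1 << 24;            // stand-in workload size
    void* buf = nullptr;
    cudaMalloc(&buf, bytes);

    // Hypothetical stand-in for a flash attention forward call.
    auto run_attention_stand_in = [&]() {
        cudaMemsetAsync(buf, 0, bytes);      // placeholder GPU work
    };

    // Warmup iterations so clocks and caches reach a steady state.
    for (int i = 0; i < 10; ++i) run_attention_stand_in();
    cudaDeviceSynchronize();

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    const int iters = 100;
    cudaEventRecord(start);
    for (int i = 0; i < iters; ++i) run_attention_stand_in();
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("average latency: %.3f ms\n", ms / iters);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(buf);
    return 0;
}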

Primary language: C++ · License: MIT
