JasonChen9/flash_attention_inference
Benchmarks of the C++ interfaces of FlashAttention and FlashAttention-2 in large language model (LLM) inference scenarios.
Language: C++ · License: MIT
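As context for this kind of comparison, a benchmark harness typically runs a few warm-up iterations, then times repeated forward calls and reports average latency. The sketch below is a minimal, self-contained illustration in plain C++; `flash_attention_forward` and the tensor shapes are hypothetical stand-ins for illustration, not this repository's actual API (the real kernels here would be CUDA launches on device memory).

```cpp
#include <chrono>
#include <cstdio>
#include <vector>

// Hypothetical stand-in for a flash attention forward call; the real
// implementation would launch a CUDA kernel on device tensors.
void flash_attention_forward(const std::vector<float>& q,
                             const std::vector<float>& k,
                             const std::vector<float>& v,
                             std::vector<float>& out) {
    // Placeholder workload so the harness has something to time.
    for (size_t i = 0; i < out.size(); ++i) {
        out[i] = q[i] * k[i] + v[i];
    }
}

int main() {
    const int seq_len = 1024, head_dim = 128;  // assumed benchmark shape
    const int warmup = 10, iters = 100;

    std::vector<float> q(seq_len * head_dim, 1.0f);
    std::vector<float> k(seq_len * head_dim, 1.0f);
    std::vector<float> v(seq_len * head_dim, 1.0f);
    std::vector<float> out(seq_len * head_dim, 0.0f);

    // Warm-up iterations are excluded from timing.
    for (int i = 0; i < warmup; ++i) flash_attention_forward(q, k, v, out);

    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iters; ++i) flash_attention_forward(q, k, v, out);
    auto end = std::chrono::steady_clock::now();

    double ms = std::chrono::duration<double, std::milli>(end - start).count();
    std::printf("avg latency: %.3f ms over %d iterations\n", ms / iters, iters);
    return 0;
}
```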