flash-mla

There are 2 repositories under the flash-mla topic.

  • Awesome-LLM-Inference

    xlite-dev/Awesome-LLM-Inference

📚 A curated list of awesome LLM/VLM inference papers with code: Flash-Attention, Paged-Attention, WINT8/4, parallelism, etc. 🎉

    Language: Python · ★ 4.5k
  • ffpa-attn

    xlite-dev/ffpa-attn

    🤖 FFPA: extends FlashAttention-2 with Split-D tiling, giving ~O(1) SRAM complexity for large headdim and a 1.8x~3x speedup vs SDPA EA 🎉 (see the sketch after this list).

    Language: Cuda · ★ 217
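
The Split-D idea mentioned in the ffpa-attn description can be illustrated with a small reference computation: the score matrix S = QK^T is accumulated over head-dimension tiles, so only a d_tile-wide slice of Q and K needs to be resident at once, which is what keeps on-chip (SRAM) usage roughly constant as headdim grows. The PyTorch sketch below is a minimal, hypothetical illustration of that decomposition, not the repo's actual CUDA kernels; the function name, shapes, and the d_tile parameter are assumptions.

```python
import torch

def attn_split_d(q, k, v, d_tile=64):
    """Reference attention that accumulates scores over head-dim tiles
    (a hypothetical sketch of the "Split-D" idea, not ffpa-attn's kernels).

    q, k, v: (seq, headdim) tensors; only a d_tile-wide slice of Q/K/V is
    touched per step, so the working set per step is independent of headdim.
    """
    seq, headdim = q.shape
    scale = headdim ** -0.5

    # Accumulate S = Q @ K^T one head-dim slice at a time.
    s = torch.zeros(seq, seq, dtype=q.dtype, device=q.device)
    for d0 in range(0, headdim, d_tile):
        d1 = min(d0 + d_tile, headdim)
        s += q[:, d0:d1] @ k[:, d0:d1].T

    p = torch.softmax(s * scale, dim=-1)

    # The output can likewise be produced per head-dim slice of V.
    out = torch.empty_like(q)
    for d0 in range(0, headdim, d_tile):
        d1 = min(d0 + d_tile, headdim)
        out[:, d0:d1] = p @ v[:, d0:d1]
    return out

# Quick check against a plain attention computation.
q = k = v = torch.randn(128, 256)  # seq=128, headdim=256
ref = torch.softmax((q @ k.T) * 256 ** -0.5, dim=-1) @ v
assert torch.allclose(attn_split_d(q, k, v), ref, atol=1e-4)
```

In a real kernel this tiling is combined with FlashAttention-style online softmax over the sequence dimension; the sketch keeps the full S matrix only to make the head-dimension split easy to see.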