flash-attention-2
There are 9 repositories under the flash-attention-2 topic.
DefTruth/Awesome-LLM-Inference
📖 A curated list of Awesome LLM Inference papers with code, covering TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, continuous batching, FlashAttention, PagedAttention, etc.
DefTruth/CUDA-Learn-Notes
🎉 Modern CUDA learning notes with PyTorch: fp32/tf32, fp16/bf16, fp8/int8, flash_attn, rope, sgemm, sgemv, warp/block reduce, dot, elementwise, softmax, layernorm, rmsnorm.
arihanv/Shush
Shush is an app that deploys a WhisperV3 model with Flash Attention v2 on Modal and makes requests to it via a Next.js app.
alexzhang13/flashattention2-custom-mask
A Triton implementation of FlashAttention-2 that adds support for custom attention masks (see the mask sketch after this list).
Bruce-Lee-LY/flash_attention_inference
Benchmarks the C++ interfaces of FlashAttention and FlashAttention-2 in large language model (LLM) inference scenarios.
BBC-Esq/WhisperS2T-transcriber
Uses the WhisperS2T and CTranslate2 libraries to batch-transcribe multiple files.
graphcore-research/flash-attention-ipu
A Poplar implementation of FlashAttention for the IPU.
gietema/attention
A toy FlashAttention implementation in PyTorch.
lalitdotdev/transcribeX
Transcribe audio in minutes with OpenAI's WhisperV3 and Flash Attention v2 + Transformers, without relying on third-party providers or APIs. Host it yourself or try it out (see the Transformers sketch below).
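
Several of the repositories above (Shush, transcribeX) pair Whisper with Flash Attention v2 through Hugging Face Transformers. Below is a minimal sketch of how that combination is typically enabled, assuming a CUDA GPU, the flash-attn package, and the standard openai/whisper-large-v3 checkpoint; the exact setup in those repositories may differ.

```python
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

model_id = "openai/whisper-large-v3"  # assumed checkpoint; the repos above may pin a different one

# FlashAttention-2 requires a CUDA GPU, half precision, and the flash-attn package installed.
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    attn_implementation="flash_attention_2",
).to("cuda")
processor = AutoProcessor.from_pretrained(model_id)

asr = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    chunk_length_s=30,          # split long audio into 30 s chunks
    torch_dtype=torch.float16,
)

print(asr("audio.mp3")["text"])  # "audio.mp3" is a placeholder path
```

The `attn_implementation="flash_attention_2"` argument is what routes attention through the FlashAttention-2 kernels; the rest is a standard Transformers speech-recognition pipeline.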
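
On the custom-mask entry (alexzhang13/flashattention2-custom-mask): stock FlashAttention-2 kernels do not accept arbitrary per-element attention masks, which is the gap that repository's Triton kernel fills. Its exact API is not reproduced here; purely as a conceptual illustration, this is what passing an arbitrary additive mask looks like with PyTorch's built-in scaled_dot_product_attention, which falls back to a non-flash kernel in this case.

```python
import torch
import torch.nn.functional as F

# Tensors shaped (batch, heads, seq_len, head_dim).
q = torch.randn(1, 8, 128, 64)
k = torch.randn(1, 8, 128, 64)
v = torch.randn(1, 8, 128, 64)

# Arbitrary additive mask: 0.0 where attention is allowed, -inf where it is blocked.
# Example pattern: block every query from attending to the first 16 key positions.
mask = torch.zeros(1, 1, 128, 128)
mask[..., :16] = float("-inf")

# With an arbitrary mask, PyTorch cannot dispatch to its FlashAttention backend;
# a Triton kernel like the repository's keeps FlashAttention-2-style tiling instead.
out = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
print(out.shape)  # torch.Size([1, 8, 128, 64])
```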