Issues
flash-attention imported, not running
#932 opened by Jacck · 6 comments
Incorrect "RuntimeError: FlashAttention only support fp16 and bf16 data type"
#915 opened by jlamypoirier · 7 comments
Hope one day flash-attention can support T4 GPU
#887 opened by hit56 · 1 comment
Backprop through LSE
#889 opened by abf149 · 9 comments
flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c104cuda9SetDeviceEi
#931 opened by rjmehta1993 · 6 comments
Any plans to support tree attention mask?
#924 opened by KexinFeng · 1 comment
How can I use FlashAttention through the C++ API instead of the Python API? Can that be supported?
#926 opened by muoshuosha · 2 comments
File "/home/ppop/Chinese-CLIP/cn_clip/clip/model.py", line 18, in <module> from flash_attn.flash_attention import FlashMHA ModuleNotFoundError: No module named 'flash_attn.flash_attention'
#923 opened by wrtppp · 11 comments
undefined symbol: _ZN2at4_ops5zeros4callEN3c108ArrayRefINS2_6SymIntEEENS2_8optionalINS2_10ScalarTypeEEENS6_INS2_6LayoutEEENS6_INS2_6DeviceEEENS6_IbEE
#919 opened by abhayjain0x · 6 comments
Import Error
#928 opened by Techinix · 3 comments
need cp312 whl!
#935 opened by Slarper · 1 comment
Does flash-attention2 support L40?
#930 opened by askcs517 · 3 comments
Why does `nvidia-cuda-runtime-cu12` not work, and why must the `/usr/local/cuda` version be greater than 11.6?
#916 opened by lvzii · 0 comments
flash-attn (2.3.0) not supporting PEP 517 builds
#927 opened by JPonsa · 3 comments
error in named_apply()
#925 opened by bhatipriyanka · 7 comments
Numerical difference between flash_attn_varlen_kvpacked_func and vanilla x-attention implementation
#886 opened by rafaelvalle · 1 comment
Feature Request: Fused Linear and Cross-Entropy Loss
#922 opened by imoneoi · 1 comment
Why do the fwd and bwd alibi masks implement the same logic with different code?
#913 opened by muoshuosha · 2 comments
Question about V100 support
#891 opened by 1633347510 · 1 comment
How to use flash attention to speed up inference in BERT-like models
#909 opened by pradeepdev-1995 · 1 comment
Why doesn't flash attention give a speedup on an A40 machine?
#921 opened by zhangxihou · 0 comments
Sparse Masking (for Graphs)
#918 opened by thorinf-orca · 5 comments
Adding support for sqrt of softmax scores
#917 opened by snarayan21 · 1 comment
How to specify cuda 1 when finetuning?
#901 opened by karry5921 · 1 comment
How to use the _flash_attn_forward func
#908 opened by lll143653 · 1 comment
Is there a way to use flash attention and selectively finetune only q projection layer, leaving k and v projection layers frozen?
#906 opened by yxchng · 2 comments
Is it possible for the flash-attention forward function to return both the maximum value and the LSE?
#904 opened by Infi-zc · 1 comment
v1 algorithm typo in v2 paper
#897 opened by andportnoy · 2 comments
Interesting observations.
#894 opened by plusgrey · 1 comment
In paged attention mode, must the kcache be allocated in contiguous memory?
#892 opened by NengchaoPan · 0 comments
Any idea about this error?
#890 opened by XinDongol · 1 comment
Installation Error
#888 opened by JPGranizo · 0 comments
Typo in paper?
#884 opened by jhss · 2 comments
[Question] Why isn't fp32 supported?
#882 opened by taewan2002 · 4 comments
[Question] Support for Varlen Seqs with causal=False
#881 opened by Infi-zc · 2 comments
Explanation of batch_size+1 in cu_seqlens for flash_attn_varlen_qkvpacked_func
#880 opened by rafaelvalle · 3 comments
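Several of the issues above (e.g. #880 and #886) concern the varlen API and its cu_seqlens argument. As a minimal sketch, assuming the flash_attn 2.x Python package is installed and a supported GPU is available, the snippet below illustrates the batch_size + 1 convention asked about in #880: cu_seqlens stores cumulative sequence lengths starting at 0, so consecutive entries delimit each sequence in the packed tensor. Shapes and values are illustrative only.

```python
# Minimal sketch (illustrative, not an official example) of the batch_size + 1
# cu_seqlens convention used by flash_attn_varlen_qkvpacked_func.
import torch
from flash_attn import flash_attn_varlen_qkvpacked_func

seqlens = [3, 5, 2]                     # per-sequence lengths; batch_size = 3
total_tokens = sum(seqlens)             # sequences are packed along one dim
nheads, headdim = 8, 64                 # illustrative sizes

# Packed qkv: (total_tokens, 3, nheads, headdim), fp16 or bf16, on GPU.
qkv = torch.randn(total_tokens, 3, nheads, headdim,
                  device="cuda", dtype=torch.float16)

# cu_seqlens has batch_size + 1 int32 entries: [0, 3, 8, 10].
# Sequence i occupies rows cu_seqlens[i]:cu_seqlens[i + 1] of the packed tensor.
cu_seqlens = torch.zeros(len(seqlens) + 1, device="cuda", dtype=torch.int32)
cu_seqlens[1:] = torch.cumsum(
    torch.tensor(seqlens, device="cuda", dtype=torch.int32), dim=0)

out = flash_attn_varlen_qkvpacked_func(qkv, cu_seqlens, max_seqlen=max(seqlens))
print(out.shape)  # (total_tokens, nheads, headdim)
```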