Issues
flash-attention imported, not running
#932 opened by Jacck · 6 comments
Incorrect "RuntimeError: FlashAttention only support fp16 and bf16 data type"
#915 opened by jlamypoirier · 7 comments
Hope one day flash-attention can support T4 GPU
#887 opened by hit56 · 1 comment
Backprop through LSE
#889 opened by abf149 · 9 comments
flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c104cuda9SetDeviceEi
#931 opened by rjmehta1993 · 6 comments
Any plans to support tree attention mask?
#924 opened by KexinFeng · 1 comment
How can I use FlashAttention through the C++ API instead of the Python API? Can that be supported?
#926 opened by muoshuosha · 2 comments
File "/home/ppop/Chinese-CLIP/cn_clip/clip/model.py", line 18, in <module> from flash_attn.flash_attention import FlashMHA ModuleNotFoundError: No module named 'flash_attn.flash_attention'
#923 opened by wrtppp · 11 comments
undefined symbol: _ZN2at4_ops5zeros4callEN3c108ArrayRefINS2_6SymIntEEENS2_8optionalINS2_10ScalarTypeEEENS6_INS2_6LayoutEEENS6_INS2_6DeviceEEENS6_IbEE
#919 opened by abhayjain0x · 6 comments
Import Error
#928 opened by Techinix · 3 comments
need cp312 whl!
#935 opened by Slarper · 1 comment
Does flash-attention2 support L40?
#930 opened by askcs517 · 3 comments
Why does `nvidia-cuda-runtime-cu12` not work, and why must the `/usr/local/cuda` version be greater than 11.6?
#916 opened by lvzii · 0 comments
flash-attn (2.3.0) not supporting PEP 517 builds
#927 opened by JPonsa · 3 comments
error in named_apply()
#925 opened by bhatipriyanka · 7 comments
Numerical difference between flash_attn_varlen_kvpacked_func and vanilla x-attention implementation
#886 opened by rafaelvalle · 1 comment
Feature Request: Fused Linear and Cross-Entropy Loss
#922 opened by imoneoi · 1 comment
Why do the fwd and bwd alibi masks implement the same logic with different code?
#913 opened by muoshuosha · 2 comments
Question about V100 support
#891 opened by 1633347510 · 1 comment
How to use flash attention to speed up inference in BERT-like models
#909 opened by pradeepdev-1995 · 1 comment
Why doesn't flash attention give a speedup on an A40 machine?
#921 opened by zhangxihou · 0 comments
Sparse Masking (for Graphs)
#918 opened by thorinf-orca · 5 comments
Adding support for sqrt of softmax scores
#917 opened by snarayan21 · 1 comment
How to specify cuda 1 when finetuning?
#901 opened by karry5921 · 1 comment
How to use the _flash_attn_forward func
#908 opened by lll143653 · 1 comment
Is there a way to use flash attention and selectively finetune only q projection layer, leaving k and v projection layers frozen?
#906 opened by yxchng · 2 comments
Is it possible for the flash-attention forward function to return both the maximum value and the LSE?
#904 opened by Infi-zc · 1 comment
v1 algorithm typo in v2 paper
#897 opened by andportnoy · 2 comments
Interesting observations.
#894 opened by plusgrey · 1 comment
In paged attention mode, must the kcache be allocated in contiguous memory?
#892 opened by NengchaoPan · 0 comments
Any idea about this error?
#890 opened by XinDongol · 1 comment
Installation Error
#888 opened by JPGranizo · 0 comments
Typo in paper?
#884 opened by jhss · 2 comments
[Question] Why isn't fp32 supported?
#882 opened by taewan2002 · 4 comments
[Question] Support for Varlen Seqs with causal=False
#881 opened by Infi-zc · 2 comments
Explanation of batch_size+1 in cu_seqlens for flash_attn_varlen_qkvpacked_func
#880 opened by rafaelvalle · 3 comments
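Several of the issues above (e.g. #880 and #886) concern the varlen API and its cu_seqlens argument. As a minimal sketch, assuming the flash_attn 2.x Python package is installed and a supported GPU is available, the snippet below illustrates the batch_size + 1 convention asked about in #880: cu_seqlens stores cumulative sequence lengths starting at 0, so consecutive entries delimit each sequence in the packed tensor. Shapes and values are illustrative only.

```python
# Minimal sketch (illustrative, not an official example) of the batch_size + 1
# cu_seqlens convention used by flash_attn_varlen_qkvpacked_func.
import torch
from flash_attn import flash_attn_varlen_qkvpacked_func

seqlens = [3, 5, 2]                     # per-sequence lengths; batch_size = 3
total_tokens = sum(seqlens)             # sequences are packed along one dim
nheads, headdim = 8, 64                 # illustrative sizes

# Packed qkv: (total_tokens, 3, nheads, headdim), fp16 or bf16, on GPU.
qkv = torch.randn(total_tokens, 3, nheads, headdim,
                  device="cuda", dtype=torch.float16)

# cu_seqlens has batch_size + 1 int32 entries: [0, 3, 8, 10].
# Sequence i occupies rows cu_seqlens[i]:cu_seqlens[i + 1] of the packed tensor.
cu_seqlens = torch.zeros(len(seqlens) + 1, device="cuda", dtype=torch.int32)
cu_seqlens[1:] = torch.cumsum(
    torch.tensor(seqlens, device="cuda", dtype=torch.int32), dim=0)

out = flash_attn_varlen_qkvpacked_func(qkv, cu_seqlens, max_seqlen=max(seqlens))
print(out.shape)  # (total_tokens, nheads, headdim)
```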