Issues
Can we build flash-attn with torch 2.3?
#954 opened - 1
H20 compatibility
#953 opened - 1
Allow causal mask alignment configuration
#951 opened - 5
flash decoding algorithm numerical error
#949 opened - 1
Cannot install
#948 opened - 1
Three-dimensional local attention
#947 opened - 1
Relative positions
#946 opened - 8
[bug] build is very slow
#945 opened - 1
Does flash attention support the RTX8000?
#944 opened - 2
build failed under miniconda3
#943 opened - 0
Hello, how can I add Tokens/gpu/s and TFLOPS output to the logs?
#942 opened - 23
Does it support Swin Transformer?
#939 opened - 1
need cp312 whl!
#935 opened - 4
flash-attention imported, not running
#932 opened - 10
flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c104cuda9SetDeviceEi
#931 opened - 2
Does flash-attention2 support L40?
#930 opened - 6
Import Error
#928 opened - 0
error in named_apply()
#925 opened - 7
Any plans to support tree attention mask?
#924 opened - 2
Why can't flash attention accelerate on an A40 machine?
#921 opened - 5
Sparse Masking (for Graphs)
#918 opened - 2
Adding support for sqrt of softmax scores
#917 opened - 3
How to use the _flash_attn_forward func
#908 opened - 1