Issues
FlashAttention versions
#878 opened - 1
abi true or false
#877 opened - 1
Error in install flash_attn
#873 opened - 1
Will FlashAttention embed Self-Extend?
#868 opened - 5
import flash attention error
#867 opened - 4
Bug in loading of pretrained BERT weights
#864 opened - 0
Ask: Support FP8 KVCache in inference
#863 opened - 0
Broken source distribution
#861 opened - 6
Does rotary_kernel support packed qkv?
#855 opened - 4
ImportError: undefined symbol
#854 opened - 7
Build failure
#843 opened - 1
Error with triton
#838 opened - 26
ESM NAN Values
#834 opened - 7
Q: Support for Nvidia L4
#830 opened - 0
c
#829 opened - 5
Support for Dynamic SplitFuse
#826 opened - 4
[Bug] Compatibility issue with torch 2.2.0
#821 opened - 1
Pip install --no-build-isolation error on Win10 [C++ type casting error during CUDA extension build]
#820 opened - 1
I would like to understand why rotary_dim has to be divisible by 16 in flash_attn_with_kvcache.
#817 opened