Issues
6-hour build on an RTX 5090 (Windows)
#1560 opened by HDANILO - 2
ImportError: /home/data/miniconda3/envs/museTalk/lib/python3.10/site-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c105ErrorC2ENS_14SourceLocationENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
#1518 opened by gg22mm - 2
Noticeable diff between `flash_attn_varlen_qkvpacked_func` and `flash_attn_with_kvcache` outputs
#1569 opened by NonameUntitled - 2
FlashAttention 3 has numerical inconsistency on H100 with varlen kv cache
#1570 opened by preminstrel - 1
`torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
#1530 opened by allentsouhuang - 0
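For context on the deprecation warning above, here is a minimal sketch (not taken from the issue) of moving an autograd Function from the old `torch.cuda.amp` decorators to the `torch.amp` ones with `device_type='cuda'`; the `MulTwo` class is purely illustrative and assumes PyTorch 2.4 or newer:

```python
import torch

class MulTwo(torch.autograd.Function):
    """Toy autograd Function used only to illustrate the decorator change."""

    # Old, deprecated form: @torch.cuda.amp.custom_fwd
    @staticmethod
    @torch.amp.custom_fwd(device_type="cuda")  # new API (PyTorch >= 2.4)
    def forward(ctx, x):
        return x * 2

    # Old, deprecated form: @torch.cuda.amp.custom_bwd
    @staticmethod
    @torch.amp.custom_bwd(device_type="cuda")
    def backward(ctx, grad_out):
        return grad_out * 2
```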
Compilation Error
#1568 opened by hwang136 - 0
Installation takes too long
#1567 opened by diablounbounded - 16
flash_attention doesn't work on Windows + WSL with an RTX 5090
#1563 opened by HDANILO - 0
Window Attention cannot benefit from GQA
#1564 opened by LKJacky - 10
Triton version is faster in both forward and backward when head dim is 64 but slower in both when head dim is 128
#1556 opened by SonicZun - 1
FA3 support for 50-series NVIDIA GPUs (sm_120)
#1549 opened by shahizat - 1
Is it impossible to compile flash-attention 3 on Windows?
#1551 opened by lestersssss - 2
Bad `RPATH` in pre-compiled Linux wheels
#1548 opened by sisp - 0
Windows 11: flash_attn install/training issue
#1550 opened by partcompany - 3
FlashAttention forward support for Turing
#1533 opened by ssiu - 0
Wheel compilation hangs
#1545 opened by KLL535 - 0
Does v's head_dim have to equal q's or k's?
#1541 opened by janelu9 - 2
RTX 5090 support: built from source but still getting `RuntimeError: FlashAttention only supports Ampere GPUs or newer.`
#1540 opened by elkay - 3
Some questions about the TFLOPs of the FA2 backward pass
#1525 opened by foreverpiano - 2
A way to seamlessly replace `flash-attention` with something to allow for low-data inference on other platforms
#1527 opened by marinegor - 2
Prebuilt wheels, dear lord
#1531 opened by kunibald413 - 0
bug in flash_attn_varlen_func when max_seqlen == 1
#1537 opened by zhenwendai - 7
Which branch is the correct FlashAttention 3 branch?
#1523 opened by FurkanGozukara - 3
Inquiry About flash_attn3 b1 usability
#1516 opened by luyvlei - 1
Flash_attn 1.x whl?
#1508 opened by darkon12 - 1
Does it support RTX 8000?
#1513 opened by youde2000 - 1
Flash Attention hangs when running an OpenChat model inside a Docker container
#1532 opened by zeionara - 0
flash_attn_3 cannot be imported
#1536 opened by zdxff - 0
AMD 9070 and XT launch
#1529 opened by bennmann - 2
How to get the attention map with Flash-Attn
#1520 opened by i11box - 1
Is there a function that can be used solely to compute the dot product of q and k?
#1526 opened by liushuhao1130 - 13
Flash Attention 3 compile fails (able to compile 2) : TypeError: _write_ninja_file() got an unexpected keyword argument 'sycl_cflags'
#1524 opened by FurkanGozukara - 7
Does it support the 2080 Ti?
#1514 opened by Lsnxiaoxiong - 1
[QST] Why are Layout<Shape<Int<kNWarps>, _1, _1>> and Tile<Int<16 * kNWarps>, _16, _16>> used in kernel_traits.h for TiledMma?
#1512 opened by March-H - 0
Where is flash_attn_2_cuda?
#1506 opened by VirgoAsumita - 1
Compiled on Windows (CU128, RTX 5000 series) but where is the WHL file? Build ends with "Finished processing dependencies for flash-attn==2.7.4.post1"
#1510 opened by FurkanGozukara - 0