Issues
6-hour build on an RTX 5090 (Windows)
#1560 opened by HDANILO - 2
ImportError: /home/data/miniconda3/envs/museTalk/lib/python3.10/site-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c105ErrorC2ENS_14SourceLocationENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
#1518 opened by gg22mm - 2
Noticeable diff between `flash_attn_varlen_qkvpacked_func` and `flash_attn_with_kvcache` outputs
#1569 opened by NonameUntitled - 2
FlashAttention 3 has numerical inconsistency on H100 with varlen kv cache
#1570 opened by preminstrel - 1
`torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
#1530 opened by allentsouhuang - 0
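For context on the deprecation warning above, here is a minimal sketch (not taken from the issue) of moving an autograd Function from the old `torch.cuda.amp` decorators to the `torch.amp` ones with `device_type='cuda'`; the `MulTwo` class is purely illustrative and assumes PyTorch 2.4 or newer:

```python
import torch

class MulTwo(torch.autograd.Function):
    """Toy autograd Function used only to illustrate the decorator change."""

    # Old, deprecated form: @torch.cuda.amp.custom_fwd
    @staticmethod
    @torch.amp.custom_fwd(device_type="cuda")  # new API (PyTorch >= 2.4)
    def forward(ctx, x):
        return x * 2

    # Old, deprecated form: @torch.cuda.amp.custom_bwd
    @staticmethod
    @torch.amp.custom_bwd(device_type="cuda")
    def backward(ctx, grad_out):
        return grad_out * 2
```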
Compilation Error
#1568 opened by hwang136 - 0
Installation takes too long
#1567 opened by diablounbounded - 16
flash_attention doesn't work on Windows + WSL with an RTX 5090
#1563 opened by HDANILO - 0
Window Attention cannot benefit from GQA
#1564 opened by LKJacky - 10
Triton version is faster in both forward and backward when head dim is 64 but slower in both when head dim is 128
#1556 opened by SonicZun - 1
FA3 support for 50-series NVIDIA GPUs (sm_120)
#1549 opened by shahizat - 1
Is it impossible to compile flash-attention 3 on Windows?
#1551 opened by lestersssss - 2
Bad `RPATH` in pre-compiled Linux wheels
#1548 opened by sisp - 0
Windows 11: flash_attn install/training issue
#1550 opened by partcompany - 3
FlashAttention forward support for Turing
#1533 opened by ssiu - 0
Wheel compilation hangs
#1545 opened by KLL535 - 0
Does v's head_dim have to equal q's or k's?
#1541 opened by janelu9 - 2
RTX 5090 support: built from source but still getting `RuntimeError: FlashAttention only supports Ampere GPUs or newer.`
#1540 opened by elkay - 3
Some questions about the TFLOPs of the FA2 backward pass
#1525 opened by foreverpiano - 2
A way to seamlessly replace `flash-attention` with something to allow for low-data inference on other platforms
#1527 opened by marinegor - 2
Prebuilt wheels, dear lord
#1531 opened by kunibald413 - 0
bug in flash_attn_varlen_func when max_seqlen == 1
#1537 opened by zhenwendai - 7
Which branch is the correct FlashAttention 3 branch?
#1523 opened by FurkanGozukara - 3
Inquiry About flash_attn3 b1 usability
#1516 opened by luyvlei - 1
Flash_attn 1.x whl?
#1508 opened by darkon12 - 1
Does it support RTX 8000?
#1513 opened by youde2000 - 1
Flash Attention hangs when running an OpenChat model inside a Docker container
#1532 opened by zeionara - 0
flash_attn_3 cannot be imported
#1536 opened by zdxff - 0
AMD 9070 and XT launch
#1529 opened by bennmann - 2
How to get the attention map with Flash-Attn
#1520 opened by i11box - 1
Is there a function that can be used solely to compute the dot product of q and k?
#1526 opened by liushuhao1130 - 13
Flash Attention 3 compile fails (able to compile 2) : TypeError: _write_ninja_file() got an unexpected keyword argument 'sycl_cflags'
#1524 opened by FurkanGozukara - 7
Does it support the 2080 Ti?
#1514 opened by Lsnxiaoxiong - 1
[QST] Why are Layout<Shape<Int<kNWarps>, _1, _1>> and Tile<Int<16 * kNWarps>, _16, _16>> used in kernel_traits.h for TiledMma?
#1512 opened by March-H - 0
Where is flash_attn_2_cuda?
#1506 opened by VirgoAsumita - 1
Compiled on Windows (CU128, RTX 5000 series) but where is the WHL file? Build ends with "Finished processing dependencies for flash-attn==2.7.4.post1"
#1510 opened by FurkanGozukara - 0