Issues
A small typo and fix
#1390 opened by liguohao96 - 0
4 Failing `test_flash_attn_output_fp8` tests on H100
#1404 opened by BioGeek - 0
Does bar.sync Emit Semaphores Alongside bar.arrive?
#1403 opened by ziyuhuang123 - 0
Understanding sync and arrive in FA3 Store Function
#1401 opened by ziyuhuang123 - 1
Execution order between GEMM0 of the next iteration and GEMM1 of the current iteration in the pingpong scheduling pipeline for overlapping GEMMs and softmax between warpgroups
#1398 opened by tengdecheng - 0
When handling padding in seq_k, clear the g2s K tensor rather than keeping the default SMEM values
#1395 opened by NVIDIA-JerryChen - 2
FA-3 installation errors
#1387 opened by asahni04 - 2
Why does NamedBarrier in epilogue use NumMmaThreads(256) + NumThreadsPerWarp(32)?
#1389 opened by ziyuhuang123 - 0
Windows 11 Installation Error
#1388 opened by 404-xianjin - 2
Accuracy Drop with Flash-Attention Reimplementation in Encoder-Decoder Architecture (ViT)
#1376 opened by ImaGonEs - 1
seq_lens variable used in the attention kernel
#1378 opened by chakpongchung - 0
How to get actual col idx
#1385 opened by wenkechen - 6
Possible to install with just `torch` installed?
#1379 opened by davidmezzetti - 0
[ROCm] benchmark_flash_attention.py failing with Memory Access Fault
#1381 opened by nikhil-tensorwave - 6
Flash Attention 3 does not use dropout_p?
#1377 opened by nighting0le01 - 2
FA3 for CUDA 12.3, but torch only releases a CUDA 12.4 version
#1375 opened by wplf - 2
Headdim==96 in FA3
#1374 opened by wplf - 1
Why do we have a third barrier::QueryEmpty arrive?
#1372 opened by ziyuhuang123 - 2
Can wgmma.async and barrier.arrive Ensure GEMM Completion Before Moving Forward?
#1373 opened by ziyuhuang123 - 2
Question About Initial sync Behavior Without Prior arrive in Warpgroup Scheduling
#1371 opened by ziyuhuang123 - 2
Question about warp_scheduler_barrier_arrive in FA3 and cutlass::arch::NamedBarrier::arrive Usage
#1370 opened by ziyuhuang123 - 0
The byzantine copy of Tensor O
#1368 opened by phantaurus - 0
Add support for qk dim different from v dim in PR #1166
#1358 opened by YTianZHU - 4
Question about the equation in the Flash Attention 2 paper
#1349 opened by jeffrey-sunh1 - 0
Unable to cast Python instance of type <class 'torch._subclasses.fake_tensor.FakeTensor'> to C++ type
#1351 opened by zwhe99 - 2
How to specify the ROCm architecture during pip install
#1356 opened by deeptimhe - 0
Does flash-attn support FP8 inference on L40-48G?
#1355 opened by LinJianping - 0
Flashdecoding with appendKV might be incorrect
#1354 opened by DD-DuDa - 1
FP8 test failure on the latest 'decode' branch
#1352 opened by cscyuge - 5
RuntimeError: Error compiling objects for extension
#1346 opened by beyondguo - 0
[Q] Why is flash attention MFU over 100% on A800?
#1345 opened by wonderisland - 1
Breaking change for head size not divisible by 8
#1347 opened by felix-red-panda - 1
Issue with installing flash attention: `import flash_attn_2_cuda as flash_attn_cuda`
#1348 opened by hahmad2008 - 2
FA3 Failed to initialize the TMA descriptor
#1343 opened by li-yi-dong - 0
Assistance with implementing Flash Attention 2 for Turing
#1342 opened by samuelzxu