Issues
- 0
import thunderkittens error
#74 opened by klxy0304 - 1
ERR_NVGPUCTRPERM error when trying to profile
#73 opened by alxndrTL - 1
Docs alarm
#68 opened by alexdremov - 1
[Question] May I ask if there is a performance comparison for flash_attention 3?
#61 opened by gzy19990617 - 8
[Feature Request] GEMM benchmarks and FP8 Support
#23 opened by jwfromm - 0
- 3
Load with ldmatrix
#27 opened by liyanc - 1
Could you provide a gemm kernel?
#55 opened by ziyuhuang123 - 2
c++20 does not work?
#45 opened by ziyuhuang123 - 1
Could you provide a valid mirror?
#57 opened by ziyuhuang123 - 0
- 1
h100.cu(97): error: "wait" is ambiguous
#54 opened by ziyuhuang123 - 0
Confusing Comment in rt.cuh
#48 opened by KAOZUOI - 1
[bug report][4090 attn] cudaCheckError(): too many resources requested for launch
#37 opened by kexve - 0
Support for global load/store padding
#44 opened by Hprairie - 1
Template error
#43 opened by Hprairie - 0
Cross-GPU portability
#42 opened by janEbert - 0
Error running make
#40 opened by BurhanUlTayyab - 0
add support for a100 atten
#31 opened by MichoChan - 1
- 3
[Question] Supported compute capabilities?
#21 opened by bayley - 0
Support for TPUs?
#35 opened by jaanli - 3
[bug report] h100 attn_causal kernel
#33 opened by xiayuqing0622 - 4
Add support for head dimension 128
#26 opened by perklet - 2
- 1
Two questions
#25 opened by dongrixinyu - 3
unable to reproduce attn_causal speeds
#22 opened by 152334H