NVIDIA/cutlass

CUDA Templates for Linear Algebra Subroutines

C++ NOASSERTION

Issues

[QST] Why hopper-mixed-gemm's Bandwidth Utilization only have ~9% MBU in H100 SXM5?
#1794 opened 18 days ago by ZZBoom
4
[QST] kInternalError while increasing warp count in older SIMT GEMM kernels.
#1800 opened 17 days ago by Shreya-gaur
0
[QST] SM80_CP_ASYNC_CACHEGLOBAL doesn't allow anything but 128 bit
#1768 opened 17 days ago by seanxwzhang
4
[QST] Split-k in hopper gather scatter gemm
#1798 opened 17 days ago by susavlsh10
0
Why not [RSC] but [C/64, R, S, 64] in kloop of conv implicit gemm?
#1797 opened 18 days ago by liuqi123123
1
[QST] Understanding double buffering in GEMM kernels
#1789 opened 21 days ago by phantaurus
1
[FEA]print_layout can not print 3D case!
#1778 opened 19 days ago by ziyuhuang123
1
[FEA] transpose in epilogue/prologue
#1780 opened 23 days ago by xiaonans
5
[QST] CUDA driver version and runtime version mis-match
#1788 opened 19 days ago by RuokaiYin
2
[FEA] gather/scatter on other dims
#1779 opened 19 days ago by xiaonans
1
[QST] CuTe / Cutlass 1D Convolution
#1758 opened a month ago by jeromeku
1
Which Visual Studio 2022 BuildTools MSVC is the best version for Cuda 11.8 and Cuda 12.4 and so
#1793 opened 20 days ago by FurkanGozukara
0
[BUG] SM90_U32x4_STSM_N for SM90
#1792 opened 21 days ago by jcao-ai
0
[BUG] Simple matrix rotation could not compile
#1783 opened 21 days ago by lucifer1004
1
[QST] Universal convolution supports for sm70/80 using Cute?
#1785 opened 22 days ago by Zxzzzzz
1
[QST] How to dump the IRs after each stage of `cicc`?
#1738 opened a month ago by aws-jiadingg
4
[BUG] Incorrect assertion logic in `check_barrier_in_range` in `barrier.h`
#1781 opened 22 days ago by Algy
1
[QST]Are Tensors Equivalent After Different Layout Transformations?
#1777 opened 23 days ago by ziyuhuang123
0
[BUG] example 06_splitK_gemm should allow post-Volta GPUs
#1775 opened 23 days ago by lucifer1004
0
[DOC] Misleading comment in example 05_batched_gemm
#1773 opened 23 days ago by lucifer1004
0
[QST] The definition of zero stride in CUTE layout algebra
#1772 opened 23 days ago by aws-jiadingg
0
[QST] Gemm results are different with tile_description?
#1769 opened 25 days ago by hxdtest
1
[BUG]The results from different print statements are jumbled together and messy.
#1752 opened a month ago by ziyuhuang123
1
[QST] Static assertion failed when using swizzled layout with gemm
#1766 opened 25 days ago by interestingLSY
5
[QST] Is there an example for implementing gemm problem size like [b, m, k] * [k, n] in the folder `examples`?
#1764 opened a month ago by hxdtest
0
[QST] How to compile and run `examples/35_gemm_softmax` ?
#1728 opened a month ago by hxdtest
1
[FEA] CUDA API [cudaGetDriverEntryPointByVersion]
#1755 opened a month ago by SunNy820828449
2
`02_pytorch_extension_grouped_gemm.ipynb` No kernel configuration found for supported data type and layout combination (<DataType.bf16: 16>
#1757 opened a month ago by hxdtest
2
[QST] Questions about correctness test and layout
#1756 opened a month ago by haeunlee99
0
[FEA] FP8 Convolution
#1750 opened a month ago by MustafaFayez
4
[QST]Why we have three GEMM in cutlass
#1751 opened a month ago by ziyuhuang123
0
[QST] Guidance on Customized Convolution Kernels Using Cutlass3.0 / CuTe
#1749 opened a month ago by phantaurus
2
[QST]What is @ in cute's step?
#1744 opened a month ago by ziyuhuang123
1
[QST]Why copy has barrier inside? And if only one copy will be stuck？
#1748 opened a month ago by ziyuhuang123
1
[QST] Is it possible to create non-contiguous counting tensor?
#1747 opened a month ago by ktaebum
1
[QST] some confusion about layout
#1746 opened a month ago by zhoutianzi666
0
[QST] GEMV implementation with CuTe
#1737 opened a month ago by DD-DuDa
2
[QST] Value mismatches between GEMM kernel-fusion outputs and numpy outputs
#1739 opened a month ago by phantaurus
2
[QST]cute's local_tile and step
#1745 opened a month ago by ziyuhuang123
0
[QST]How to use append?
#1741 opened a month ago by ziyuhuang123
1
[BUG] Release 3.5.0 build failing on Windows using CUDA 12.6, and VS2022 17.11
#1732 opened a month ago by levicki
9
[QST]What's the difference between: pipeline.producer_commit and pipeline.producer_get_barrier
#1729 opened a month ago by ziyuhuang123
2
[QST] how to fix the compiling error: static assertion failed with "Vectors implied by the thread map must be divisible by the access type."
#1740 opened a month ago by alephchang
1
[QST] Why _CUTLASS_TYPE_TO_TORCH_TYPE doesn't support torch.bfloat16?
#1736 opened a month ago by hxdtest
1
[QST] SegFault when performing TiledCopy
#1735 opened a month ago by phantaurus
13
[BUG][typo] is this a typo in media/docs/cute/0x_gemm_tutorial.md?
#1734 opened a month ago by aws-jiadingg
1
[QST] Difference between `make_fragment_like` and `make_tensor_like`
#1731 opened a month ago by seanxwzhang
4
[QST]Why rowMajor for A and B is different?
#1730 opened a month ago by ziyuhuang123
0
[BUG] CMakeLists.txt file is missing a double quote " at Line 237
#1726 opened a month ago by Shreya-gaur
4
[QST]Why sm90 mma has prologue and mainloop?
#1725 opened a month ago by ziyuhuang123
0