cutlass
There are 13 repositories under the cutlass topic.
bytedance/flux
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
coderonion/awesome-cuda-triton-hpc
🔥🔥🔥 A collection of some awesome public CUDA, cuBLAS, cuDNN, CUTLASS, TensorRT, TensorRT-LLM, Triton, TVM, MLIR and High Performance Computing (HPC) projects.
DD-DuDa/Cute-Learning
Examples of CUDA implementations using CUTLASS CuTe
leimao/CUTLASS-Examples
CUTLASS and CuTe Examples
Bruce-Lee-LY/flash_attention_inference
Performance of the C++ interface of flash attention and flash attention v2 in large language model (LLM) inference scenarios.
YashasSamaga/ConvolutionBuildingBlocks
GEMM and Winograd based convolutions using CUTLASS
yester31/Cutlass_EX
A study of CUTLASS
Bruce-Lee-LY/cutlass_gemm
Multiple GEMM operators built with CUTLASS to support LLM inference.
sgl-project/whl
Kernel Library Wheel for SGLang
qdLMF/LightGlue-with-FlashAttentionV2-TensorRT
A CUTLASS CuTe implementation of a head-dim-64 FlashAttention v2 TensorRT plugin for LightGlue. Runs on Jetson Orin NX 8GB with TensorRT 8.5.2.
DefTruth/CUDA-Learn-Notes
📚200+ Tensor/CUDA Cores Kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).
Routhleck/blocksparse-pytorch-implement
Block sparse implemented in PyTorch