hgemm
There are 5 repositories under hgemm topic.
DefTruth/CUDA-Learn-Notes
📚200+ Tensor/CUDA Cores Kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).
Bruce-Lee-LY/cuda_hgemm
Several optimization methods of half-precision general matrix multiplication (HGEMM) using tensor core with WMMA API and MMA PTX instruction.
Bruce-Lee-LY/cuda_hgemv
Several optimization methods of half-precision general matrix vector multiplication (HGEMV) using CUDA core.
DefTruth/hgemm-tensorcores-mma
⚡️Write HGEMM from scratch using Tensor Cores with WMMA, MMA and CuTe API, achieve peak⚡️ performance
Bruce-Lee-LY/cuda_back2back_hgemm
Use tensor core to calculate back-to-back HGEMM (half-precision general matrix multiplication) with MMA PTX instruction.