hgemm

There are 5 repositories under the hgemm topic.

  • DefTruth/CUDA-Learn-Notes

    📚200+ Tensor/CUDA Cores Kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).

    Language: Cuda
  • Bruce-Lee-LY/cuda_hgemm

    Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores with the WMMA API and MMA PTX instructions.

    Language: Cuda
  • Bruce-Lee-LY/cuda_hgemv

    Several optimization methods for half-precision general matrix-vector multiplication (HGEMV) using CUDA cores.

    Language: Cuda
  • DefTruth/hgemm-tensorcores-mma

    ⚡️Write HGEMM from scratch using Tensor Cores with WMMA, MMA and CuTe API to achieve peak⚡️ performance.

    Language: Cuda
  • Bruce-Lee-LY/cuda_back2back_hgemm

    Use Tensor Cores to compute back-to-back HGEMM (half-precision general matrix multiplication) with MMA PTX instructions.

    Language: Cuda