Bruce-Lee-LY/cuda_hgemm
Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores via the WMMA API and MMA PTX instructions.
Language: CUDA
License: MIT
Stargazers
- abangdd
- alexeigor (Moscow)
- chengtianwu
- chiakicage
- const0what
- cyberl0afing
- DefTruth (@PaddlePaddle)
- FabianSchuetze
- feihugis (@Microsoft)
- foreverrookie
- galiyu
- hbhflw2000
- HJzhang-sjtu (SJTU)
- hkeee21
- Huangxt57 (Sun Yat-sen University)
- hyaihjq
- irasin
- jysh1214
- ken-matsui (University of Washington)
- laevatin (North Carolina State University)
- legendlc
- lianxintao
- lonelybeansprouts
- MT-SW-chen
- qelk123 (XJTU)
- rightchose (Zhejiang University)
- Sandalots (Volcanak)
- sleepwalker2017
- xcwang1999 (Xidian University)
- xlinker1
- Xu-Kai (National University of Singapore)
- YangWang92
- Zhiwei35 (Intel)
- zhl201226
- zlwu92 (China)
- ZQPei (ByteDance)