Bruce-Lee-LY/cuda_hgemm
Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores with the WMMA API and MMA PTX instructions.
Cuda · MIT
Issues
- 1
Question: shared memory bank conflict.
#4 opened by matrix97317 - 3
Change to block of 128 by 256
#3 opened by yupei-ms - 1
#define CHUNK_K 2 // 32 / WMMA_K
#2 opened by lk137095576 - 1
mma_naive produces incorrect results
#1 opened by FdyCN