sgemm
There are 11 repositories under the sgemm topic.
Liu-xiandong/How_to_optimize_in_GPU
A series on GPU optimization that explains in detail how to optimize CUDA kernels. It covers several basic kernels, including elementwise, reduce, sgemv, and sgemm. The performance of these kernels is at or near the theoretical limit.
wangzyon/NVIDIA_SGEMM_PRACTICE
Step-by-step optimization of CUDA SGEMM
salykova/matmul.c
Fast multi-threaded matrix multiplication in C
mz24cn/gemm_optimization
This repository targets performance optimization of the OpenCL gemm function. It compares the sgemm performance of several libraries (clBLAS, clBLAST, MIOpenGemm, Intel MKL (CPU), and cuBLAS (CUDA)) across different matrix sizes, vendors' hardware, and operating systems. Prebuilt x86_64 binaries for MSVC, MinGW, and Linux (CentOS) are provided, so it works out of the box.
Stefan20162016/maxas-explained
An explanation of the parts of Scott Gray's maxas assembler sgemm that were (for me) missing: https://github.com/NervanaSystems/maxas
yui0/ugemm
GEMM
c3sr/scope
A benchmark framework for POWER and x86_64
fsword73/SGEMM_on_VEGA
An alternative SGEMM implementation on AMD Vega Series
JunLee85/ARM32-SGEMM-LIB
A fast sgemm library for 32-bit ARM, with 16-bit fixed-point support enabled
XiaoSong9905/cuda-v100-kernels
CUDA Kernels on V100
aidevnn/CuPyFirstExample
A first CuPy example that computes GEMM three ways: with cuBLAS, with a handwritten CUDA kernel, and with NumPy's BLAS backend