sgemm

There are 11 repositories under the sgemm topic.

  • Liu-xiandong/How_to_optimize_in_GPU

    A series on GPU optimization that explains in detail how to optimize CUDA kernels. It covers several basic kernels, including elementwise, reduce, sgemv, and sgemm. The performance of these kernels is at or near the theoretical limit.

    Language: Cuda
  • wangzyon/NVIDIA_SGEMM_PRACTICE

    Step-by-step optimization of CUDA SGEMM

    Language: Cuda
  • salykova/matmul.c

    Fast multi-threaded matrix multiplication in C

    Language: C
  • mz24cn/gemm_optimization

    This repository targets performance optimization of the OpenCL gemm function. It compares the sgemm performance of several BLAS libraries (clBLAS, clBLAST, MIOpenGemm, Intel MKL on CPU, and cuBLAS on CUDA) across different matrix sizes, vendors' hardware, and operating systems. Ready-to-use x86_64 binaries for MSVC, MinGW, and Linux (CentOS) are provided.

    Language: C
  • Stefan20162016/maxas-explained

    An explanation of the (for me) missing parts of the sgemm implementation in Scott Gray's maxas assembler: https://github.com/NervanaSystems/maxas

    Language: CSS
  • yui0/ugemm

    GEMM

    Language: C
  • c3sr/scope

    A benchmark framework for POWER and x86_64

    Language: Mathematica
  • fsword73/SGEMM_on_VEGA

    An alternative SGEMM implementation for AMD Vega series GPUs

    Language: Assembly
  • JunLee85/ARM32-SGEMM-LIB

    A fast sgemm library for 32-bit ARM, with 16-bit fixed-point support

    Language: C
  • XiaoSong9905/cuda-v100-kernels

    CUDA Kernels on V100

    Language: Cuda
  • aidevnn/CuPyFirstExample

    A first CuPy example that computes GEMM three ways: with cuBLAS, with a handwritten CUDA kernel, and with NumPy's BLAS backend

    Language: Cuda