A simple high performance CUDA GEMM, Block Sparse GEMM and Non-uniform Quantized GEMM implementation.
Primary LanguageCuda