Optimizing SGEMM kernel functions on NVIDIA GPUs to close-to-cuBLAS performance.
Primary Language: Cuda
License: GNU General Public License v3.0 (GPL-3.0)