mmperf is a single-core GEMM benchmark. This repository benchmarks matrix multiply (SGEMM) implementations from hand-tuned libraries and code generation stacks, each running on a single thread on one CPU core. The focus is on machine learning workloads, which means FP32 or smaller data types and irregular matrix sizes. The goal is to expose high-performance atomic kernels that can serve as building blocks for efficient higher-level implementations, either spanning multiple cores or distributed across systems, where the atomic kernels are scheduled asynchronously and overlapped with communication (between chips, within a system, or across systems).
- Intel MKL
- OpenBLAS
- RUY
- Accelerate
- BLIS
- MLIR
- Halide
- TVM
- Nod.AI
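To make the measured quantity concrete, here is a minimal sketch of how single-threaded GEMM throughput is commonly computed: an (M x K) by (K x N) product performs 2·M·N·K floating-point operations, and dividing that count by the best observed wall time gives GFLOPS. This is not mmperf's actual harness. The function names (`gemm_flops`, `naive_matmul`, `measure_gflops`) are hypothetical, the triple loop only stands in for a real backend, and Python floats are double precision rather than FP32, so the absolute numbers it prints say nothing about the libraries listed above.

```python
import time


def gemm_flops(m, n, k):
    """FLOPs in an (m x k) @ (k x n) product: one multiply and one add per inner step."""
    return 2 * m * n * k


def naive_matmul(a, b):
    """Reference triple-loop multiply on lists of rows (hypothetical stand-in for a backend)."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for p in range(k):
            a_ip = a[i][p]
            for j in range(n):
                c[i][j] += a_ip * b[p][j]
    return c


def measure_gflops(m, n, k, repeats=3):
    """Time the multiply several times on one thread and report GFLOPS from the best run."""
    a = [[1.0] * k for _ in range(m)]
    b = [[1.0] * n for _ in range(k)]
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        naive_matmul(a, b)
        best = min(best, time.perf_counter() - start)
    return gemm_flops(m, n, k) / best / 1e9


if __name__ == "__main__":
    # An irregular (non-square, non-power-of-two) size, chosen arbitrarily for illustration.
    print(f"{measure_gflops(96, 48, 72):.4f} GFLOPS")
```

Taking the best of several runs is one common way to reduce noise from the operating system; a real harness would also pin the thread to a core and warm up caches before timing.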
Note: the 8GB Mac Mini runs roughly 25% slower than the 16GB version on other tests.
For more details, see mmperf on GitHub.
mmperf aims to be a collaborative effort, though it is primarily developed by nod.ai. If you can get better performance or want to add a new backend, please submit a PR.