https://github.com/flame/how-to-optimize-gemm/wiki
Copyright by Prof. Robert van de Geijn (rvdg@cs.utexas.edu).
Adapted to a GitHub Markdown wiki by Jianyu Huang (jianyu@cs.utexas.edu).
- The GotoBLAS/BLIS Approach to Optimizing Matrix-Matrix Multiplication - Step-by-Step
  - NOTICE ON ACADEMIC HONESTY
  - References
  - Set Up
  - Step-by-step optimizations
    - Computing four elements of C at a time
    - Computing a 4 x 4 block of C at a time
  - Acknowledgement
- [BLISlab: A Sandbox for Optimizing GEMM](https://github.com/flame/blislab)
- [GEMM: From Pure C to SSE Optimized Micro Kernels](http://apfel.mathematik.uni-ulm.de/~lehn/sghpc/gemm/)
This material was partially sponsored by grants from the National Science Foundation (Awards ACI-1148125/1340293).
Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation (NSF).