opalkale/matrix-multiply-optimization

Used cache blocking, parallelization, loop unrolling, register blocking, loop ordering, and SSE instructions to optimize the multiplication of large matrices, reaching 55 GFLOPS.
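As a rough illustration of two of these techniques, here is a minimal sketch of cache blocking combined with an i-k-j loop order. It is not the code from this repository: the function name `matmul_blocked`, the tile size `BLOCK`, and the square row-major float layout are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tile size; a real kernel would tune this to the cache. */
enum { BLOCK = 32 };

static size_t min_size(size_t a, size_t b) { return a < b ? a : b; }

/* Computes C += A * B for n x n row-major float matrices (assumed layout).
 * The three outer loops walk BLOCK x BLOCK tiles so that the working set
 * of the inner loops stays small; the inner loops use i-k-j order so that
 * B and C are both read with unit stride. */
void matmul_blocked(size_t n, const float *A, const float *B, float *C)
{
    assert(A != NULL && B != NULL && C != NULL);

    for (size_t ii = 0; ii < n; ii += BLOCK) {
        for (size_t kk = 0; kk < n; kk += BLOCK) {
            for (size_t jj = 0; jj < n; jj += BLOCK) {
                /* Clamp tile edges so n need not be a multiple of BLOCK. */
                size_t i_end = min_size(ii + BLOCK, n);
                size_t k_end = min_size(kk + BLOCK, n);
                size_t j_end = min_size(jj + BLOCK, n);

                for (size_t i = ii; i < i_end; i++) {
                    for (size_t k = kk; k < k_end; k++) {
                        /* Hoist A[i][k] so it can stay in a register. */
                        float a = A[i * n + k];
                        for (size_t j = jj; j < j_end; j++) {
                            C[i * n + j] += a * B[k * n + j];
                        }
                    }
                }
            }
        }
    }
}
```

The caller is expected to zero `C` first if a plain product is wanted. The other techniques listed above (parallelization, loop unrolling, register blocking, SSE) would typically be layered onto the innermost loops of a kernel like this one; they are left out here to keep the sketch portable.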

This repository is no longer active.