opalkale/matrix-multiply-optimization
Used cache blocking, parallelizing, loop unrolling, register blocking, loop ordering, and SSE instructions to optimize the multiplication of large matrices to 55 gFLOPS
C
Used cache blocking, parallelizing, loop unrolling, register blocking, loop ordering, and SSE instructions to optimize the multiplication of large matrices to 55 gFLOPS
C