Matrix Multiplication - Performance Analysis

Optimization analysis of a matrix dot product algorithm for an HPC (High Performance Computing) cluster node. The current work contains incremental versions with different optimization techniques, such as:

memory locality via block optimization
vectorization
multicore processing
offloading to a co-processor or GPU
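
As an illustration of the first technique in the list, here is a minimal sketch of block (tiled) matrix multiplication in C next to a naive reference version. It is not taken from the project's code: the function names `matmul_naive` and `matmul_blocked`, the row-major square `double` layout, and the tile size `BLOCK` are all assumptions chosen for the example.

```c
#include <stddef.h>

/* Assumed tile size; in practice it is tuned to the cache of the target node. */
#define BLOCK 32

/* Naive reference: C = A * B for n x n row-major matrices. */
void matmul_naive(size_t n, const double *A, const double *B, double *C)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += A[i * n + k] * B[k * n + j];  /* B is read with stride n */
            C[i * n + j] = sum;
        }
    }
}

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* Blocked version: the iteration space is split into BLOCK x BLOCK tiles so
 * that the tiles of A, B and C being worked on are reused while still cached. */
void matmul_blocked(size_t n, const double *A, const double *B, double *C)
{
    for (size_t i = 0; i < n * n; i++)
        C[i] = 0.0;

    for (size_t ii = 0; ii < n; ii += BLOCK) {
        for (size_t kk = 0; kk < n; kk += BLOCK) {
            for (size_t jj = 0; jj < n; jj += BLOCK) {
                /* Clamp tile bounds so n need not be a multiple of BLOCK. */
                size_t i_end = min_sz(ii + BLOCK, n);
                size_t k_end = min_sz(kk + BLOCK, n);
                size_t j_end = min_sz(jj + BLOCK, n);

                for (size_t i = ii; i < i_end; i++) {
                    for (size_t k = kk; k < k_end; k++) {
                        double a = A[i * n + k];
                        /* Unit-stride inner loop over j. */
                        for (size_t j = jj; j < j_end; j++)
                            C[i * n + j] += a * B[k * n + j];
                    }
                }
            }
        }
    }
}
```

The i-k-j ordering inside each tile keeps the innermost loop unit-stride over both `B` and `C`, which is also the access pattern a compiler can usually auto-vectorize. The best value of `BLOCK` depends on the cache sizes of the machine and has to be measured.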

The findings and techniques are described in a report, as well as the context in which the tests and benchmarks were performed. The project was developed as part of a Master's in Computer Engineering at Universidade do Minho, for the Advanced Architectures class.

For more information you can have a look here.

jcm300/MatrixMultiplicationAlgorithmsAndImplementations
