Optimization analysis of a matrix dot product algorithm for an HPC (High Performance Computing) cluster node. This work contains incremental versions that apply different optimization techniques, such as:
- memory locality via block optimization
- vectorization
- multicore processing
- offloading to a co-processor or GPU
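
As an illustration of the first technique in the list above, here is a minimal sketch of a blocked (tiled) matrix product in C. It is not taken from the project's code: the function name `matmul_blocked`, the tile size `BLOCK`, and the assumption of square, row-major, single-precision matrices are all hypothetical choices made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tile size; a real value would be tuned to the cache sizes
 * of the target node. */
#define BLOCK 32

static size_t min_size(size_t x, size_t y) { return x < y ? x : y; }

/* Computes C += A * B for n x n row-major matrices (assumed layout).
 * The caller is expected to zero C beforehand if a plain product is wanted. */
void matmul_blocked(size_t n, const float *a, const float *b, float *c)
{
    assert(a != NULL && b != NULL && c != NULL);

    /* Outer loops walk over tiles so that the working set of the inner
     * loops stays small enough to be reused from cache. */
    for (size_t ii = 0; ii < n; ii += BLOCK) {
        for (size_t kk = 0; kk < n; kk += BLOCK) {
            for (size_t jj = 0; jj < n; jj += BLOCK) {
                size_t i_end = min_size(ii + BLOCK, n);
                size_t k_end = min_size(kk + BLOCK, n);
                size_t j_end = min_size(jj + BLOCK, n);

                for (size_t i = ii; i < i_end; i++) {
                    for (size_t k = kk; k < k_end; k++) {
                        float aik = a[i * n + k];
                        /* Unit-stride accesses to B and C in the innermost
                         * loop, which is the shape compilers can usually
                         * auto-vectorize. */
                        for (size_t j = jj; j < j_end; j++) {
                            c[i * n + j] += aik * b[k * n + j];
                        }
                    }
                }
            }
        }
    }
}
```

In this sketch each iteration of the outermost `ii` loop writes a disjoint set of rows of `c`, so that loop would be a natural candidate for distributing across cores; whether the project parallelizes at that level is not stated here.
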
The findings and techniques are described in a report, along with the context in which the tests/benchmarks were performed. The project was developed for the Advanced Architectures class of the Master's in Computer Engineering at Universidade do Minho.
For more information you can have a look here.