
Parallel Matrix Multiplication

Primary language: C

CS61C Project 3: Optimizing Single Precision Matrix Multiplication

The single-threaded code for part 1 is in sgemm-small.c.
It runs at about 10.9 Gflop/s.
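
For readers unfamiliar with the metric, the sketch below shows one common way a Gflop/s figure for a square single-precision multiply can be derived. It is not the code in sgemm-small.c: the function names `sgemm_reference` and `gflops`, the column-major layout, and the convention of counting 2*n^3 floating-point operations per n-by-n multiply are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference kernel (not the project's optimized code):
   computes C += A * B for n-by-n single-precision matrices stored in
   column-major order, i.e. element (i, j) lives at index i + j * n. */
void sgemm_reference(int n, const float *A, const float *B, float *C)
{
    assert(n > 0);
    for (int j = 0; j < n; j++) {
        for (int k = 0; k < n; k++) {
            float b = B[k + (size_t)j * n];
            /* Innermost loop walks down a column, so accesses to A and C
               are contiguous in memory. */
            for (int i = 0; i < n; i++)
                C[i + (size_t)j * n] += A[i + (size_t)k * n] * b;
        }
    }
}

/* Converts a measured run time into Gflop/s, assuming the usual count of
   one multiply and one add per inner-loop iteration (2 * n^3 in total). */
double gflops(int n, double seconds)
{
    assert(seconds > 0.0);
    return 2.0 * n * n * n / seconds / 1e9;
}
```

Under this convention, a 1000-by-1000 multiply that takes one second corresponds to 2 Gflop/s.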

The parallel code for part 2 is in sgemm-openmp.c.
It runs at about 13.1 Gflop/s on a single thread and about 95 Gflop/s with 8 threads.
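
Taken together, the two figures above work out to a speedup of roughly 7.3x on 8 threads. As a rough illustration of how a multiply can be spread across threads with OpenMP, here is a minimal sketch. It is not the code in sgemm-openmp.c: the names `sgemm_openmp_sketch` and `speedup`, the column-major layout, and the choice to parallelize over output columns are assumptions. Build with an OpenMP flag such as `-fopenmp` on GCC; without it the pragma is ignored and the loop runs serially.

```c
#include <stddef.h>

/* Hypothetical parallel kernel (not the project's optimized code):
   computes C += A * B for n-by-n column-major single-precision matrices.
   Each thread is given a disjoint set of columns j of C, so no two
   threads ever write to the same element and no locking is needed. */
void sgemm_openmp_sketch(int n, const float *A, const float *B, float *C)
{
    #pragma omp parallel for
    for (int j = 0; j < n; j++) {
        for (int k = 0; k < n; k++) {
            float b = B[k + (size_t)j * n];
            for (int i = 0; i < n; i++)
                C[i + (size_t)j * n] += A[i + (size_t)k * n] * b;
        }
    }
}

/* Ratio of multi-threaded to single-threaded throughput. */
double speedup(double parallel_gflops, double single_gflops)
{
    return parallel_gflops / single_gflops;
}
```

Splitting on the outermost loop keeps the per-thread work coarse, which tends to keep scheduling overhead small relative to the arithmetic.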

All benchmarks were run on the hive servers in 330 Soda, which have Intel Xeon E5620 processors (2.4 GHz, 12 MB cache).