Sample matrix multiply code to show the effect of blocking and data alignment

The code mm.c accompanies two papers at software.intel.com that discuss memory layout and performance. A simple matrix multiply is reordered and blocked to show the resulting performance improvement. An exercise is included to show the impact on performance when matrices are not aligned on cache-line boundaries.
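
The three ideas named above (loop reordering, blocking, and cache-line alignment) can be sketched in a few lines of C. This is an illustrative sketch only, not the contents of mm.c. The names alloc_matrix, is_aligned, mm_naive, mm_reordered, and mm_blocked are hypothetical. The 64-byte cache line and the 32-element block edge are assumptions that should be tuned for the target processor. Matrices are assumed to be square, row-major arrays of double, and aligned_alloc requires a C11 library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define CACHELINE 64 /* assumption: 64-byte cache lines */
#define BLOCK 32     /* assumption: tile edge in elements; tune per cache size */

static size_t min_sz(size_t x, size_t y) { return x < y ? x : y; }

/* Returns nonzero if p starts on a cache-line boundary. */
static int is_aligned(const void *p)
{
    return ((uintptr_t)p % CACHELINE) == 0;
}

/* Allocate an n x n matrix of doubles that starts on a cache-line boundary.
 * aligned_alloc (C11) expects the size to be a multiple of the alignment,
 * so the byte count is rounded up. Release the result with free(). */
static double *alloc_matrix(size_t n)
{
    assert(n > 0);
    size_t bytes = n * n * sizeof(double);
    bytes = (bytes + CACHELINE - 1) / CACHELINE * CACHELINE;
    return aligned_alloc(CACHELINE, bytes);
}

/* Baseline i-j-k order: the inner loop walks b down a column (stride n). */
static void mm_naive(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
}

/* Reordered i-k-j: the inner loop is unit stride over both b and c. */
static void mm_reordered(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n * n; i++)
        c[i] = 0.0;
    for (size_t i = 0; i < n; i++)
        for (size_t k = 0; k < n; k++) {
            double aik = a[i * n + k];
            for (size_t j = 0; j < n; j++)
                c[i * n + j] += aik * b[k * n + j];
        }
}

/* Blocked: the same i-k-j update applied tile by tile, so the working set
 * of each tile can stay in cache. Edge tiles are clipped to n. */
static void mm_blocked(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n * n; i++)
        c[i] = 0.0;
    for (size_t ii = 0; ii < n; ii += BLOCK)
        for (size_t kk = 0; kk < n; kk += BLOCK)
            for (size_t jj = 0; jj < n; jj += BLOCK) {
                size_t imax = min_sz(ii + BLOCK, n);
                size_t kmax = min_sz(kk + BLOCK, n);
                size_t jmax = min_sz(jj + BLOCK, n);
                for (size_t i = ii; i < imax; i++)
                    for (size_t k = kk; k < kmax; k++) {
                        double aik = a[i * n + k];
                        for (size_t j = jj; j < jmax; j++)
                            c[i * n + j] += aik * b[k * n + j];
                    }
            }
}
```

All three routines compute the same product; only the memory access order differs. One possible way to try the misaligned case is to allocate a slightly larger aligned buffer and pass a pointer offset by a single element, then compare timings against the aligned run.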