Threaded-Matrix-Multiplication

Matrix multiplication is implemented in two different ways. In the first, each row of the output matrix is computed by its own thread. In the second, each element of the output matrix is computed by its own thread.
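The two strategies can be sketched as follows. This is a minimal illustration in Python, not the code from this repository: the function names `multiply_by_row` and `multiply_by_element` are hypothetical, and matrices are assumed to be non-empty lists of lists with compatible dimensions.

```python
import threading


def multiply_by_row(a, b):
    """One thread per output row (hypothetical name, illustrative only)."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]

    def compute_row(i):
        # Each thread writes only to row i, so no locking is needed.
        for j in range(p):
            c[i][j] = sum(a[i][k] * b[k][j] for k in range(m))

    threads = [threading.Thread(target=compute_row, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return c


def multiply_by_element(a, b):
    """One thread per output element (hypothetical name, illustrative only)."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]

    def compute_element(i, j):
        # Each thread writes only to cell (i, j), so no locking is needed.
        c[i][j] = sum(a[i][k] * b[k][j] for k in range(m))

    threads = [
        threading.Thread(target=compute_element, args=(i, j))
        for i in range(n)
        for j in range(p)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return c


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[5, 6], [7, 8]]
    print(multiply_by_row(a, b))
    print(multiply_by_element(a, b))
```

For an n x p output matrix, the row-based version starts n threads and the element-based version starts n * p threads, so the second approach pays more thread-creation overhead for the same amount of arithmetic.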