
Multiplication of two matrices using shared memory and CUDA, in order to study execution times and speedup.


CUDASharedMemoryMatrixMul

Matrix structure used:
We used .bin files with raw numbers, where the first and second values are the number of rows and the number of columns, respectively.

To create those matrices as easily as possible, a .cpp file is included. With it we can create two matrices: the first one filled with random numbers and of a given size, and the second one an identity matrix, also of a given size.
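A minimal sketch of what such a generator could look like, assuming the header is stored as two ints followed by the elements as floats (the file names, binary types and random range are assumptions, not necessarily the repository's code):

```cpp
#include <cstdio>
#include <cstdlib>
#include <vector>

// Write a matrix in the .bin format described above:
// rows and cols first, then the elements in row-major order.
static void writeMatrix(const char *path, int rows, int cols, const std::vector<float> &data) {
    FILE *f = fopen(path, "wb");
    int header[2] = { rows, cols };                 // first two values: rows and cols
    fwrite(header, sizeof(int), 2, f);
    fwrite(data.data(), sizeof(float), data.size(), f);
    fclose(f);
}

int main(int argc, char **argv) {
    int n = (argc > 1) ? atoi(argv[1]) : 1000;      // matrix size given on the command line

    std::vector<float> randomMat((size_t)n * n), identity((size_t)n * n, 0.0f);
    for (size_t i = 0; i < randomMat.size(); ++i)
        randomMat[i] = (float)rand() / RAND_MAX;    // first matrix: random numbers
    for (int i = 0; i < n; ++i)
        identity[(size_t)i * n + i] = 1.0f;         // second matrix: identity

    writeMatrix("A.bin", n, n, randomMat);
    writeMatrix("B.bin", n, n, identity);
    return 0;
}
```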

Kernel:
This kernel was used to study the computation times for different matrix sizes.
The multiplication is done as in the following image:
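As a complement to the image, here is a minimal sketch of a tiled shared-memory multiplication kernel (the tile size TILE, the kernel name and the assumption of square matrices are illustrative; the actual kernel in the repository may differ):

```cuda
#define TILE 32  // assumed tile width; must match the block dimensions

// C = A * B for n x n matrices stored in row-major order.
__global__ void matMulShared(const float *A, const float *B, float *C, int n) {
    __shared__ float As[TILE][TILE];   // tile of A cached in shared memory
    __shared__ float Bs[TILE][TILE];   // tile of B cached in shared memory

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;

    // Walk over the tiles along A's row and B's column.
    for (int t = 0; t < (n + TILE - 1) / TILE; ++t) {
        int aCol = t * TILE + threadIdx.x;
        int bRow = t * TILE + threadIdx.y;
        As[threadIdx.y][threadIdx.x] = (row < n && aCol < n) ? A[row * n + aCol] : 0.0f;
        Bs[threadIdx.y][threadIdx.x] = (bRow < n && col < n) ? B[bRow * n + col] : 0.0f;
        __syncthreads();               // wait until the whole tile is loaded

        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();               // wait before overwriting the tiles
    }

    if (row < n && col < n)
        C[row * n + col] = acc;
}
```

A typical launch for this sketch would use dim3 block(TILE, TILE) and a grid of (n + TILE - 1) / TILE blocks in each dimension, so that every thread computes one element of C.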

Multiplying two matrices of size 10000x10000, we obtained the following results:

Sequential multiplication with a simple FOR loop:
-Unable to complete the computation

Multiplication with a static division of the work among 8 threads (a sketch of this approach is shown after the results):
-1595.099 sec

Using CUDA with shared memory:
-18.914 sec

With CUDA we obtained a speedup of 84.334302 (1595.099 / 18.914) compared with the 8-thread static division version.
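For reference, a minimal sketch of the kind of 8-thread static division this is compared against, assuming "static division" means that each thread is statically assigned a contiguous block of rows (std::thread and the names used here are illustrative, not the repository's code):

```cpp
#include <algorithm>
#include <thread>
#include <vector>

constexpr int NUM_THREADS = 8;

// Compute rows [rowBegin, rowEnd) of C = A * B for n x n row-major matrices.
static void mulRows(const float *A, const float *B, float *C, int n, int rowBegin, int rowEnd) {
    for (int i = rowBegin; i < rowEnd; ++i)
        for (int j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (int k = 0; k < n; ++k)
                acc += A[i * n + k] * B[k * n + j];
            C[i * n + j] = acc;
        }
}

// Static division: each thread gets a fixed, contiguous block of rows.
static void staticDivisionMul(const float *A, const float *B, float *C, int n) {
    std::vector<std::thread> workers;
    int rowsPerThread = (n + NUM_THREADS - 1) / NUM_THREADS;
    for (int t = 0; t < NUM_THREADS; ++t) {
        int begin = t * rowsPerThread;
        int end = std::min(n, begin + rowsPerThread);
        workers.emplace_back(mulRows, A, B, C, n, begin, end);
    }
    for (auto &w : workers) w.join();   // wait for all threads to finish
}
```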

The results were obtained with an Intel Xeon CPU and an NVIDIA GTX 560 GPU.