UCSD FA22 CSE260 Course Programming Assignments
Implemented GEMM (General Matrix Multiplication) using a BLAS-style blocking algorithm, accelerated with SIMD on a 64-bit ARM architecture. Performance can be compared against a BLAS implementation.
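
As a rough illustration of the blocking idea, here is a minimal sketch in portable C, not the assignment's actual code. It assumes square row-major matrices and a hypothetical tile size `BLOCK`; the names `gemm_naive`, `gemm_blocked`, and `gemm_max_abs_diff` are made up for this example. No SIMD intrinsics are used here; in a SIMD version, the innermost `j` loop is the part a vectorized micro-kernel would typically replace.

```c
#include <assert.h>
#include <stdlib.h>

#define BLOCK 32 /* hypothetical tile size; in practice tuned to the cache */

static int min_int(int a, int b) { return a < b ? a : b; }

/* Reference triple loop: C += A * B, all n x n, row-major. */
void gemm_naive(int n, const double *A, const double *B, double *C) {
    for (int i = 0; i < n; i++)
        for (int k = 0; k < n; k++)
            for (int j = 0; j < n; j++)
                C[i * n + j] += A[i * n + k] * B[k * n + j];
}

/* Blocked version: same arithmetic, but performed tile by tile so that
   the working set of each tile product can stay in cache. */
void gemm_blocked(int n, const double *A, const double *B, double *C) {
    for (int ii = 0; ii < n; ii += BLOCK) {
        int i_end = min_int(ii + BLOCK, n);
        for (int kk = 0; kk < n; kk += BLOCK) {
            int k_end = min_int(kk + BLOCK, n);
            for (int jj = 0; jj < n; jj += BLOCK) {
                int j_end = min_int(jj + BLOCK, n);
                for (int i = ii; i < i_end; i++) {
                    for (int k = kk; k < k_end; k++) {
                        double a = A[i * n + k];
                        /* Contiguous inner loop: candidate for SIMD. */
                        for (int j = jj; j < j_end; j++)
                            C[i * n + j] += a * B[k * n + j];
                    }
                }
            }
        }
    }
}

/* Illustrative helper: runs both versions on deterministic inputs and
   returns the largest absolute difference between the two results. */
double gemm_max_abs_diff(int n) {
    assert(n > 0);
    size_t count = (size_t)n * (size_t)n;
    double *A = malloc(count * sizeof *A);
    double *B = malloc(count * sizeof *B);
    double *C1 = calloc(count, sizeof *C1);
    double *C2 = calloc(count, sizeof *C2);
    assert(A && B && C1 && C2);

    for (size_t i = 0; i < count; i++) {
        A[i] = (double)(i % 7) - 3.0;
        B[i] = (double)(i % 5) * 0.5;
    }

    gemm_naive(n, A, B, C1);
    gemm_blocked(n, A, B, C2);

    double max_diff = 0.0;
    for (size_t i = 0; i < count; i++) {
        double d = C1[i] - C2[i];
        if (d < 0.0) d = -d;
        if (d > max_diff) max_diff = d;
    }

    free(A); free(B); free(C1); free(C2);
    return max_diff;
}
```

Calling `gemm_max_abs_diff` with a size that is not a multiple of `BLOCK` (for example 37) exercises the partial tiles at the matrix edges, which is where blocked kernels most often go wrong.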
Implemented GEMM on a GPU using a cuBLAS-style blocking algorithm.
Implemented a simulation of the Aliev-Panfilov model on a supercomputer, with communication handled by MPI.
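
To give a sense of the computation involved, below is a serial sketch in C of one explicit time step of the Aliev-Panfilov equations on a square grid with a one-cell ghost border. It is an illustration under assumptions, not the assignment's code: the parameter values are ones commonly used with this model, and the names `ap_fill_ghosts`, `ap_step`, and `ap_uniform_one_step` are hypothetical. The sketch contains no MPI calls; in a distributed version, ghost cells on the edges shared between neighboring subdomains would be filled by exchanging messages rather than by mirroring.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed model parameters (typical values, not taken from the source). */
static const double AP_A = 0.1, AP_B = 0.1, AP_K = 8.0;
static const double AP_M1 = 0.07, AP_M2 = 0.3, AP_EPS = 0.01;

/* Mirror interior values into the ghost border (no-flux boundary).
   The grid has n x n interior cells and row stride n + 2. */
void ap_fill_ghosts(int n, double *E_prev) {
    assert(n >= 2);
    int s = n + 2;
    for (int i = 1; i <= n; i++) {
        E_prev[i * s + 0]       = E_prev[i * s + 2];       /* left   */
        E_prev[i * s + n + 1]   = E_prev[i * s + n - 1];   /* right  */
        E_prev[0 * s + i]       = E_prev[2 * s + i];       /* top    */
        E_prev[(n + 1) * s + i] = E_prev[(n - 1) * s + i]; /* bottom */
    }
}

/* One explicit Euler step. alpha is the diffusion coefficient times
   dt / dx^2. E receives the new excitation; R is updated in place. */
void ap_step(int n, double alpha, double dt,
             double *E, const double *E_prev, double *R) {
    int s = n + 2;
    for (int i = 1; i <= n; i++) {
        for (int j = 1; j <= n; j++) {
            int c = i * s + j;
            double e = E_prev[c];
            /* Five-point Laplacian stencil (the part needing ghost cells). */
            double lap = E_prev[c - 1] + E_prev[c + 1]
                       + E_prev[c - s] + E_prev[c + s] - 4.0 * e;
            /* Diffusion plus the local reaction term for E. */
            E[c] = e + alpha * lap
                 - dt * (AP_K * e * (e - AP_A) * (e - 1.0) + e * R[c]);
            /* Recovery variable, using the old excitation value. */
            R[c] += dt * (AP_EPS + AP_M1 * R[c] / (e + AP_M2))
                       * (-R[c] - AP_K * e * (e - AP_B - 1.0));
        }
    }
}

/* Illustrative helper: start from a uniform excitation e0 with R = 0,
   take one step (alpha = 0.1 and dt = 0.01 are arbitrary choices), and
   return the new excitation at a cell near the grid center. */
double ap_uniform_one_step(int n, double e0) {
    assert(n >= 2);
    int s = n + 2;
    size_t count = (size_t)s * (size_t)s;
    double *E = calloc(count, sizeof *E);
    double *E_prev = calloc(count, sizeof *E_prev);
    double *R = calloc(count, sizeof *R);
    assert(E && E_prev && R);

    for (int i = 1; i <= n; i++)
        for (int j = 1; j <= n; j++)
            E_prev[i * s + j] = e0;

    ap_fill_ghosts(n, E_prev);
    ap_step(n, 0.1, 0.01, E, E_prev, R);

    double center = E[(n / 2 + 1) * s + (n / 2 + 1)];
    free(E); free(E_prev); free(R);
    return center;
}
```

With a uniform field the Laplacian vanishes, so only the reaction term acts: excitation values of 0 and 1 are left unchanged by a single step when R is 0, while a value between the threshold `AP_A` and 1 increases.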