cuda_optimize

This repository holds the "High Performance Computing" project. The goal of the project is to optimize given CUDA code.

It includes the following sub-projects:

  • sgemm: single-precision matrix-matrix multiplication
  • reduce: reduction of an array of floats
  • reduce_nvidia: reduction of an array of floats, taken from the NVIDIA samples
  • wmma: C++ warp matrix operations (WMMA)
  • flashAtten: sample code for Flash Attention
  • float4: sample code for float4 operations
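
When optimizing a kernel such as sgemm, it helps to have a plain CPU version to compare results against. The sketch below is not taken from this repository. The function name sgemm_reference, the row-major layout, and the alpha/beta scaling are assumptions chosen for illustration; it computes C = alpha * A * B + beta * C in single precision.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CPU reference, not part of the repository.
// Computes C = alpha * A * B + beta * C for row-major matrices:
// A is M x K, B is K x N, C is M x N (layout is an assumption).
void sgemm_reference(std::size_t M, std::size_t N, std::size_t K,
                     float alpha,
                     const std::vector<float>& A,
                     const std::vector<float>& B,
                     float beta,
                     std::vector<float>& C) {
    for (std::size_t i = 0; i < M; ++i) {
        for (std::size_t j = 0; j < N; ++j) {
            float acc = 0.0f;  // single-precision accumulator for one output element
            for (std::size_t k = 0; k < K; ++k) {
                acc += A[i * K + k] * B[k * N + j];
            }
            C[i * N + j] = alpha * acc + beta * C[i * N + j];
        }
    }
}
```

An optimized GPU kernel can be checked by running both versions on the same inputs and comparing the outputs element by element within a small tolerance, since the GPU may accumulate in a different order.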