This repository contains the project for "High Performance Computing". The goal of the project is to optimize given CUDA code.
It includes the following sub-projects:
- sgemm: Single-precision matrix-matrix multiplication (SGEMM)
- reduce: Reduction of an array of floats
- reduce_nvidia: Reduction of an array of floats, taken from the NVIDIA samples
- wmma: C++ warp matrix operations (WMMA)
- flashAtten: Sample code for Flash Attention
- float4: A sample for float4 operations
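
When optimizing kernels such as sgemm and reduce, it can help to compare GPU results against a simple CPU reference. The following is a minimal sketch in plain C++ (it should also compile as host code in a .cu file), not code taken from this repository. The function names `sgemm_ref` and `reduce_ref`, the row-major layout, the `alpha`/`beta` parameters, and the choice of a sum as the reduction operator are all assumptions made for illustration.

```cpp
#include <vector>

// Hypothetical CPU reference for SGEMM: C = alpha * A * B + beta * C.
// Assumes row-major storage: A is M x K, B is K x N, C is M x N.
void sgemm_ref(int M, int N, int K, float alpha,
               const std::vector<float>& A,
               const std::vector<float>& B,
               float beta, std::vector<float>& C) {
    for (int i = 0; i < M; ++i) {
        for (int j = 0; j < N; ++j) {
            float acc = 0.0f;
            // Dot product of row i of A with column j of B.
            for (int k = 0; k < K; ++k) {
                acc += A[i * K + k] * B[k * N + j];
            }
            C[i * N + j] = alpha * acc + beta * C[i * N + j];
        }
    }
}

// Hypothetical CPU reference for a sum reduction (assumed operator).
// Accumulates in double to limit rounding error, then narrows to float.
float reduce_ref(const std::vector<float>& x) {
    double sum = 0.0;
    for (float v : x) {
        sum += v;
    }
    return static_cast<float>(sum);
}
```

A GPU result would typically be compared element-wise against these references with a small tolerance, since an optimized kernel may sum in a different order and round differently.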