Tom-Teamo/cuda_optimize

cuda_optimize

This repository holds the "High Performance Computing" project. The goal of the project is to optimize given CUDA code.

It includes the following sub-projects:

sgemm: single-precision matrix-matrix multiplication
reduce: reduction of an array of floats
reduce_nvidia: reduction of an array of floats, taken from the NVIDIA samples
wmma: warp matrix multiply-accumulate (WMMA) operations in CUDA C++
flashAtten: sample code for Flash Attention
float4: a sample of operations on the float4 vector type
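
To give a feel for the kind of kernel the reduce sub-project deals with, here is a minimal sketch of a shared-memory tree reduction. It is not taken from this repository: the names `reduce_partial`, `reduce_sum`, `kBlockSize`, and the cap of 64 blocks are assumptions chosen for illustration, and the final sum of per-block partials is done on the host for simplicity.

```cuda
#include <algorithm>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            std::fprintf(stderr, "CUDA error: %s\n",                      \
                         cudaGetErrorString(err_));                       \
            std::exit(1);                                                 \
        }                                                                 \
    } while (0)

// Hypothetical block size; the tree reduction below needs a power of two.
constexpr int kBlockSize = 256;

// Each block accumulates a strided slice of `in`, reduces it in shared
// memory, and writes one partial sum to out[blockIdx.x].
__global__ void reduce_partial(const float* in, float* out, int n) {
    __shared__ float smem[kBlockSize];
    const int tid = threadIdx.x;

    // Grid-stride loop: every thread sums the elements it is responsible for.
    float acc = 0.0f;
    for (int i = blockIdx.x * blockDim.x + tid; i < n;
         i += gridDim.x * blockDim.x) {
        acc += in[i];
    }
    smem[tid] = acc;
    __syncthreads();

    // Tree reduction: halve the number of active threads each step.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride) {
            smem[tid] += smem[tid + stride];
        }
        __syncthreads();
    }

    if (tid == 0) {
        out[blockIdx.x] = smem[0];
    }
}

// Host helper (hypothetical): copies the input to the device, launches the
// kernel, and adds up the per-block partial sums on the CPU.
float reduce_sum(const std::vector<float>& host_in) {
    const int n = static_cast<int>(host_in.size());
    if (n == 0) return 0.0f;

    // Assumed cap of 64 blocks; the grid-stride loop covers the rest.
    const int blocks = std::min((n + kBlockSize - 1) / kBlockSize, 64);

    float* d_in = nullptr;
    float* d_out = nullptr;
    CUDA_CHECK(cudaMalloc(&d_in, n * sizeof(float)));
    CUDA_CHECK(cudaMalloc(&d_out, blocks * sizeof(float)));
    CUDA_CHECK(cudaMemcpy(d_in, host_in.data(), n * sizeof(float),
                          cudaMemcpyHostToDevice));

    reduce_partial<<<blocks, kBlockSize>>>(d_in, d_out, n);
    CUDA_CHECK(cudaGetLastError());

    // cudaMemcpy on the default stream waits for the kernel to finish.
    std::vector<float> partials(blocks);
    CUDA_CHECK(cudaMemcpy(partials.data(), d_out, blocks * sizeof(float),
                          cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_in));
    CUDA_CHECK(cudaFree(d_out));

    float total = 0.0f;
    for (float p : partials) total += p;
    return total;
}
```

Calling `reduce_sum` on a `std::vector<float>` from a `main` function and compiling with `nvcc` is enough to try it. This version leaves obvious room for the usual optimizations, such as warp shuffles for the last steps or a second kernel pass instead of the host-side sum.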