Aaryan0404/CUDA
Accelerating inference and training for transformer-based models by building CUDA kernels that saturate the memory bandwidth and arithmetic throughput of Hopper H100 GPUs.
Cuda