/CUDA

Accelerating inference and training for transformer-based models by building CUDA kernels that saturate the memory bandwidth and arithmetic throughput of Hopper H100 GPUs.

Primary Language: Cuda