Flash Attention in ~100 lines of CUDA (forward pass only)
Primary language: CUDA
License: Apache-2.0