/attention

Random experiments

Primary Language: Jupyter Notebook

WIP

I'm trying to learn how transformers work at a level somewhere between running inference through OpenAI and writing CUDA kernels, and how to change standard code to implement my random ideas efficiently.

The README in each folder describes what I'm trying to achieve there.