Pinned Repositories
EsperBERTo
A test of the Attention Is Off By One hypothesis
Flash-Attention-Softmax-N
CUDA and Triton implementations of Flash Attention with SoftmaxN.
llama2.c-tinystories
Inference Llama 2 in one file of pure C
MosaicBERT-Softmax1
nanoGPT_softmax1
An experiment comparing baseline nanoGPT with a softmax1 variant to see how it affects perplexity
nanoGPT_softmax1_reddit
The simplest, fastest repository for training/finetuning medium-sized GPTs.
quietGPT
A scaled down empirical study of "Attention is Off by One" on nanoGPT
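These repositories all test the "Attention Is Off by One" proposal, which replaces softmax in attention with a variant that adds 1 (or more generally n) to the denominator, so a head can assign near-zero weight to every position. A minimal NumPy sketch of that idea (the function name `softmax_n` is mine, not taken from these repos):

```python
import numpy as np

def softmax_n(x, n=1.0):
    """Softmax with an extra `n` in the denominator ("softmax1" when n=1).

    softmax_n(x)_i = exp(x_i) / (n + sum_j exp(x_j))

    Unlike standard softmax, the outputs can sum to less than 1,
    letting an attention head emit (near) zero everywhere.
    """
    m = np.max(x)                        # shift by the max for numerical stability
    e = np.exp(x - m)
    return e / (n * np.exp(-m) + e.sum())  # the n term is rescaled by the same shift

scores = np.array([-10.0, -10.0, -10.0])
print(softmax_n(scores, n=0).sum())  # n=0 recovers standard softmax: sums to 1
print(softmax_n(scores, n=1).sum())  # softmax1: sums to nearly 0 here
```

With `n=0` this reduces exactly to ordinary softmax; with `n=1` and uniformly negative scores, the attention weights all collapse toward zero instead of being forced to sum to one.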