The code is based on https://github.com/karpathy/nanoGPT/tree/master
```sh
pip install torch numpy transformers datasets tiktoken wandb tqdm
cd mxfp4_kernel
pip install .
```
Dependencies:

- `pytorch` <3
- `numpy` <3
- `transformers` for huggingface transformers <3 (to load GPT-2 checkpoints)
- `datasets` for huggingface datasets <3 (if you want to download + preprocess OpenWebText)
- `tiktoken` for OpenAI's fast BPE code <3
- `wandb` for optional logging <3
- `tqdm` for progress bars <3
```sh
python data/openwebtext/prepare.py
```

This downloads and tokenizes the OpenWebText dataset, creating a `train.bin` and `val.bin` in that data directory.
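For reference, the `.bin` files are flat arrays of uint16 GPT-2 token IDs. A minimal sketch of how a training batch is sampled from them, in the spirit of nanoGPT's `get_batch` (the `block_size` and `batch_size` values here are illustrative):

```python
import numpy as np
import torch

block_size, batch_size = 1024, 12

# train.bin is a flat array of uint16 token IDs; memmap avoids loading it all.
data = np.memmap('data/openwebtext/train.bin', dtype=np.uint16, mode='r')
ix = torch.randint(len(data) - block_size, (batch_size,))
x = torch.stack([torch.from_numpy(data[i:i + block_size].astype(np.int64)) for i in ix])
y = torch.stack([torch.from_numpy(data[i + 1:i + 1 + block_size].astype(np.int64)) for i in ix])
# x holds the input tokens, y the next-token targets shifted by one position.
```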
To run the baseline:

```sh
torchrun --standalone --nproc_per_node=8 train.py config/train_gpt2_124m.py
```
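If you only have a single GPU, the script should also run without the `torchrun` launcher, as in upstream nanoGPT (you may want to raise `gradient_accumulation_steps` in the config to keep the effective batch size):

```sh
python train.py config/train_gpt2_124m.py
```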
To run the MXFP4-quantized model:

```sh
torchrun --standalone --nproc_per_node=8 train.py config/train_gpt2_124m_mxfp4.py
```
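In MXFP4 (the OCP microscaling FP4 format), each block of 32 values shares a single power-of-two scale, and each element is a 4-bit E2M1 float whose representable magnitudes are {0, 0.5, 1, 1.5, 2, 3, 4, 6}. Below is a minimal fake-quantization sketch of that idea for intuition only; the real fused kernels live in `mxfp4_kernel`, and the helper name here is illustrative:

```python
import torch

# E2M1 (FP4) representable magnitudes. MXFP4 pairs these with one shared
# power-of-two scale per block of 32 elements (OCP MX convention).
FP4_GRID = torch.tensor([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def mxfp4_quant_dequant(x: torch.Tensor, block: int = 32) -> torch.Tensor:
    """Fake-quantize x to MXFP4 and back to float (illustrative only)."""
    n = x.numel()
    pad = -n % block
    xp = torch.nn.functional.pad(x.flatten(), (0, pad)).view(-1, block)
    # Shared scale: a power of two chosen so the block max lands near the
    # top of the E2M1 range (whose largest value is 6 = 1.5 * 2**2).
    amax = xp.abs().amax(dim=1, keepdim=True).clamp(min=1e-12)
    scale = torch.exp2(torch.floor(torch.log2(amax)) - 2)
    # Snap each scaled magnitude to the nearest FP4 grid point.
    idx = ((xp.abs() / scale).unsqueeze(-1) - FP4_GRID).abs().argmin(dim=-1)
    q = FP4_GRID[idx] * scale * xp.sign()
    return q.flatten()[:n].view_as(x)

w = torch.randn(4096)
err = (w - mxfp4_quant_dequant(w)).abs().mean()
print(f"mean abs quantization error: {err:.4f}")
```

The small train-loss gap in the table below is consistent with the lossiness of this 4-bit weight representation.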
| model | params | train loss |
|---|---|---|
| gpt2 | 124M | 2.92 |
| gpt2-mxfp4 | 124M | 3.10 |
To run a bigger model, simply add the following lines to the config file (these are the standard GPT-2 medium/large/XL shapes):

| n_layer | n_head | n_embd |
|---|---|---|
| 24 | 16 | 1024 |
| 36 | 20 | 1280 |
| 48 | 25 | 1600 |
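For example, a config for the 24-layer (GPT-2 medium, ~350M parameter) size could look like the sketch below; the file name is hypothetical, and all other settings are inherited from the 124M config:

```python
# config/train_gpt2_350m.py  (hypothetical file name)
# GPT-2 medium shape; everything else as in train_gpt2_124m.py.
n_layer = 24
n_head = 16
n_embd = 1024
```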