A JAX (Flax) rewrite of Andrej Karpathy's NanoGPT. This repository will hold a collection of new JAX/Flax features, such as the Pallas kernel language for FlashAttention on TPU, and data and tensor sharding with JAX on TPU:
- GPT-2-like model in Flax
- Mixed precision training with jmp (sketch below)
- Gradient accumulation with optax (sketch below)
- Data sharding across GPUs/TPUs using the new JAX shard_map (shmap) (sketch below)
- Loading and saving checkpoints
- Reproducing the results on the Shakespeare character-level dataset
- TFRecord reader/writer with support for data sharding across hosts
- Multi-host training
- Reproducing results on the OpenWebText dataset
- Loading Hugging Face pre-trained GPT models (sketch below)
- Fine-tuning GPT-2 weights on the Shakespeare dataset
- Sampling
- Estimating MFU (model FLOPs utilization)
- Profiling a training iteration
- Optimizing inference
- FlashAttention with Pallas
- Experimenting with JAX tensor sharding
- Gradient checkpointing
- Experimenting with fine-tuning techniques
- ...
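A minimal sketch of mixed-precision training with jmp, using a toy linear model in place of the GPT; the bfloat16 policy string and the shapes are illustrative, not this repo's exact configuration.

```python
import jax
import jax.numpy as jnp
import jmp
import optax

# Illustrative policy: keep parameters in float32, run compute in bfloat16 (TPU-friendly).
policy = jmp.get_policy("params=float32,compute=bfloat16,output=float32")

def loss_fn(params, x, y):
    p = policy.cast_to_compute(params)      # cast params to the compute dtype
    x = policy.cast_to_compute(x)           # cast inputs as well
    logits = x @ p["w"] + p["b"]            # stand-in for the model forward pass
    logits = policy.cast_to_output(logits)  # back to float32 for a numerically stable loss
    return optax.softmax_cross_entropy_with_integer_labels(logits, y).mean()

params = policy.cast_to_param({"w": jnp.zeros((16, 4)), "b": jnp.zeros((4,))})
x = jnp.ones((8, 16))
y = jnp.zeros((8,), dtype=jnp.int32)
loss, grads = jax.value_and_grad(loss_fn)(params, x, y)
```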
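Gradient accumulation can be expressed by wrapping the optimizer in `optax.MultiSteps`; the AdamW settings, the accumulation factor of 4, and the toy parameters below are placeholders.

```python
import jax.numpy as jnp
import optax

params = {"w": jnp.zeros((16, 4))}  # stand-ins for the model parameters and gradients
grads = {"w": jnp.ones((16, 4))}

# Accumulate gradients over 4 micro-batches before applying a single AdamW update.
optimizer = optax.MultiSteps(optax.adamw(learning_rate=3e-4), every_k_schedule=4)
opt_state = optimizer.init(params)

# In the training loop: updates stay zero until the 4th call, then the averaged
# gradients are applied in one optimizer step.
for _ in range(4):
    updates, opt_state = optimizer.update(grads, opt_state, params)
    params = optax.apply_updates(params, updates)
```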
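A rough sketch of data sharding with `shard_map`, splitting the batch's leading axis across all local devices; the mesh axis name "data" and the per-device loss are illustrative, not the repo's training step.

```python
import jax
import jax.numpy as jnp
import numpy as np
from jax.sharding import Mesh, PartitionSpec as P
from jax.experimental.shard_map import shard_map

# 1-D device mesh named "data" for plain data parallelism.
mesh = Mesh(np.array(jax.devices()), axis_names=("data",))

def per_device_loss(x):
    # Each device receives only its shard of the batch; pmean averages across devices.
    return jax.lax.pmean(jnp.mean(x ** 2), axis_name="data")

sharded_loss = shard_map(per_device_loss, mesh=mesh, in_specs=P("data"), out_specs=P())

batch = jnp.ones((8 * jax.device_count(), 128))  # leading axis is split across devices
print(jax.jit(sharded_loss)(batch))
```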
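Loading Hugging Face pre-trained GPT weights typically goes through the `transformers` Flax GPT-2 class, as sketched below; converting the resulting parameter tree to this repo's module layout is not shown.

```python
from transformers import FlaxGPT2LMHeadModel

# Download the pre-trained GPT-2 (124M) checkpoint; .params is a nested dict of arrays
# that can then be mapped onto the repo's Flax model (mapping not shown here).
hf_model = FlaxGPT2LMHeadModel.from_pretrained("gpt2")
hf_params = hf_model.params
```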
To run training on a TPU VM, copy the generated data files to a Google Cloud Storage (GCS) bucket, for example:
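A minimal sketch of one way to do the copy from Python, reusing the TensorFlow dependency that the TFRecord pipeline already brings in; the local directory, file pattern, and bucket name are placeholders, not the repo's actual paths.

```python
import glob
import os
import tensorflow as tf

# Placeholders: adjust to the actual dataset directory and your own bucket.
LOCAL_DIR = "data/openwebtext"
GCS_DIR = "gs://your-bucket/nanogpt-data/openwebtext"

for path in glob.glob(os.path.join(LOCAL_DIR, "*.tfrecord")):
    dest = os.path.join(GCS_DIR, os.path.basename(path))
    tf.io.gfile.copy(path, dest, overwrite=True)  # gfile understands gs:// paths
    print("copied", path, "->", dest)
```

Equivalently, `gsutil -m cp` (or `gcloud storage cp`) can push the files from the command line.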
Big thanks to TPU Research Cloud for providing v2-8/v3-8/v3-32 TPU instances on Google Cloud.