nanoGPT: A Python repository from EIFY

See the original repo for detailed instructions and context. The 🦁 Lion optimizer's PyTorch implementation is courtesy of lucidrains.

So 🦁 is definitely better than AdamW here: lower loss and higher iter/s. Somehow, though, (fused) AdamW draws less power and spends less time accessing memory, presumably thanks to its highly optimized fused kernel.
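For context on the memory comparison, here is a minimal sketch of the Lion update rule for a single parameter tensor. It is not lucidrains' code and not this repo's training loop. It uses NumPy instead of PyTorch so that it stays self-contained, and the default hyperparameters (`beta1=0.9`, `beta2=0.99`) and the function name `lion_step` are assumptions chosen for illustration. Lion keeps one momentum buffer per parameter, whereas AdamW keeps two (first and second moments). That is why the lower memory-access time observed for fused AdamW above is surprising.

```python
import numpy as np


def lion_step(param, grad, exp_avg, lr=1e-4, beta1=0.9, beta2=0.99, weight_decay=0.0):
    """One Lion update on NumPy arrays (illustrative sketch, hypothetical helper).

    Returns (new_param, new_exp_avg). Only one state buffer, exp_avg, is needed.
    """
    # Decoupled weight decay, applied the same way AdamW applies it.
    param = param * (1.0 - lr * weight_decay)
    # Interpolate momentum and the current gradient, then keep only the sign,
    # so every coordinate moves by exactly lr (or 0 where the sign is 0).
    update = np.sign(beta1 * exp_avg + (1.0 - beta1) * grad)
    param = param - lr * update
    # The momentum buffer is updated with a separate coefficient, beta2.
    exp_avg = beta2 * exp_avg + (1.0 - beta2) * grad
    return param, exp_avg


if __name__ == "__main__":
    p = np.array([1.0, -2.0, 0.5])
    g = np.array([0.3, -0.1, 0.0])
    m = np.zeros_like(p)
    p, m = lion_step(p, g, m, lr=0.1)
    print(p, m)  # [ 0.9 -1.9  0.5] [ 0.003 -0.001  0.   ]
```

Because the step is a sign, its magnitude per coordinate is fixed at `lr`. This is also why Lion is usually run with a smaller learning rate than AdamW.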
