/nanoGPT

nanoGPT tweaked for a single 16GB GPU, with the option to use the Lion optimizer

Primary language: Python. License: MIT.

See the original repo for detailed instructions and context. 🦁 Lion - Pytorch implementation courtesy of lucidrains.

[Four screenshots, captured 2023-02-23 between 3:24 PM and 3:28 PM; images not included]

So in these runs 🦁 is definitely better than AdamW on training: lower loss and higher iter/s. Somehow, though, (fused) AdamW burns less power and spends less time accessing memory, presumably thanks to its highly optimized fused kernel...?
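For readers unfamiliar with what the Lion option changes, here is a rough sketch of a single Lion update step in plain Python. It is not the code used in this repo or in the lucidrains package; `lion_step` and `sign` are hypothetical helper names for illustration, and the default hyperparameters are assumptions rather than the settings used in the runs above.

```python
def sign(x):
    # Sign of a float: -1.0, 0.0, or 1.0
    return float((x > 0) - (x < 0))


def lion_step(params, grads, momentum, lr=1e-4, beta1=0.9, beta2=0.99,
              weight_decay=0.0):
    """One Lion update over flat lists of floats (illustrative helper only)."""
    new_params, new_momentum = [], []
    for p, g, m in zip(params, grads, momentum):
        # Direction: sign of an interpolation between momentum and gradient
        update = sign(beta1 * m + (1 - beta1) * g)
        # Decoupled weight decay, then a step whose magnitude is always lr
        p = p * (1 - lr * weight_decay) - lr * update
        # Momentum is refreshed with beta2 after the parameter step
        m = beta2 * m + (1 - beta2) * g
        new_params.append(p)
        new_momentum.append(m)
    return new_params, new_momentum


# Example: three scalar "parameters" with zero-initialised momentum
params, momentum = lion_step([1.0, -2.0, 0.5], [0.3, -0.1, 0.0],
                             [0.0, 0.0, 0.0], lr=0.1)
```

Note that Lion tracks only one momentum value per parameter, whereas AdamW keeps both a first and a second moment estimate, so Lion's optimizer state is smaller. Whether that shows up in measured power draw or memory access time depends on the kernel implementation.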