/Muon

Muon optimizer for neural networks: >30% extra sample efficiency, <3% wallclock overhead

Primary language: Python
License: MIT
