/Mu-scaling

Research without Re-search: Maximal Update Parametrization Yields Accurate Loss Prediction across Scales

Primary LanguagePython

No issues in this repository yet.