Implementation of data-parallel training with sharded optimizer state, written with Nikola Jurkovic.
Primary language: Python. License: MIT.
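
The description above names the technique without showing it, so here is a minimal single-process sketch of the idea: every rank computes gradients on its own data, the gradients are averaged, and each rank keeps optimizer state only for the slice of parameters it owns. This is not the repository's code. It assumes SGD with momentum as the optimizer, simulates ranks with plain Python lists instead of real collective communication, and every function name (`shard_bounds`, `all_reduce_mean`, `init_momentum_shards`, `sharded_momentum_step`) is hypothetical.

```python
# Single-process simulation of data-parallel training with sharded optimizer
# state. All names are hypothetical and are not taken from the repository.


def shard_bounds(n_params, world_size, rank):
    """Return the [start, end) slice of the parameters owned by `rank`."""
    per_rank = (n_params + world_size - 1) // world_size  # ceiling division
    start = min(rank * per_rank, n_params)
    return start, min(start + per_rank, n_params)


def all_reduce_mean(per_rank_grads):
    """Stand-in for an all-reduce: average the gradients across ranks."""
    world_size = len(per_rank_grads)
    return [sum(column) / world_size for column in zip(*per_rank_grads)]


def init_momentum_shards(n_params, world_size):
    """Allocate one momentum buffer per rank, sized to that rank's slice only."""
    shards = []
    for rank in range(world_size):
        start, end = shard_bounds(n_params, world_size, rank)
        shards.append([0.0] * (end - start))
    return shards


def sharded_momentum_step(params, per_rank_grads, momentum_shards, lr=0.1, beta=0.9):
    """One SGD-with-momentum step in which rank r updates only its own slice."""
    world_size = len(per_rank_grads)
    grads = all_reduce_mean(per_rank_grads)
    new_params = list(params)
    for rank in range(world_size):
        start, end = shard_bounds(len(params), world_size, rank)
        buf = momentum_shards[rank]  # holds end - start values, not len(params)
        for i in range(start, end):
            buf[i - start] = beta * buf[i - start] + grads[i]
            # Writing into the shared list stands in for the all-gather that
            # would redistribute updated slices to every rank.
            new_params[i] = params[i] - lr * buf[i - start]
    return new_params
```

To try it, build the buffers with `init_momentum_shards(len(params), world_size)` and pass one gradient list per simulated rank to `sharded_momentum_step`. The point of the design is memory: the momentum buffers together hold one value per parameter, whereas replicating the optimizer on every rank would hold `world_size` copies.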