/data_paralellism

Implementation of data-parallel training with sharded optimizer state, written with Nikola Jurkovic
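As a rough illustration of the idea named in the description, here is a minimal single-process sketch. It is not taken from this repository: the function names (`shard_bounds`, `local_gradient`, `train_step`), the linear model, and the momentum optimizer are all assumptions chosen for the example. Every simulated worker computes gradients on its own slice of the data, the gradients are averaged (standing in for an all-reduce), and each worker keeps and updates optimizer state only for the parameter shard it owns.

```python
def shard_bounds(n_params, world_size, rank):
    """Return the [start, end) range of parameter indices owned by `rank`."""
    per_rank = (n_params + world_size - 1) // world_size
    start = min(rank * per_rank, n_params)
    return start, min(start + per_rank, n_params)


def local_gradient(params, batch):
    """Gradient of mean squared error for a linear model y = w . x."""
    grad = [0.0] * len(params)
    for x, y in batch:
        err = sum(w * xi for w, xi in zip(params, x)) - y
        for i, xi in enumerate(x):
            grad[i] += 2.0 * err * xi / len(batch)
    return grad


def train_step(params, momentum_shards, batches, lr=0.05, beta=0.9):
    """One simulated data-parallel step with sharded momentum state."""
    world_size = len(batches)
    n = len(params)
    # Each worker computes a gradient on its own batch (data parallelism).
    grads = [local_gradient(params, b) for b in batches]
    # Simulated all-reduce: average the per-worker gradients.
    avg = [sum(g[i] for g in grads) / world_size for i in range(n)]
    # Each worker updates only the parameters whose state it owns.
    new_params = list(params)
    for rank in range(world_size):
        start, end = shard_bounds(n, world_size, rank)
        m = momentum_shards[rank]  # state for this rank's shard only
        for j, i in enumerate(range(start, end)):
            m[j] = beta * m[j] + avg[i]
            new_params[i] = params[i] - lr * m[j]
    # Returning the full list stands in for an all-gather of updated shards.
    return new_params


# Hypothetical toy problem: recover target_w from four noiseless samples.
target_w = [2.0, -1.0, 0.5]
xs = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (1.0, 1.0, 0.0)]
data = [(x, sum(w * xi for w, xi in zip(target_w, x))) for x in xs]
batches = [data[:2], data[2:]]  # two workers, two samples each

world_size = len(batches)
params = [0.0, 0.0, 0.0]
momentum_shards = []
for rank in range(world_size):
    start, end = shard_bounds(len(params), world_size, rank)
    momentum_shards.append([0.0] * (end - start))

for _ in range(300):
    params = train_step(params, momentum_shards, batches)
```

In this sketch the two simulated workers hold two and one momentum entries respectively, rather than each holding all three. That per-worker memory saving is the point of sharding the optimizer state; a real implementation would replace the in-process averaging and list copy with collective communication between devices.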

Primary language: Python
License: MIT
