/pipegoose

Large-scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still work in progress)*

Primary language: Python · License: MIT
