An implementation of model-parallel GPT-3-like models on GPUs, built on the DeepSpeed library. It is designed to train models with hundreds of billions of parameters or more.
Primary language: Python
License: MIT