Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2


Megatron-DeepSpeed - On-Premise Execution

Cluster Information

Workload Manager : Slurm
Container Runtime : Enroot + PyXis
Worker Nodes : 20 x NVIDIA DGX A100 (80 GB)
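On a cluster like this, jobs are typically submitted through Slurm, with PyXis providing `srun` flags that launch each task inside an Enroot container. The script below is a minimal, hypothetical sketch of such a submission; the container image, mount paths, node count, and training arguments are placeholders and should be adapted to the actual setup.

```shell
#!/bin/bash
# Hypothetical sketch of a Slurm batch script using the PyXis plugin.
# Image name, mounts, and training arguments are illustrative only.
#SBATCH --job-name=megatron-ds
#SBATCH --nodes=2                  # subset of the 20 DGX A100 nodes
#SBATCH --ntasks-per-node=8        # one task per GPU on a DGX A100
#SBATCH --gpus-per-node=8

# --container-image / --container-mounts are provided by PyXis;
# Enroot imports and runs the container on each allocated node.
srun --container-image=nvcr.io#nvidia/pytorch:23.10-py3 \
     --container-mounts=/data:/data \
     python pretrain_gpt.py --micro-batch-size 4
```

With `--ntasks-per-node=8`, Slurm starts eight containerized ranks per node, one per A100 GPU, which matches the usual one-process-per-GPU layout for distributed training.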