Workload Manager : Slurm
Container Runtim : Enroot + PyXis
Worker Nodes : 20 X Nvidia DGX A100-80 GB
wnagesh/Megatron-DeepSpeed
Ongoing research training transformer language models at scale, including: BERT & GPT-2
PythonNOASSERTION
Ongoing research training transformer language models at scale, including: BERT & GPT-2
PythonNOASSERTION