/pymultiworld

A framework for PyTorch that enables fault management for collective communication libraries (CCLs) such as NCCL.

Primary language: Python
License: Apache License 2.0 (Apache-2.0)
