cisco-open/pymultiworld
A framework for PyTorch to enable fault management for collective communication libraries (CCL) such as NCCL
Python · Apache-2.0