microsoft/mscclpp

[Bug] mscclpp::DeviceSyncer::sync Assertion failed

linstreamer opened this issue · 1 comment

mpirun -tag-output -np 8 python3 ./python/mscclpp_benchmark/allreduce_bench.py
reports this error:

mscclpp/include/mscclpp/concurrency_device.hpp:37: void mscclpp::DeviceSyncer::sync(int, signed long): block: [91,0,0], thread: [0,0,0] Assertion (atomicLoad(&flag_, memoryOrderRelaxed) != tmp) failed

[1,6]: end.synchronize()
[1,6]: File "cupy/cuda/stream.pyx", line 164, in cupy.cuda.stream.Event.synchronize
[1,6]: File "cupy_backends/cuda/api/runtime.pyx", line 977, in cupy_backends.cuda.api.runtime.eventSynchronize
[1,6]: File "cupy_backends/cuda/api/runtime.pyx", line 146, in cupy_backends.cuda.api.runtime.check_status
[1,6]:cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorAssert: device-side assert triggered.

Thu Jun 27 06:05:32 2024
| NVIDIA-SMI 535.183.01 Driver Version: 535.183.01 CUDA Version: 12.2 |
| 0 NVIDIA L20 Off | 00000000:0E:00.0 Off | 0 |

v0.5.1

  1. NVIDIA L20 is not officially supported by mscclpp. See prerequisites.
  2. Your error is likely caused by our benchmark code trying to use more SMs than are available on the NVIDIA L20 (e.g., see code).
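
To illustrate point 2: a grid-wide barrier can only complete if every thread block of the kernel is resident on the GPU at the same time, so a launch with more blocks than the device can hold may leave the resident blocks waiting for blocks that never start. The sketch below is not part of mscclpp. The helper names and the `blocks_per_sm=1` default are assumptions for illustration, and the real per-SM limit depends on the kernel's resource usage. It reads the SM count through CuPy and checks a requested block count against it.

```python
def max_coresident_blocks(sm_count: int, blocks_per_sm: int = 1) -> int:
    """Upper bound on simultaneously resident blocks.

    blocks_per_sm is an assumed occupancy; 1 is the conservative choice.
    """
    return sm_count * blocks_per_sm


def grid_barrier_is_safe(nblocks: int, sm_count: int, blocks_per_sm: int = 1) -> bool:
    """True if all nblocks can be resident at once under the assumed occupancy."""
    return nblocks <= max_coresident_blocks(sm_count, blocks_per_sm)


def query_sm_count(device_id: int = 0) -> int:
    """Read the number of SMs from the CUDA runtime via CuPy (needs a GPU)."""
    import cupy as cp  # imported lazily so the helpers above work without CuPy

    props = cp.cuda.runtime.getDeviceProperties(device_id)
    return props["multiProcessorCount"]


def clamp_blocks(requested: int, sm_count: int, blocks_per_sm: int = 1) -> int:
    """Reduce a requested block count to what the device can hold at once."""
    return min(requested, max_coresident_blocks(sm_count, blocks_per_sm))
```

On an affected machine, `grid_barrier_is_safe(nblocks, query_sm_count())` returning `False` for the block count used by the benchmark would be consistent with the explanation above. Whether lowering the block count with something like `clamp_blocks` is enough to make the benchmark run on an unsupported GPU is not confirmed here.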