[Bug] mscclpp::DeviceSyncer::sync Assertion failed
linstreamer opened this issue · 1 comments
mpirun -tag-output -np 8 python3 ./python/mscclpp_benchmark/allreduce_bench.py
reports this error:
mscclpp/include/mscclpp/concurrency_device.hpp:37: void mscclpp::DeviceSyncer::sync(int, signed long): block: [91,0,0], thread: [0,0,0] Assertion (atomicLoad(&flag_, memoryOrderRelaxed) != tmp) failed
[1,6]: end.synchronize()
[1,6]: File "cupy/cuda/stream.pyx", line 164, in cupy.cuda.stream.Event.synchronize
[1,6]: File "cupy_backends/cuda/api/runtime.pyx", line 977, in cupy_backends.cuda.api.runtime.eventSynchronize
[1,6]: File "cupy_backends/cuda/api/runtime.pyx", line 146, in cupy_backends.cuda.api.runtime.check_status
[1,6]: cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorAssert: device-side assert triggered.
Thu Jun 27 06:05:32 2024
| NVIDIA-SMI 535.183.01 Driver Version: 535.183.01 CUDA Version: 12.2 |
| 0 NVIDIA L20 Off | 00000000:0E:00.0 Off | 0 |
v0.5.1
- NVIDIA L20 is not officially supported by mscclpp. See prerequisites.
- Your error is likely caused by our benchmark code trying to use more SMs than are available on the NVIDIA L20; e.g., see code.
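
If you want to confirm this on your machine before patching the benchmark, a pre-launch check along the following lines may help. This is only a sketch, not part of mscclpp: `grid_fits_on_device`, `query_sm_count`, and the `blocks_per_sm` parameter are hypothetical names introduced here, and the block counts in the usage note are illustrative. It assumes the kernel needs every thread block to be resident at the same time, which is what a grid-wide barrier such as `DeviceSyncer::sync` generally requires.

```python
def grid_fits_on_device(nblocks: int, sm_count: int, blocks_per_sm: int = 1) -> bool:
    """Return True if `nblocks` thread blocks can all be resident at once.

    Hypothetical helper, not an mscclpp API. `blocks_per_sm` is an assumed
    occupancy figure; 1 is the conservative choice.
    """
    if nblocks <= 0 or sm_count <= 0 or blocks_per_sm <= 0:
        raise ValueError("nblocks, sm_count and blocks_per_sm must be positive")
    return nblocks <= sm_count * blocks_per_sm


def query_sm_count(device_id: int = 0) -> int:
    """Read the SM count of a CUDA device through CuPy.

    CuPy is imported lazily so the pure check above still works on a
    machine without a GPU.
    """
    import cupy as cp

    props = cp.cuda.runtime.getDeviceProperties(device_id)
    return props["multiProcessorCount"]
```

For example, `grid_fits_on_device(105, query_sm_count())` would tell you whether a launch with 105 blocks (an illustrative number, not taken from the benchmark) fits on device 0. If it returns `False`, lowering the block count used by the benchmark to at most the SM count is the first thing to try.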