microsoft/mscclpp

[Bug] mscclpp-test programs don't exit after the test finishes.

TonyWu199 opened this issue · 3 comments

ENV: gcc 10.2, cmake 3.25
System: CentOS 7

Hi developers,
I ran into a problem when running mscclpp on a single node with 8 cards. I ran the following command: ./test/mscclpp-test/allreduce_test_perf -b 2m -e 48m -G 1 -n 100 -w 20 -f 2 -k 5. The console prints correct results, but the program does not exit. After I kill the host-side mpirun process, the device-side processes all remain hung and cannot be killed.

Could you use our Docker image? You can find the image here: https://github.com/microsoft/mscclpp/pkgs/container/mscclpp%2Fmscclpp

I tried the Docker image with CUDA 12.2. The program still hangs after printing "# Out of bounds values : 0 OK" and does not exit.

@TonyWu199 please provide detailed environment information, including the GPU name, GPU driver version, PCIe or NVLink, and any machine configuration info.