[Bug] IB problem in a single-node experiment?
TonyWu199 opened this issue · 1 comments
TonyWu199 commented
Hi developers,
This is really nice work, and I am trying to reproduce it on one node with 8 GPUs. However, I hit the IB problems below in both the unit tests (UT) and the collective communication case.
Unit tests (UT)
command:
mpirun -np 2 ./test/mp_unit_tests
The allreduce test in C++
command:
mpirun --bind-to numa -np 8 ./test/mscclpp-test/allreduce_test_perf -b 3m -e 48m -G 100 -n 100 -w 20 -f 2 -k 5
As I understand it, GPU communication within a single node should not involve IB, right?
Could you help fix this problem, or suggest some workarounds?
Binyang2014 commented
Please refer to this comment: #254 (comment)