microsoft/mscclpp

[Feature] How to run on a machine with multiple GPUs but only one InfiniBand NIC?

jhlee508 opened this issue · 9 comments

Hi mscclpp developers. I am studying the mscclpp codebase.

I am currently running the mp_unit tests on my server. However, since my server has multiple GPUs but only one InfiniBand NIC per node, most of the unit tests (17 out of 27) failed.

Is there a way to make them work on a server with multiple GPUs but only one IB NIC per node?

If there are any options or methods for this, please let me know. It would be really helpful😊. Thank you.

Sorry for the late response, but I'm not sure about your use case. We use IB for cross-node communication; for intra-node communication we use NVLink.
As for the unit tests, the IB tests only work when you have more than two IB cards, so in your situation you can safely ignore those errors.

Thanks for the response. I have further questions about this.

It seems that mscclpp requires as many IB NICs as there are GPUs, not just two. The following is the error I get when running:

mpirun -np 4 ... ./test/mscclpp-test/allreduce_test_perf ...
terminate called after throwing an instance of 'std::out_of_range'
  what():  IB transport out of range: 1 >= 1
terminate called after throwing an instance of 'std::out_of_range'
  what():  IB transport out of range: 2 >= 1
terminate called after throwing an instance of 'std::out_of_range'
  what():  IB transport out of range: 3 >= 1

This is presumably because my machine has only one IB device. (It works fine when only one GPU per node is involved in the inter-node allreduce.)
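
For reference, my reading of the test code (sketch only; IBs and args_ are the names used in test/mscclpp-test, and the snippet may differ from your version) is that each local GPU is mapped to its own IB device:

// Sketch of how the perf tests appear to pick an IB transport per GPU.
// This is my reading of the sources, not a verbatim copy.
const mscclpp::Transport IBs[] = {
    mscclpp::Transport::IB0, mscclpp::Transport::IB1, mscclpp::Transport::IB2,
    mscclpp::Transport::IB3, mscclpp::Transport::IB4, mscclpp::Transport::IB5,
    mscclpp::Transport::IB6, mscclpp::Transport::IB7};
// The transport for local GPU i is then IBs[args_.gpuNum].
// With a single HCA, any gpuNum >= 1 exceeds the detected device count,
// which matches the "IB transport out of range: 1 >= 1" message above.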

I also tried to bypass this error by commenting out the relevant code, but unfortunately hit another error, a segmentation fault, as follows.

Caught signal 11 (Segmentation fault: address not mapped to object at address 0x18)
Caught signal 11 (Segmentation fault: address not mapped to object at address 0x18)
Caught signal 11 (Segmentation fault: address not mapped to object at address 0x3e0)

Is this an unexpected issue, or is it expected given my setup? Either way, how can I fix it?

If you only want to test single-node perf, you can change this line:

const mscclpp::TransportFlags allTransports = mscclpp::Transport::CudaIpc | IBs[args_.gpuNum];

to:

const mscclpp::TransportFlags allTransports = mscclpp::Transport::CudaIpc;

Then mscclpp will avoid using IB to communicate. (I haven't tried it myself, but I expect it will work.) You need to replace all occurrences in the file.
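
If you want the same file to build and run both on your single-HCA machine and on a fully equipped one, an untested sketch is to request IB only when a device actually exists for that GPU. Here IBs and args_ are the names from the test file above, and the HCA count comes from libibverbs, not from mscclpp:

#include <infiniband/verbs.h>
#include <mscclpp/core.hpp>

// Untested sketch: count the local HCAs with libibverbs and only request
// an IB transport for GPUs that actually have one available.
static int countIbDevices() {
  int n = 0;
  ibv_device** list = ibv_get_device_list(&n);
  if (list != nullptr) ibv_free_device_list(list);
  return n;
}

// ... inside the test setup, replacing the line quoted above:
mscclpp::TransportFlags allTransports = mscclpp::Transport::CudaIpc;
if (args_.gpuNum < countIbDevices()) {
  allTransports = mscclpp::Transport::CudaIpc | IBs[args_.gpuNum];
}
// Note: ranks that get no IB flag can still talk intra-node over CUDA IPC,
// but they have no inter-node path, so this mainly helps single-node runs.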

If you want to test multi-node perf, we don't have an algorithm for your topology; you may need to write a new one with the MSCCL++ API.
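
As a very rough starting point for such an algorithm, you could let every local GPU share the single HCA and only go through IB across nodes. The helper below is hypothetical (pickTransport is not part of mscclpp), just to illustrate the transport choice:

#include <mscclpp/core.hpp>

// Hypothetical helper, not part of mscclpp: choose a transport per peer
// when each node has several GPUs but only one IB HCA.
mscclpp::Transport pickTransport(int myNode, int peerNode) {
  if (myNode == peerNode) {
    return mscclpp::Transport::CudaIpc;  // intra-node: NVLink/PCIe via CUDA IPC
  }
  return mscclpp::Transport::IB0;  // inter-node: all local GPUs share HCA 0
}

Keep in mind that all inter-node traffic then funnels through the one HCA, so it will likely become the bandwidth bottleneck.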

Sorry to bother you, but does that mean that only NVLink-equipped servers can use mscclpp?

In README.md it says:

MSCCL++ provides consistent interfaces, i.e., the above interfaces are used regardless of the location of the remote GPU (either on the local node or on a remote node) or the underlying link (either NVLink/xGMI or InfiniBand).

It seems PCIe is not supported. I hope this is not true.

Sorry, my fault: PCIe is supported. I failed because I use the damn RTX 4090.
By the way, there are several parts that need to be modified to run a single node without IB.

Thanks for the reply. I'm testing mscclpp on an RTX 4090 too! So why does it fail on the 4090? Is there a list of supported devices?

After I modified all the previously mentioned IB settings, it reports an error such as "p2p access is not supported". This error happens on my RTX 4090 but not on an A6000.

Hi, your P2P access issue is probably coming from your motherboard configuration, not from your GPU. Your GPUs should be under the same PCIe switch or root complex. Also, could you share what you had to change regarding the IB?
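
A quick way to check that is a small CUDA diagnostic like the one below (just a sketch, not part of mscclpp). If it prints "NOT supported" for your 4090 pairs, the limitation is in the platform or driver rather than in mscclpp:

#include <cstdio>
#include <cuda_runtime.h>

// Diagnostic sketch: report CUDA peer-to-peer capability for every GPU pair.
int main() {
  int n = 0;
  cudaGetDeviceCount(&n);
  for (int i = 0; i < n; ++i) {
    for (int j = 0; j < n; ++j) {
      if (i == j) continue;
      int canAccess = 0;
      cudaDeviceCanAccessPeer(&canAccess, i, j);  // 1 if GPU i can access GPU j
      printf("GPU %d -> GPU %d : P2P %s\n", i, j, canAccess ? "supported" : "NOT supported");
    }
  }
  return 0;
}

Build it with nvcc (e.g. nvcc p2p_check.cu -o p2p_check) and run it on both the 4090 box and the A6000 box to compare.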