microsoft/mscclpp

[Feature] Using the AMD Infinity Fabric in SMChannels

ThomasNing opened this issue · 4 comments

I am trying to use SmChannels to connect AMD GPUs over Infinity Fabric.

Here is the transport enum class in core.hpp:
"/// Enumerates the available transport types.
enum class Transport {
Unknown, // Unknown transport type.
CudaIpc, // CUDA IPC transport type.
Nvls, // NVLS transport type.
IB0, // InfiniBand device 0 transport type.
IB1, // InfiniBand device 1 transport type.
IB2, // InfiniBand device 2 transport type.
IB3, // InfiniBand device 3 transport type.
IB4, // InfiniBand device 4 transport type.
IB5, // InfiniBand device 5 transport type.
IB6, // InfiniBand device 6 transport type.
IB7, // InfiniBand device 7 transport type.
Ethernet, // Ethernet transport type.
NumTransports, // The number of transports.
};
"

Could you tell me which transport type I should select here?

SmChannels can only work on the CudaIpc transport.
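
For reference, a minimal sketch of where the transport is chosen (this assumes the connect-on-setup API mscclpp exposed around this time; `comm`, `remoteRank`, and the tag value are placeholders):

    #include <mscclpp/core.hpp>
    #include <memory>

    // Pick CudaIpc for intra-node GPU-to-GPU links; on AMD GPUs this path
    // runs over Infinity Fabric (assumption: both peers are on the same node
    // and IPC-capable).
    mscclpp::Transport transport = mscclpp::Transport::CudaIpc;

    // Establish the connection during the setup phase.
    auto connFuture = comm.connectOnSetup(remoteRank, /*tag=*/0, transport);
    comm.setup();
    std::shared_ptr<mscclpp::Connection> conn = connFuture.get();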

So, to use the AMD Infinity Fabric route, should I switch to ProxyChannel and select Nvls as the transport type? Or should I stay with SmChannel and select CudaIpc as the type?

Both SmChannel and ProxyChannel will work over AMD Infinity Fabric if you use the CudaIpc transport; despite the name, on AMD GPUs this transport maps to the equivalent HIP IPC mechanism. Nvls is specific to NVIDIA H100 or later GPUs.
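
To make that concrete, here is a hedged sketch of building an SmChannel on top of a CudaIpc connection (`conn` comes from the snippet above; `localBuf` and `bufSize` are placeholder device-buffer arguments, and the exact constructor signatures may differ slightly across mscclpp releases):

    #include <mscclpp/core.hpp>
    #include <mscclpp/semaphore.hpp>
    #include <mscclpp/sm_channel.hpp>
    #include <memory>

    // Register local GPU memory and exchange handles with the peer so that
    // both sides can address each other's buffers over IPC.
    mscclpp::RegisteredMemory localMem =
        comm.registerMemory(localBuf, bufSize, mscclpp::Transport::CudaIpc);
    comm.sendMemoryOnSetup(localMem, remoteRank, /*tag=*/1);
    auto remoteMemFuture = comm.recvMemoryOnSetup(remoteRank, /*tag=*/1);
    comm.setup();
    mscclpp::RegisteredMemory remoteMem = remoteMemFuture.get();

    // Build a device-to-device semaphore over the connection, then the channel.
    auto sem = std::make_shared<mscclpp::SmDevice2DeviceSemaphore>(comm, conn);
    mscclpp::SmChannel channel(sem, remoteMem, localBuf);

The same connection and registered memory would also back a ProxyChannel; only the channel type changes, not the transport.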

Got it! Thank you for the help, Changhao!