eth-cscs/sarus

InfiniBand support

mvhulten opened this issue · 1 comment

One of our users has tested Sarus on our cluster. He sees poor performance, and we suspect this is because Sarus does not use InfiniBand efficiently. The test was run with Sarus 1.4.2 (one release behind).

If our assumption is correct, are there plans to provide InfiniBand support, or would it be possible to add it, e.g. through an OCI hook?

Hi @mvhulten, at the moment we have no plans for a dedicated InfiniBand hook.
I have limited experience with InfiniBand, but it should be possible to make IB-based networks work with the current MPI hook.
Taking as an example an MPI hook configuration we have for a system at CSCS:

  • ibverbs libraries (e.g. /usr/lib64/libibverbs.so.1 and /usr/lib64/libibverbs/libmlx5-rdmav25.so) should be added to MPI_DEPENDENCY_LIBS;
  • the path to the driver configuration files (e.g. /etc/libibverbs.d) should be added to BIND_MOUNTS;
  • InfiniBand devices (e.g. /dev/infiniband/issm0, /dev/infiniband/rdma_cm, /dev/infiniband/umad0, and /dev/infiniband/uverbs0) should be added to BIND_MOUNTS.
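
As a rough sketch of the three points above, the snippet below extends the `env` list of an MPI hook configuration with the InfiniBand paths. It assumes that MPI_DEPENDENCY_LIBS and BIND_MOUNTS take colon-separated lists of absolute paths; the starting `hook_env` list, including the LDCONFIG_PATH and MPI_LIBS values, is a hypothetical placeholder and not the actual CSCS configuration. The helper `extend_env` is only for illustration. Check the library file names against what is installed on your hosts.

```python
import json

# Paths taken from the examples in this thread; adjust to your host.
ib_libs = [
    "/usr/lib64/libibverbs.so.1",
    "/usr/lib64/libibverbs/libmlx5-rdmav25.so",
]
ib_bind_mounts = [
    "/etc/libibverbs.d",
    "/dev/infiniband/issm0",
    "/dev/infiniband/rdma_cm",
    "/dev/infiniband/umad0",
    "/dev/infiniband/uverbs0",
]


def extend_env(env, name, extra):
    """Return a copy of env where the NAME=... entry gains the extra paths.

    Assumes the value is a colon-separated list. Existing entries are kept
    and duplicates are not added twice.
    """
    out = []
    found = False
    for entry in env:
        key, _, value = entry.partition("=")
        if key == name:
            items = [v for v in value.split(":") if v]
            items += [p for p in extra if p not in items]
            entry = f"{name}={':'.join(items)}"
            found = True
        out.append(entry)
    if not found:
        out.append(f"{name}={':'.join(extra)}")
    return out


# Hypothetical starting point: the values below are placeholders.
hook_env = [
    "LDCONFIG_PATH=/sbin/ldconfig",
    "MPI_LIBS=/path/to/host/libmpi.so",
    "MPI_DEPENDENCY_LIBS=",
    "BIND_MOUNTS=",
]

hook_env = extend_env(hook_env, "MPI_DEPENDENCY_LIBS", ib_libs)
hook_env = extend_env(hook_env, "BIND_MOUNTS", ib_bind_mounts)

# Print the list so it can be pasted into the "env" array of the hook JSON.
print(json.dumps(hook_env, indent=4))
```

The printed list is meant to replace the `env` array in the MPI hook's JSON file; the other fields of that file are left as they are.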

If you would like the IB network to be exposed in every container, an alternative is to define the ibverbs libraries and config directory as siteMounts and the IB devices as siteDevices.
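
A minimal sketch of that alternative, generating a fragment to merge into sarus.json. The entry layout is an assumption on my side (bind-type siteMounts entries with `type`, `source` and `destination`, and siteDevices entries with `source` and `access`), so please verify it against the configuration reference for your Sarus version before using it.

```python
import json

# Libraries and config directory from the examples in this thread.
ib_paths = [
    "/usr/lib64/libibverbs.so.1",
    "/usr/lib64/libibverbs/libmlx5-rdmav25.so",
    "/etc/libibverbs.d",
]
ib_devices = [
    "/dev/infiniband/issm0",
    "/dev/infiniband/rdma_cm",
    "/dev/infiniband/umad0",
    "/dev/infiniband/uverbs0",
]

# Assumed schema: each path is bind-mounted at the same location inside
# the container, and each device is granted read/write access.
site_config = {
    "siteMounts": [
        {"type": "bind", "source": p, "destination": p} for p in ib_paths
    ],
    "siteDevices": [
        {"source": d, "access": "rw"} for d in ib_devices
    ],
}

# Print the fragment; merge it by hand into the existing sarus.json.
print(json.dumps(site_config, indent=4))
```

Unlike the hook-based setup, these mounts would apply to every container started on the system, whether or not the MPI hook is enabled.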

I hope these tips are useful!