kamping-site/kamping

Wrapper for MPI-3 shared memory

lukashuebner opened this issue · 2 comments

The MPI-3 standard introduced functions that enable multiple ranks on the same node / NUMA domain to communicate via shared local memory; a minimal sketch of the relevant calls follows the list below.

  • Can be used, for example, to share input data.
  • Faster than RDMA put and get, even on a single node.
  • Partitioning by NUMA domain is currently not standardized.
  • The shared memory region is mapped at a different virtual address in each MPI process, so raw pointers into the region cannot be exchanged between processes; this comes with further caveats.
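For reference, the relevant MPI-3 calls are roughly as follows. This is a minimal sketch using the plain MPI C API (not a proposed KaMPIng interface), with synchronization kept to the simple lock_all/sync/barrier idiom:

```cpp
#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    // Split MPI_COMM_WORLD into communicators whose ranks can share memory.
    MPI_Comm node_comm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &node_comm);
    int node_rank;
    MPI_Comm_rank(node_comm, &node_rank);

    // Rank 0 of each node contributes the whole segment, everyone else 0 bytes.
    constexpr MPI_Aint num_elements = 1024;
    MPI_Aint const local_bytes =
        (node_rank == 0) ? num_elements * sizeof(int) : 0;
    int*    base = nullptr;
    MPI_Win win;
    MPI_Win_allocate_shared(local_bytes, sizeof(int), MPI_INFO_NULL, node_comm,
                            &base, &win);

    // Every rank queries where rank 0's segment lives in its own address space.
    MPI_Aint segment_bytes;
    int      disp_unit;
    int*     shared = nullptr;
    MPI_Win_shared_query(win, 0, &segment_bytes, &disp_unit, &shared);

    // Direct load/store access; synchronize with MPI_Win_sync + a barrier.
    MPI_Win_lock_all(MPI_MODE_NOCHECK, win);
    if (node_rank == 0) { shared[0] = 42; }
    MPI_Win_sync(win);
    MPI_Barrier(node_comm);
    MPI_Win_sync(win);
    std::printf("node rank %d sees %d\n", node_rank, shared[0]);
    MPI_Win_unlock_all(win);

    MPI_Win_free(&win);
    MPI_Comm_free(&node_comm);
    MPI_Finalize();
}
```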

Supporting this functionality in KaMPIng could be a unique selling point and very useful. There are probably multiple levels of support:

  1. Wrap the MPI calls to provide a shmalloc() (see the sketch after this list).
  2. Implement a C++ allocator and offset_ptr. These might not work with the STL but possibly with the Boost containers created specifically for shared memory.
  3. Provide faster communication using (a) shared send/recv buffers with parallel serialization/deserialization and fewer messages per node, and (b) faster intra-node communication (MPI seems to have some problems with intra-node communication, according to early experiments by @mschimek). All of this would, however, just be a way of avoiding MPI+OpenMP while still being able to claim that you're using hybrid parallelization.
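To make level 1 a bit more concrete, here is a rough sketch of what a shmalloc()-style wrapper could look like. The function names, the pair-based ownership, and the convention that rank 0 of the node communicator backs the allocation are all made up for illustration and not an existing KaMPIng interface:

```cpp
#include <mpi.h>
#include <utility>

// Hypothetical sketch: collectively allocate `count` objects of type T in a
// shared-memory window on the node-local communicator. Rank 0 of node_comm
// backs the whole allocation; every rank gets a pointer into the same physical
// memory (at a possibly different virtual address, see the caveats above).
// Must be called by all ranks of node_comm; the window must be freed
// collectively via shfree() and outlive all uses of the pointer.
template <typename T>
std::pair<T*, MPI_Win> shmalloc(MPI_Comm node_comm, MPI_Aint count) {
    int node_rank;
    MPI_Comm_rank(node_comm, &node_rank);

    MPI_Aint const local_bytes = (node_rank == 0) ? count * sizeof(T) : 0;
    T*      base = nullptr;
    MPI_Win win;
    MPI_Win_allocate_shared(local_bytes, sizeof(T), MPI_INFO_NULL, node_comm,
                            &base, &win);

    // Locate rank 0's segment in this process's address space.
    MPI_Aint segment_bytes;
    int      disp_unit;
    T*       ptr = nullptr;
    MPI_Win_shared_query(win, 0, &segment_bytes, &disp_unit, &ptr);
    return {ptr, win};
}

// Collective; invalidates all pointers obtained from this window.
inline void shfree(MPI_Win& win) { MPI_Win_free(&win); }
```

A real wrapper would presumably tie the window's lifetime to an RAII handle and build the level-2 allocator on top of it, so that the pointer cannot outlive the window.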

For the sake of completeness: it seems as if one could also remap the shared memory region to another virtual address. On a 64-bit system, there might even be a large enough block of virtual addresses that is available on all ranks, and we'd thus be able to map the shared memory region there and use raw pointers again.
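A very rough sketch of that idea, using POSIX shared memory directly (shm_open/mmap) rather than an MPI window, since MPI does not expose the underlying shared-memory object. The segment name, the size, and the address-agreement protocol are made up for illustration; mapping at the broadcast address is only a hint and has to be checked:

```cpp
#include <mpi.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    MPI_Comm node_comm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &node_comm);
    int node_rank;
    MPI_Comm_rank(node_comm, &node_rank);

    constexpr size_t size = 1 << 20;          // 1 MiB segment (arbitrary)
    const char* name = "/kamping_shm_demo";   // made-up segment name

    // Rank 0 creates and sizes the segment, the others open it afterwards.
    // (Error handling omitted for brevity.)
    if (node_rank == 0) {
        int fd = shm_open(name, O_CREAT | O_RDWR, 0600);
        ftruncate(fd, size);
        close(fd);
    }
    MPI_Barrier(node_comm);
    int fd = shm_open(name, O_RDWR, 0600);

    // Rank 0 lets the kernel pick an address and broadcasts it; the other
    // ranks pass it as a hint. On 64-bit systems this often succeeds, but it
    // is not guaranteed, so the result has to be checked.
    void* addr = nullptr;
    if (node_rank == 0) {
        addr = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    }
    MPI_Bcast(&addr, sizeof(addr), MPI_BYTE, 0, node_comm);
    if (node_rank != 0) {
        void* got = mmap(addr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (got != addr) {
            std::fprintf(stderr, "rank %d: could not map at common address\n",
                         node_rank);
        }
        addr = got;
    }
    close(fd);

    // ... use raw pointers into `addr` here ...

    munmap(addr, size);
    MPI_Barrier(node_comm);
    if (node_rank == 0) { shm_unlink(name); }
    MPI_Finalize();
}
```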

I think this is a very interesting proposal and I share Lukas's view that this could be a beneficial feature for KaMPIng.
(@lukashuebner you mean MPI+OpenMP, don't you?)

> (@lukashuebner you mean MPI+OpenMP, don't you?)

Yes, of course 🙊