C++ interface for MPI-3 with H5CPP style static reflection

MPI C++ proposal

The strong similarity between MPI and HDF5 allows significant code/pattern reuse from the H5CPP project. While there are naming differences in concepts, such as HDF5 property lists vs. MPI_Info, the main building blocks remain the same:

  • type system
  • static reflection
  • handles/identifiers with RAII
  • daisy-chained properties can easily be converted into MPI_Info key/value pairs where needed (see the sketch after this list)
  • intuitive, pythonic syntax via template metaprogramming
  • feature-detection-idiom based mechanism to support STL-like containers: std::vector, ...
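
For instance, a daisy-chained property expression can be lowered onto plain MPI_Info_create / MPI_Info_set calls; a minimal sketch, where info_property and make_info are hypothetical names, not part of any existing API:

#include <mpi.h>
#include <string>

// hypothetical property carrying one key/value pair; properties would be
// daisy chained in the spirit of H5CPP property lists
struct info_property {
    std::string key, value;
};

// lowering a chain of properties onto the C MPI_Info calls
template <class... Ps>
MPI_Info make_info(const Ps&... props) {
    MPI_Info info;
    MPI_Info_create(&info);
    (MPI_Info_set(info, props.key.c_str(), props.value.c_str()), ...);
    return info;
}
// usage: make_info(info_property{"no_locks", "true"}, info_property{"same_size", "true"});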

Features

  • static reflection based on LLVM/clang
  • PBLAS/ScaLAPACK support: block-cyclic distribution
  • full interop with C MPI code: handles are binary equivalent
  • full interop with HDF5 through H5CPP, by the same author
  • full support for major linear algebra systems: Armadillo, Eigen3, ...
  • full STL-like object support through the feature detection idiom (sketched below)
  • generative programming: only the necessary code is instantiated
  • RAII idiom to make sure resources are closed
  • H5CPP style structured exceptions: mpi::error::any rules them all
  • intuitive syntax
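
The STL-like container support above rests on the feature detection idiom; a minimal sketch of the idea (the actual traits used in H5CPP are more elaborate):

#include <type_traits>
#include <utility>
#include <vector>
#include <list>

// a container qualifies for zero-copy transfer when it exposes contiguous
// storage through data() and size(); detected with std::void_t
template <class T, class = void>
struct is_contiguous : std::false_type {};

template <class T>
struct is_contiguous<T, std::void_t<
        decltype(std::declval<T>().data()),
        decltype(std::declval<T>().size())>> : std::true_type {};

static_assert( is_contiguous<std::vector<float>>::value, "vector: contiguous, send directly");
static_assert(!is_contiguous<std::list<float>>::value,   "list: needs element-wise handling");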

Examples

The proposed MPI C++ interface will be able to handle arbitrarily complex POD types, with type descriptors generated at compile time by the h5cpp-compiler, an LLVM/clang based static reflection tool, while keeping the option open to switch to purely template based reflection once it becomes available. The following example is reworked from 'ring.cxx'.

#include "h5mpi/all"
#include <iostream>

namespace some {
    struct pod_t {
        size_t count;
        int array[10];
        ... // arbitrarily complex, cascading members
    };
}

int main(int argc, char *argv[]) {
    // registers mpi::finalize() at exit, similar to the behaviour of python and julia
    mpi::init();
    // it would be nice to have a global variable `mpi::rank` as opposed to
    // a function call; but ...
    int next = (mpi::rank() + 1) % mpi::size(),
        prev = (mpi::rank() + mpi::size() - 1) % mpi::size() [, tag]; // an optional message tag may be declared as well

    if (mpi::rank() == 0) mpi::send( 
        // passed as reference, object created in place
        some::pod_t{..}, next [, tag]);

    while (true) { // passing the message around
        // RVO, needs template specialization
        some::pod_t message = mpi::receive<some::pod_t>(prev [, tag]);
        if (mpi::rank() == 0) -- message.count; 

        mpi::send(message, next [, tag]);
        if (message.count == 0) // exiting
            break;
    }
    some::pod_t message; 
    if (mpi::rank() == 0) // message passed by reference, will be updated
        mpi::receive(message, prev [, tag]);

    return 0; // mpi::finalize() gets called at exit
}
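
To show what the compile-time generated type descriptor boils down to, here is a hand-written equivalent for a simplified some::pod_t expressed in plain C MPI calls. This only illustrates the mechanism; it is not the code the reflection tool would emit, and MPI_UINT64_T assumes a 64-bit size_t:

#include <mpi.h>
#include <cstddef>

namespace some {
    struct pod_t {      // simplified: only the two members spelled out above
        size_t count;
        int    array[10];
    };
}

MPI_Datatype make_pod_t_type() {
    int          lengths[]       = {1, 10};
    MPI_Aint     displacements[] = {offsetof(some::pod_t, count),
                                    offsetof(some::pod_t, array)};
    MPI_Datatype types[]         = {MPI_UINT64_T, MPI_INT};
    MPI_Datatype pod_type;
    MPI_Type_create_struct(2, lengths, displacements, types, &pod_type);
    MPI_Type_commit(&pod_type);
    return pod_type;
}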

Another example, with scatter/gather operations on containers:

#include <h5mpi>
...
int main(int argc, char* argv[]){
    mpi::init(); // will register mpi::finalize with std::atexit
    int total = elements_per_proc * world_size;
    // scatter/gather: arguments may be passed in arbitrary order, `mpi::world` may be implicit
    // `segment` uses RVO, with size == elements_per_proc, filled with the correct values for
    //  each rank 
    std::vector<float> segment = mpi::scatter(
        std::vector<float>(total, 1.0)   // data being spread out
        [,mpi::root{0}] [, mpi::world],  // optional values
        [,mpi::count{5}] );              // blocks/rank
    float partial_sum = std::reduce(segment.begin(), segment.end(), 0.0f, std::plus<float>());
    std::vector<float> result = mpi::gather(partial_sum); // result.size() == world_size on the root rank
    if (mpi::rank() != 0) // non-participating ranks must still return a valid container with zero elements
        assert(result.size() == 0);
} 
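
For comparison, a sketch of the same pattern written directly against the C MPI API; since the proposed handles are binary equivalent with their C counterparts, this is roughly what the wrapper calls above would delegate to:

#include <mpi.h>
#include <numeric>
#include <vector>

int main(int argc, char* argv[]) {
    MPI_Init(&argc, &argv);
    int rank, size, elements_per_proc = 5;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    std::vector<float> data;                       // only the root holds the full data set
    if (rank == 0) data.assign(elements_per_proc * size, 1.0f);

    std::vector<float> segment(elements_per_proc); // each rank receives its block
    MPI_Scatter(data.data(), elements_per_proc, MPI_FLOAT,
                segment.data(), elements_per_proc, MPI_FLOAT, 0, MPI_COMM_WORLD);

    float partial_sum = std::accumulate(segment.begin(), segment.end(), 0.0f);

    std::vector<float> result(rank == 0 ? size : 0);
    MPI_Gather(&partial_sum, 1, MPI_FLOAT,
               result.data(), 1, MPI_FLOAT, 0, MPI_COMM_WORLD);
    MPI_Finalize();
}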

A possible code snippet to demonstrate smooth interaction between linear algebra systems, HDF5 and MPI:

#include <h5mpi>
#include <h5cpp>
...
int main(int argc, char* argv[]){
  mpi::init(argc, argv);  // registers mpi::finalize at exit
  mpi::comm_t comm;       // defaults to MPI_COMM_WORLD
  mpi::info info;         // similar to property lists 
  size_t rank = mpi::rank(comm), size = mpi::size(comm);
  auto fd = h5::open("container.h5", h5::fcpl, h5::mpiio({comm, info}) );
   
  arma::mat A(height, width, ... ), B(...), C(...);
  // reads a block according to the block-cyclic distribution, or the entire dataset,
  // into an armadillo matrix, making it compatible with PBLAS; no scatter needed:
  // implicit MPI file IO calls behind the scenes distribute the data to each rank
  h5::read(fd, "dataset/A", A, h5::block_cyclic, h5::offset{..}, h5::count{..});
  h5::read(fd, "dataset/B", B, h5::block_cyclic, h5::offset{..}, h5::count{..});
  // parallel matrix matrix multiply, results in local C
  PvGEMM('N', 'T', A.n_rows, A.n_cols, REAL,
    A.memptr(), 0, 0, desc_a,
    B.memptr(), 0, 0, desc_b, REAL,
    C.memptr(), 0, 0, desc_c);
  // save result into an HDF5 container
  h5::write(fd, "result/C", C, h5::block_cyclic);
}
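
At the C level, the parallel h5::open above corresponds to attaching an MPI-IO file access property list to the open call; a minimal sketch, assuming a parallel (MPI enabled) HDF5 build:

#include <hdf5.h>
#include <mpi.h>

// roughly what h5::open(..., h5::mpiio({comm, info})) amounts to in plain C HDF5
hid_t open_parallel(const char* path, MPI_Comm comm, MPI_Info info) {
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);    // file access property list
    H5Pset_fapl_mpio(fapl, comm, info);         // collective MPI-IO driver
    hid_t fd = H5Fopen(path, H5F_ACC_RDWR, fapl);
    H5Pclose(fapl);
    return fd;
}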

context

  • mpi::init
  • mpi::thread

list of resources, similar to HDF5 hid_t descriptors; a sketch of one such RAII handle follows the list

  • mpi::comm mpi::comm_t
  • mpi::window mpi::win_t
  • mpi::type mpi::tp_t
  • mpi::request mpi::req_t
  • mpi::attribute mpi::attr_t
  • mpi::group mpi::grp_t
  • mpi::file mpi::fd_t
  • mpi::info mpi::inf_t
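
One possible shape of such a handle (a sketch only; the actual design may differ). The implicit conversion operator is what keeps the handle binary compatible with the C API:

#include <mpi.h>
#include <utility>

// RAII wrapper around MPI_Comm: owns a duplicated communicator, frees it in the
// destructor, and converts implicitly to MPI_Comm for use with plain C MPI calls
struct comm_t {
    explicit comm_t(MPI_Comm c = MPI_COMM_WORLD) { MPI_Comm_dup(c, &handle); }
    comm_t(const comm_t&) = delete;                 // duplicate explicitly via mpi::dup instead
    comm_t(comm_t&& other) noexcept
        : handle(std::exchange(other.handle, MPI_COMM_NULL)) {}
    ~comm_t() { if (handle != MPI_COMM_NULL) MPI_Comm_free(&handle); }
    operator MPI_Comm() const { return handle; }    // binary interop with the C API
private:
    MPI_Comm handle;
};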

create

  • mpi::comm_t mpi::create(const mpi::grp_t& group) same as ctors
  • mpi::comm_t mpi::create(const mpi::comm_t& comm, const mpi::grp_t& group)
  • mpi::comm_t mpi::group(const mpi::comm_t& comm, const mpi::grp_t& group)
  • mpi::comm_t mpi::split(const mpi::comm_t& comm, int color, int key) -- see the sketch below
  • mpi::comm_t mpi::dup(const mpi::comm_t& comm) same as copy ctor
  • mpi::comm_t mpi::connect(const mpi::comm_t& comm, const std::string& name, int root, const mpi::inf_t& info)
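
As an illustration, the split call above wraps the following C operation; ranks passing the same color end up in the same derived communicator:

#include <mpi.h>

int main(int argc, char* argv[]) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm row_comm;                                           // what mpi::split(comm, color, key) would return as an RAII handle
    MPI_Comm_split(MPI_COMM_WORLD, rank % 2, rank, &row_comm);   // color: even/odd ranks, key: keep relative order
    MPI_Comm_free(&row_comm);
    MPI_Finalize();
}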

window

MPI_Info is a mechanism similar to HDF5 property lists for storing side-band information, mostly to fine-tune behaviour. Window related flags: mpi::shared | mpi::no_locks | mpi::accumulate_ordering | mpi::accumulate_ops | mpi::same_size | mpi::same_disp_unit (see the sketch below).

  • mpi::win_t mpi::allocate(const mpi::comm_t& comm, size_t size, size_t disp_unit [, error_handler] [,properties])
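
A sketch of what the allocate call above could lower to; the flags listed map directly onto the MPI-3 window info keys of the same name:

#include <mpi.h>

// mpi::allocate(comm, size, disp_unit, mpi::no_locks | mpi::same_disp_unit)
// expressed in terms of the C API: flags become window info keys
MPI_Win allocate_window(MPI_Comm comm, MPI_Aint size, int disp_unit, void** base) {
    MPI_Info info;
    MPI_Info_create(&info);
    MPI_Info_set(info, "no_locks", "true");        // mpi::no_locks
    MPI_Info_set(info, "same_disp_unit", "true");  // mpi::same_disp_unit
    MPI_Win win;
    MPI_Win_allocate(size, disp_unit, info, comm, base, &win);
    MPI_Info_free(&info);
    return win;
}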

list of operators

  • mpi::req_t mpi::request<T>(mpi::comm& comm, int rank, int tag) -- see the nonblocking sketch after this list
  • mpi::create | mpi::delete
  • mpi::dup
  • mpi::abort( mpi::req_t&, int error_code )
  • mpi::accumulate( origin, target, op )
  • mpi::split()
  • mpi::test
  • mpi::malloc
  • mpi::barrier
  • mpi::broadcast
  • mpi::send
  • mpi::receive
  • mpi::cancel
  • mpi::accept
  • mpi::group -- this should be replaced with 'mpi::???'
  • mpi::join
  • mpi::rank
  • mpi::size
  • mpi::spawn
  • mpi::split
  • mpi::dimension
  • mpi::graph
  • mpi::adjacent
  • mpi::neighbours
  • mpi::scan
  • mpi::free
  • mpi::get
  • mpi::address
  • mpi::count
  • mpi::elements
  • mpi::map
  • mpi::intersect
  • mpi::difference
  • mpi::probe
  • mpi::lookup
  • mpi::publish
  • mpi::is_finalized
  • mpi::is_initialized
  • mpi::is_thread
  • mpi::gather
  • mpi::scatter
  • mpi::flush
  • mpi::attach
  • mpi::detach
  • mpi::lock
  • mpi::unlock
  • mpi::post
  • mpi::tick
  • mpi::time
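
For instance, mpi::request and mpi::test correspond to the familiar nonblocking pattern; a sketch in terms of the underlying C calls:

#include <mpi.h>

// nonblocking send with completion polling: roughly what the
// mpi::request / mpi::test entries above would wrap
void ping(MPI_Comm comm, int peer, int tag) {
    double payload = 3.14;
    MPI_Request req;
    MPI_Isend(&payload, 1, MPI_DOUBLE, peer, tag, comm, &req);
    int done = 0;
    while (!done) {
        MPI_Test(&req, &done, MPI_STATUS_IGNORE);  // poll; overlap computation here
    }
}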

property lists

Similarly to HDF5, MPI provides a way to pass side-band information as key/value pairs in an MPI_Info object (see the sketch after this list):

  • mpi::all
  • mpi::all_to_all
  • mpi::any
  • mpi::some
  • mpi::no_locks
  • mpi::exclusive
  • mpi::inclusive
  • mpi::non_blocking
  • mpi::blocking
  • mpi::buffered
  • mpi::local
  • mpi::remote
  • mpi::sync | mpi::async
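
Some of these flags (e.g. mpi::blocking, mpi::buffered, mpi::async) could plausibly select the underlying C call at compile time rather than populate an MPI_Info object; a hypothetical sketch of that generative aspect, with assumed tag types:

#include <mpi.h>
#include <type_traits>

// hypothetical tag types standing in for mpi::blocking, mpi::buffered, mpi::async
struct blocking_t {};
struct buffered_t {};
struct async_t {};

// generative programming: only the branch matching the requested mode is instantiated
template <class Mode = blocking_t>
void send(const double* buf, int n, int dest, int tag, MPI_Comm comm, MPI_Request* req = nullptr) {
    if constexpr (std::is_same_v<Mode, buffered_t>)
        MPI_Bsend(buf, n, MPI_DOUBLE, dest, tag, comm);   // needs a buffer attached via MPI_Buffer_attach
    else if constexpr (std::is_same_v<Mode, async_t>)
        MPI_Isend(buf, n, MPI_DOUBLE, dest, tag, comm, req);
    else
        MPI_Send(buf, n, MPI_DOUBLE, dest, tag, comm);
}
// usage: MPI_Request req; send<async_t>(data, 10, /*dest*/ 1, /*tag*/ 0, MPI_COMM_WORLD, &req);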