GlobalArrays/ga

Feature request: enable launching nwchem with other code in MPMD mode

Closed this issue · 3 comments

Request
I would like to run nwchem together with another code in MPMD mode, but the GA initialization currently duplicates MPI_COMM_WORLD, which in an MPMD launch spans the processes of both codes. Could you instead use MPI_Comm_split() to create a global communicator that contains only the nwchem processes?

This happens in src-mpi/comex.c:comex_init() and src-mpi-pr/group.c:comex_group_init()

Suggested code: replace the MPI_Comm_dup() call with:

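    int wrank;
    MPI_Comm_rank(MPI_COMM_WORLD, &wrank);
    /* color shared by all nwchem processes: the character codes sum to 642 */
    const int color = 'n'+'w'+'c'+'h'+'e'+'m';
    /* key = wrank keeps the processes in their MPI_COMM_WORLD order;
       status is assumed to be the int already used for the MPI_Comm_dup() result */
    status = MPI_Comm_split(MPI_COMM_WORLD, color, wrank, &(l_state.world_comm));
    assert(MPI_SUCCESS == status);
    assert(MPI_COMM_NULL != l_state.world_comm);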

In the src-mpi-pr/ code, replace l_state.world_comm with g_state.comm.
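To illustrate what the split would produce in an MPMD launch, here is a minimal sketch in plain C that mimics the rank assignment of MPI_Comm_split() without calling MPI. The helpers `split_rank` and `split_size` are hypothetical names made up for this illustration and are not part of MPI, GA, or ComEx. The sketch follows the documented rule that processes with the same color end up in the same communicator, ordered by key, with ties broken by their rank in the parent communicator. It does not model MPI_UNDEFINED colors. Note also that MPI_Comm_split() is collective over MPI_COMM_WORLD, so the other code in the MPMD job would have to call it too, with a different color.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: rank that world process `me` would receive in the
 * communicator created by MPI_Comm_split(), given the color and key passed
 * by every process in the parent communicator. */
int split_rank(const int *colors, const int *keys, size_t nprocs, size_t me)
{
    assert(me < nprocs);
    int new_rank = 0;
    for (size_t p = 0; p < nprocs; ++p) {
        if (p == me || colors[p] != colors[me])
            continue; /* a different color lands in a different communicator */
        /* processes with a smaller key come first; ties go to the lower parent rank */
        if (keys[p] < keys[me] || (keys[p] == keys[me] && p < me))
            ++new_rank;
    }
    return new_rank;
}

/* Hypothetical helper: size of the communicator that process `me` ends up in. */
int split_size(const int *colors, size_t nprocs, size_t me)
{
    assert(me < nprocs);
    int size = 0;
    for (size_t p = 0; p < nprocs; ++p)
        if (colors[p] == colors[me])
            ++size;
    return size;
}
```

For example, assume a job with two nwchem processes (world ranks 0 and 1, color 642) and three processes of another code (world ranks 2 to 4, with an arbitrary color such as 7), each passing its world rank as the key. The nwchem processes then form a communicator of size 2 with ranks 0 and 1, and world rank 4 becomes rank 2 in the other code's communicator of size 3.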

Reasoning
I am trying to use ADIOS to output trajectories and pass them to another code that processes the data. Some ADIOS engines transfer data directly over MPI, but that requires running the two codes in MPMD mode.

Thank you

@ebylaska requested something similar and I implemented it in ARMCI-MPI (pmodels/armci-mpi#26).

I know how to implement it in ComEx but haven't had time to do it. I think the right way is to add a separate initialization routine that takes a communicator argument explicitly, rather than what you propose.

Also, if you want this to work with NWChem, it needs to be implemented in TCGMSG and rolled all the way up into the NWChem program driver, which is a bigger change than just GA/ARMCI.

We added an option to initialize GA with an externally supplied communicator. It is in the develop branch and will be part of the next release.