Algorithm implementation for MPI_Reduce and MPI_Allreduce?

Question

Algorithm implementation for MPI_Reduce and MPI_Allreduce?

afranques opened this issue 3 years ago · 3 comments

Dear All,

I have been studying the SST-Macro implementation of the MPI_Reduce (reduce.cc) and MPI_Allreduce (allreduce.cc) for a bit (by looking at the code and using default and custom debug flags) and I'm still unsure what kind of algorithm they consist of. Some questions and thoughts:

I got a hint from the code that seems to imply a gather is taking place. Also, the name of the class is WilkeHalvingReduce, which I assume implies there's also a halving process involved. Could it be that it's some sort of reduce-scatter and then gather?
I added some printfs after each SendAction and ReceiveAction and I saw that there is indeed a halving process in which the same rank sends half the number of elements to a different rank at each iteration. However it gets to a point where ranks invoke SendAction and ReceiveAction with 0 elements (nelems=0). Why does it send a message with no elements in it?
In the code there's also a notion of mapping "real" to "virtual" ranks (stored in VirtualRankMap), why do we need such mapping?
I was not able to find the place in the code where the reduction operation is applied; the reduce code takes argument reduce_fxn fxn but I cannot see it being used afterwards.

Thanks to anyone who may shed some light into this.
-Antonio

Answer 1 · 2021-08-04T20:51:40.000Z

@jjwilke can you provide any background?

Answer 2 · 2021-09-01T09:45:35.000Z

Hi, I got same question as above about the implementation of MPI_Reduce, is there any guide or document?

@jjwilke can you provide any background?

Answer 3 · 2021-09-01T16:13:09.000Z

There isn't anyone on the project at this point that has direct knowledge of the algorithm implementations and I don't believe there is any separate documentation.

I studied up on the code a bit and it sure looks like it's based on recursive halving as described in
https://www.mcs.anl.gov/~thakur/papers/ijhpca-coll.pdf
I suspect that paper was used to guide the implementation and it should be pretty useful in understanding the algorithm.