Algorithm implementation for MPI_Reduce and MPI_Allreduce?
afranques opened this issue · 3 comments
Dear All,
I have been studying the SST-Macro implementation of the MPI_Reduce
(reduce.cc
) and MPI_Allreduce
(allreduce.cc
) for a bit (by looking at the code and using default and custom debug flags) and I'm still unsure what kind of algorithm they consist of. Some questions and thoughts:
- I got a hint from the code that seems to imply a gather is taking place. Also, the name of the class is
WilkeHalvingReduce
, which I assume implies there's also a halving process involved. Could it be that it's some sort of reduce-scatter and then gather? - I added some
printf
s after eachSendAction
andReceiveAction
and I saw that there is indeed a halving process in which the same rank sends half the number of elements to a different rank at each iteration. However it gets to a point where ranks invokeSendAction
andReceiveAction
with 0 elements (nelems=0
). Why does it send a message with no elements in it? - In the code there's also a notion of mapping "real" to "virtual" ranks (stored in
VirtualRankMap
), why do we need such mapping? - I was not able to find the place in the code where the reduction operation is applied;
the reduce code
takes argumentreduce_fxn fxn
but I cannot see it being used afterwards.
Thanks to anyone who may shed some light into this.
-Antonio
Hi, I got same question as above about the implementation of MPI_Reduce, is there any guide or document?
@jjwilke can you provide any background?
There isn't anyone on the project at this point that has direct knowledge of the algorithm implementations and I don't believe there is any separate documentation.
I studied up on the code a bit and it sure looks like it's based on recursive halving as described in
https://www.mcs.anl.gov/~thakur/papers/ijhpca-coll.pdf
I suspect that paper was used to guide the implementation and it should be pretty useful in understanding the algorithm.