Just browsing through code, I noticed this pattern:

    long offset = 0, next_offset = file_size(current_ckpt.filename(cfg.get("scratch")));
    if (rank > 0)
        MPI_Recv(&offset, 1, MPI_LONG, rank - 1, 0, comm, MPI_STATUS_IGNORE);
    next_offset += offset;
    if (rank + 1 < no_ranks)
        MPI_Send(&next_offset, 1, MPI_LONG, rank + 1, 0, comm);

I suspect you might be able to replace that code segment with an MPI_Scan or MPI_Exscan, which typically takes O(log P) time instead of the O(P) of this rank-by-rank send/receive chain.