Prototype C++11 MPI benchmark support library inspired by google/benchmark.
An example ping-pong benchmark (bin/pingpong.cpp)
    #include "bench/bench.hpp"
    #include <mpi.h>

    void pingpong(bench::State &state) {
      const int rank = bench::world_rank();
      const int size = bench::world_size();
      const size_t sz = 1;
      char *sbuf = new char[sz];
      char *rbuf = new char[sz];

      for (auto _ : state) {
        if (0 == rank) {
          MPI_Send(sbuf, sz, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
          MPI_Recv(rbuf, sz, MPI_BYTE, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (1 == rank) {
          MPI_Recv(rbuf, sz, MPI_BYTE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
          MPI_Send(sbuf, sz, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
        }
      }

      state.set_bytes_processed(sz);
      delete[] sbuf;
      delete[] rbuf;
    }

    BENCH(pingpong)->timing_root_rank()->no_iter_barrier();
    BENCH_MAIN()
The library will automatically determine the number of iterations to run.
Before the pingpong function is called, the library will call MPI_Barrier(MPI_COMM_WORLD).
Then, pingpong will be called.
Setup code runs before the for (auto _ : state) loop.
Each iteration of the loop contributes to the total time.
After each iteration, MPI_Barrier(MPI_COMM_WORLD) is invoked; its time does not contribute to the total (see Benchmark::no_iter_barrier()).
After the loop, benchmark-specific teardown occurs.
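
The sequence above can be sketched without MPI as a small timing loop. This is only an illustration under stated assumptions, not the library's implementation: `time_iterations` is a hypothetical helper, `barrier` is a stand-in for MPI_Barrier(MPI_COMM_WORLD), and `iter_barrier` mimics whether no_iter_barrier() was requested.

```cpp
#include <chrono>
#include <cstdint>
#include <functional>

// Hypothetical helper (not part of the library): runs `body` `iters` times and
// returns the nanoseconds accumulated inside `body` only.
// `barrier` stands in for MPI_Barrier(MPI_COMM_WORLD).
inline std::int64_t time_iterations(std::int64_t iters,
                                    const std::function<void()> &body,
                                    const std::function<void()> &barrier,
                                    bool iter_barrier) {
  using clock = std::chrono::steady_clock;
  std::int64_t total_ns = 0;

  barrier(); // synchronize once before the benchmark body starts

  for (std::int64_t i = 0; i < iters; ++i) {
    const clock::time_point start = clock::now();
    body(); // only this region contributes to the total
    const clock::time_point stop = clock::now();
    total_ns +=
        std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start)
            .count();

    if (iter_barrier) {
      barrier(); // runs between iterations, outside the timed region
    }
  }
  return total_ns;
}
```

Passing `iter_barrier = false` corresponds to the behavior described for no_iter_barrier(): the initial barrier still runs, but iterations follow each other without synchronization.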
timing_root_rank() says that the reported time should be the elapsed time measured on the root rank only.
no_iter_barrier() says that there should be no MPI_Barrier() between state iterations.
The reported time is the average ns/iteration.
If state.set_bytes_processed is used, the provided value should be the number of bytes per iteration.
The reported throughput will then be in bytes/second.
- Benchmark::timing_max_rank(): report the maximum time consumed across all ranks
- Benchmark::timing_root_rank(): only record time in rank 0
- Benchmark::no_iter_barrier(): do not do an MPI_Barrier() between iterations
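
The difference between the two timing options can be illustrated on a plain vector of per-rank elapsed times. This is a sketch under assumptions: the `Timing` enum and `reported_time` function are hypothetical, and in a real MPI program the maximum would more likely be obtained with a reduction such as MPI_Reduce using MPI_MAX rather than by gathering all values into a vector.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical selector mirroring timing_root_rank() and timing_max_rank().
enum class Timing { RootRank, MaxRank };

// per_rank_ns[i] is the elapsed time measured on rank i; index 0 is the root.
inline double reported_time(const std::vector<double> &per_rank_ns,
                            Timing mode) {
  if (per_rank_ns.empty()) {
    return 0.0;
  }
  switch (mode) {
  case Timing::RootRank:
    return per_rank_ns.front(); // only rank 0 counts
  case Timing::MaxRank:
    return *std::max_element(per_rank_ns.begin(), per_rank_ns.end());
  }
  return 0.0;
}
```

With per-rank times of 100, 250, and 180 ns, the root-rank mode reports 100 ns while the max-rank mode reports 250 ns.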
- Automatic Timing
  - timing_root_rank
  - timing_max_rank
  - timing_wall: the wall time from when the first rank starts to when the last rank ends
  - timing_aggregate: aggregate time consumed in each rank
- Manual timing
  - state.pause_timing()
  - state.resume_timing()
  - state.set_iteration_time()
- Iteration control
  - manual
  - automatic
- Support running a benchmark over multiple communicators
  - Benchmark must take a communicator
  - All pairs of ranks
  - Specific pairs of ranks
- CSV reporter
- Add arguments to a benchmark
- Add statistics for repeated runs
  - trimean
  - standard deviation
  - min
  - max
- JSON reporter
- Benchmark registration
- static
- Auto-generated main function
- function pointer
- lambda function
- static
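
The statistics named in the list above (trimean, standard deviation, min, max) could be computed over repeated-run timings roughly as follows. This is a sketch, not the library's code: `Stats`, `quantile`, and `summarize_runs` are hypothetical names, the quartiles use linear interpolation between sorted samples (one of several common conventions), and the standard deviation is the sample (n - 1) form.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical summary of repeated-run timings.
struct Stats {
  double min;
  double max;
  double stddev;  // sample standard deviation (divides by n - 1)
  double trimean; // (Q1 + 2 * median + Q3) / 4
};

// Quantile of an already sorted, non-empty vector, 0 <= p <= 1,
// using linear interpolation between neighboring samples.
inline double quantile(const std::vector<double> &sorted, double p) {
  const double pos = p * static_cast<double>(sorted.size() - 1);
  const std::size_t lo = static_cast<std::size_t>(std::floor(pos));
  const std::size_t hi = std::min(lo + 1, sorted.size() - 1);
  const double frac = pos - static_cast<double>(lo);
  return sorted[lo] + frac * (sorted[hi] - sorted[lo]);
}

// Assumes `samples` is non-empty; takes a copy so the caller's order is kept.
inline Stats summarize_runs(std::vector<double> samples) {
  std::sort(samples.begin(), samples.end());
  const std::size_t n = samples.size();

  double mean = 0.0;
  for (std::size_t i = 0; i < n; ++i) {
    mean += samples[i];
  }
  mean /= static_cast<double>(n);

  double sq = 0.0;
  for (std::size_t i = 0; i < n; ++i) {
    sq += (samples[i] - mean) * (samples[i] - mean);
  }

  Stats s;
  s.min = samples.front();
  s.max = samples.back();
  s.stddev = n > 1 ? std::sqrt(sq / static_cast<double>(n - 1)) : 0.0;
  s.trimean = (quantile(samples, 0.25) + 2.0 * quantile(samples, 0.5) +
               quantile(samples, 0.75)) / 4.0;
  return s;
}
```

The trimean weights the median twice as heavily as the quartiles, which makes it less sensitive to a few slow outlier runs than the plain mean.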