bench

Prototype C++11 MPI benchmark support library inspired by google/benchmark.

Benchmark Loop

An example ping-pong benchmark (bin/pingpong.cpp)

#include "bench/bench.hpp"

#include <mpi.h>

void pingpong(bench::State &state) {

  const int rank = bench::world_rank();
  const int size = bench::world_size();
  (void)size; // only ranks 0 and 1 take part; run with at least 2 ranks

  // Setup: one-byte, zero-initialized send and receive buffers
  const size_t sz = 1;

  char *sbuf = new char[sz]();
  char *rbuf = new char[sz]();

  // Timed region: one round trip between ranks 0 and 1 per iteration
  for (auto _ : state) {
    if (0 == rank) {
      MPI_Send(sbuf, sz, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
      MPI_Recv(rbuf, sz, MPI_BYTE, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    } else if (1 == rank) {
      MPI_Recv(rbuf, sz, MPI_BYTE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
      MPI_Send(sbuf, sz, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
    }
  }

  // Bytes per iteration, used for the bytes/second report
  state.set_bytes_processed(sz);

  // Teardown
  delete[] sbuf;
  delete[] rbuf;
}

BENCH(pingpong)->timing_root_rank()->no_iter_barrier();
BENCH_MAIN()

The library will automatically determine the number of iterations to run.
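How bench picks the iteration count is not documented here. As a rough illustration only, the sketch below follows the google/benchmark-style idea of growing the batch size until one batch runs long enough to time reliably. The helper name pick_iterations, the 10x growth factor, and the max_iters cap are all assumptions, not part of this library. In an MPI setting the decision would also have to be made on one rank and shared with the others so that every rank runs the same number of iterations.

```cpp
#include <chrono>
#include <cstdint>
#include <functional>

// Hypothetical helper, not part of bench: grow the iteration count until one
// timed batch runs for at least min_time, then return that count.
// run_batch(n) must execute n iterations and return the elapsed time.
inline std::uint64_t pick_iterations(
    const std::function<std::chrono::nanoseconds(std::uint64_t)> &run_batch,
    std::chrono::nanoseconds min_time,
    std::uint64_t max_iters = 1000000000ull) {
  std::uint64_t n = 1;
  while (n < max_iters) {
    if (run_batch(n) >= min_time) {
      break; // this batch was long enough to time reliably
    }
    n *= 10; // too short: try a batch ten times larger
  }
  return n < max_iters ? n : max_iters;
}
```

With a body that costs about 100 ns per iteration and a 1 ms minimum, this sketch settles on 10000 iterations.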

Before pingpong is called, the library calls MPI_Barrier(MPI_COMM_WORLD). Setup code runs before the for (auto _ : state) loop, and only the iterations of that loop contribute to the measured time. By default, the library invokes MPI_Barrier(MPI_COMM_WORLD) after each iteration; the barrier's time is not counted (see Benchmark::no_iter_barrier() to disable it). Benchmark-specific teardown runs after the loop.

In the example, timing_root_rank() requests that the reported time be the elapsed time measured on the root rank, and no_iter_barrier() disables the MPI_Barrier() between iterations.
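The sequence described above can be sketched without MPI by passing the barrier in as a callable. This is a minimal illustration, not the bench implementation: run_loop, LoopResult, and the iter_barrier flag are hypothetical names, and in a real run the barrier argument would wrap MPI_Barrier(MPI_COMM_WORLD).

```cpp
#include <chrono>
#include <cstdint>
#include <functional>

// Hypothetical sketch, not the bench implementation.
struct LoopResult {
  std::chrono::nanoseconds timed{0}; // time spent in the loop body only
  std::uint64_t iterations = 0;
};

inline LoopResult run_loop(std::uint64_t iters,
                           const std::function<void()> &body,
                           const std::function<void()> &barrier,
                           bool iter_barrier = true) {
  using clock = std::chrono::steady_clock;
  LoopResult r;
  barrier(); // synchronize once before the benchmark starts
  for (std::uint64_t i = 0; i < iters; ++i) {
    const clock::time_point start = clock::now();
    body(); // only this call is timed
    r.timed += std::chrono::duration_cast<std::chrono::nanoseconds>(
        clock::now() - start);
    ++r.iterations;
    if (iter_barrier) {
      barrier(); // runs after the timer has stopped, so it is not counted
    }
  }
  return r;
}
```

Passing iter_barrier = false mirrors the effect described for no_iter_barrier(): the initial barrier still runs, but none follow the iterations.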

Reporting

The reported time is the average ns/iteration. If state.set_bytes_processed is used, the value passed should be the number of bytes per iteration, and the library then reports throughput in bytes/second.
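The arithmetic behind these two numbers is straightforward. The helpers below are a sketch with hypothetical names (ns_per_iteration, bytes_per_second); they show the unit conversion only and are not taken from the library.

```cpp
#include <cstdint>

// Hypothetical helpers, not part of bench.

// Average time per iteration, in nanoseconds.
inline double ns_per_iteration(double total_ns, std::uint64_t iterations) {
  return total_ns / static_cast<double>(iterations);
}

// Throughput in bytes/second, given bytes per iteration and ns per iteration.
// There are 1e9 nanoseconds in one second.
inline double bytes_per_second(double bytes_per_iteration, double ns_per_iter) {
  return bytes_per_iteration * 1e9 / ns_per_iter;
}
```

For example, 1000 iterations taking 2,000,000 ns in total average 2000 ns/iteration, and at 1 byte per iteration that corresponds to 500,000 bytes/second.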

The following Benchmark options control what is timed and reported:

  • Benchmark::timing_max_rank(): report the maximum time consumed across all ranks
  • Benchmark::timing_root_rank(): report only the time measured on rank 0 (the root rank)
  • Benchmark::no_iter_barrier(): do not call MPI_Barrier() between iterations
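To make the difference between the two timing modes concrete, the sketch below selects the reported value from a list of per-rank times. The Timing enum and reported_time function are hypothetical names used for illustration. A plain vector stands in for the ranks so the example runs without MPI; in an MPI program the maximum would more typically be obtained with MPI_Reduce and MPI_MAX.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical names, not part of bench.
enum class Timing { RootRank, MaxRank };

// per_rank_ns[i] is the elapsed time measured on rank i.
inline double reported_time(const std::vector<double> &per_rank_ns,
                            Timing mode) {
  if (per_rank_ns.empty()) {
    return 0.0;
  }
  if (mode == Timing::RootRank) {
    return per_rank_ns[0]; // only rank 0's measurement is used
  }
  // MaxRank: the slowest rank determines the reported time
  return *std::max_element(per_rank_ns.begin(), per_rank_ns.end());
}
```

With per-rank times of 1200, 1500, and 900 ns, the root-rank mode reports 1200 ns and the max-rank mode reports 1500 ns.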

Roadmap

  • Automatic Timing
    • timing_root_rank
    • timing_max_rank
    • timing_wall: the wall time from the first rank starts to the last rank ends
    • timing_aggregate: aggregate time consumed in each rank
  • Manual timing
    • state.pause_timing()
    • state.resume_timing()
    • state.set_iteration_time()
  • Iteration control
    • manual
    • automatic
  • Support running a benchmark over multiple communicators
    • Benchmark must take a communicator
    • All pairs of ranks
    • Specific pairs of ranks
  • CSV reporter
  • Add arguments to a benchmark
  • Add statistics for repeated runs
    • trimean
    • standard deviation
    • min
    • max
  • JSON reporter
  • Benchmark registration
    • static
      • Auto-generated main function
    • function pointer
    • lambda function