NVIDIA/nvbench

[FEA] Add `nvbench::exec_tag::host` to support CPU-only benchmarks


NVBench currently does not support benchmarking CPU-only code natively. Adding nvbench::exec_tag::sync gives plausible measurements for cold runs, but there is no mechanism for batch measurements. We could enable this feature by adding a distinct exec tag, e.g. nvbench::exec_tag::host.

Note that exec_tag::sync isn't a clean fit for CPU-only benchmarks because it still uses CUDA events for timing. It works, but it's a little hacky.

The main things an exec_tag::host would need to do:

  • Use CPU timers (std::chrono) instead of CUDA events
  • Report hardware metrics relative to CPU stats (e.g., bandwidth utilization)
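
To make the first bullet concrete, here is a rough sketch of host-side timing with std::chrono. This is not NVBench code: `host_timing`, `time_on_host`, and `sum_vector` are hypothetical names made up for illustration, and a real implementation would need warmup, noise handling, and the usual stopping criteria.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical result type, not part of NVBench.
struct host_timing
{
  double total_seconds; // wall time across all iterations
  double mean_seconds;  // total_seconds / iterations
};

// Hypothetical helper: runs `fn` back to back on the host and times the
// whole batch with a monotonic CPU clock. No CUDA events are involved.
template <typename Fn>
host_timing time_on_host(Fn &&fn, std::size_t iterations)
{
  assert(iterations > 0);
  using clock = std::chrono::steady_clock;

  const auto start = clock::now();
  for (std::size_t i = 0; i < iterations; ++i)
  {
    fn();
  }
  const auto stop = clock::now();

  const std::chrono::duration<double> elapsed = stop - start;
  return {elapsed.count(), elapsed.count() / static_cast<double>(iterations)};
}

// Example CPU-only workload (assumption: stands in for the user's kernel).
inline long long sum_vector(const std::vector<int> &v)
{
  return std::accumulate(v.begin(), v.end(), 0LL);
}
```

Calling it would look something like `time_on_host([&] { sink = sum_vector(v); }, 100)`, where `sink` is a `volatile long long` so the compiler cannot drop the work. Timing the whole loop once, rather than each iteration, is roughly what a batch measurement on the host would look like.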

Can this one get a bump given Grace is a common use case now?