[FEA] Add `nvbench::exec_tag::host` to support CPU-only benchmarks
Opened this issue · 2 comments
sleeepyjack commented
Nvbench currently does not support benchmarking CPU-only code natively. Although adding `nvbench::exec_tag::sync` gives plausible measurements for cold runs, there is no mechanism for batch measurements. We could enable this feature by e.g. adding a distinct exec tag `nvbench::exec_tag::host`.
jrhemstad commented
Note that using `exec_tag::sync` isn't really reliable for CPU-only benchmarks because it still uses CUDA events for timing. This works, but it's a little hacky.
The main things an `exec_tag::host` would need to do:
- Use CPU timers (`std::chrono`) instead of CUDA events
- Report hardware metrics relative to CPU stats (e.g., bandwidth utilization)
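As a rough illustration of the two points above, here is a minimal sketch of host-side timing with `std::chrono`. It is not NVBench code: `time_host_seconds` and `bytes_per_second` are hypothetical helper names made up for this example, and a real `exec_tag::host` would presumably also need warmup, noise handling, and stopping criteria. Passing `iterations > 1` loosely mimics a batch measurement by timing many back-to-back calls and averaging.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical helper (not part of NVBench): runs `fn` `iterations` times,
// timing the whole loop with a monotonic CPU clock instead of CUDA events.
// Returns the mean wall-clock time per call in seconds.
template <typename F>
double time_host_seconds(F&& fn, std::size_t iterations = 1)
{
  if (iterations == 0) { return 0.0; }

  using clock = std::chrono::steady_clock;
  const auto start = clock::now();
  for (std::size_t i = 0; i < iterations; ++i) { fn(); }
  const auto stop = clock::now();

  const std::chrono::duration<double> elapsed = stop - start;
  return elapsed.count() / static_cast<double>(iterations);
}

// Hypothetical helper: achieved throughput in bytes per second, given the
// number of bytes a call touches and its measured time. Dividing this by the
// CPU's peak memory bandwidth would give a utilization figure.
double bytes_per_second(std::size_t bytes, double seconds)
{
  return seconds > 0.0 ? static_cast<double>(bytes) / seconds : 0.0;
}

// Example use (assumes a std::vector<int> named `v` and <numeric>):
//   long long sum = 0;
//   double t = time_host_seconds(
//     [&] { sum = std::accumulate(v.begin(), v.end(), 0LL); }, 100);
//   double bw = bytes_per_second(v.size() * sizeof(int), t);
```

`steady_clock` is used here because it is monotonic, so the measurement is not affected by system clock adjustments.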
cliffburdick commented
Can this one get a bump given Grace is a common use case now?