psf/pyperf

Instruction counts instead of wall clock time?

gsnedders opened this issue · 4 comments

It would be interesting to investigate the use of instruction counts within pyperf, read from hardware performance counters through Linux's perf and similar tools on other platforms.

See, for example, Nicholas Nethercote's experience with monitoring rustc performance:

Contrary to what you might expect, instruction counts have proven much better than wall times when it comes to detecting performance changes on CI, because instruction counts are much less variable than wall times (e.g. ±0.1% vs ±3%; the former is highly useful, the latter is barely useful). Using instruction counts to compare the performance of two entirely different programs (e.g. GCC vs clang) would be foolish, but it’s reasonable to use them to compare the performance of two almost-identical programs (e.g. rustc before PR #12345 and rustc after PR #12345). It’s rare for instruction count changes to not match wall time changes in that situation. If the parallel version of the rustc front-end ever becomes the default, it will be interesting to see if instruction counts continue to be effective in this manner.

Perhaps this won't hold true for an interpreter, where dispatch overhead and boxing/unboxing costs can be significant and small changes have the potential to cause a much larger change in cache misses, but in my view it would still be worthwhile to investigate.
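For concreteness, here is a minimal sketch of how an instruction count could be collected on Linux by wrapping a command in `perf stat`. It is not part of pyperf: the helper names `parse_perf_stat_csv` and `count_instructions` are made up for illustration, and the parser assumes the CSV layout that `perf stat -x,` prints on stderr (counter value first, event name third). Event names can differ on some machines, so treat this as a starting point only.

```python
import subprocess


def parse_perf_stat_csv(stderr_text, event="instructions"):
    """Return the counter value for `event` from `perf stat -x,` output, or None.

    Assumed CSV layout: value, unit, event name, followed by other fields.
    """
    for line in stderr_text.splitlines():
        fields = line.split(",")
        # Strip modifiers such as ":u" before comparing the event name.
        if len(fields) >= 3 and fields[2].split(":")[0] == event:
            try:
                return int(fields[0])
            except ValueError:
                # perf reports e.g. "<not supported>" when the counter is unavailable.
                return None
    return None


def count_instructions(cmd):
    """Run `cmd` (a list of arguments) under perf stat; Linux with perf installed only."""
    proc = subprocess.run(
        ["perf", "stat", "-x,", "-e", "instructions", "--", *cmd],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.PIPE,
        text=True,
        check=True,
    )
    return parse_perf_stat_csv(proc.stderr)
```

Something like `count_instructions(["python3", "-c", "sum(range(10**6))"])` would then return an integer count, which also includes interpreter startup, so a real integration would need to subtract or avoid that overhead.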

pyperf internally stores numbers and a unit. Some parts of the code ignore the unit and hardcode seconds, but this should be fixed.

You can already switch from seconds (time) to bytes (memory footprint).

I would be fine with adding an option to measure the instruction count, but I don't know how to implement that :-) There are sometimes discussions about measuring "CPU time" rather than "wall clock time". I would be OK with an option to use a different clock, but the clock should be stored in the JSON so that benchmark results measuring "wall clock time" can be distinguished from those measuring "CPU time". Maybe "cputime" can be used as the unit?
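As a rough illustration of the "different clock" idea, the sketch below times a function with either a wall clock or a CPU-time clock from the standard library and records which one was used next to the value. The `CLOCKS` mapping, the `measure` helper, and the `"clock"` field are hypothetical and do not reflect pyperf's actual JSON schema; encoding the clock in the unit, as suggested above, would be an alternative.

```python
import json
import time

# Hypothetical clock names mapped to standard library timers.
CLOCKS = {
    "wallclock": time.perf_counter,  # elapsed real time
    "cputime": time.process_time,    # user + system CPU time of this process
}


def measure(func, clock="wallclock", loops=1000):
    """Time `loops` calls of `func` and return a JSON-serializable record."""
    timer = CLOCKS[clock]
    start = timer()
    for _ in range(loops):
        func()
    elapsed = timer() - start
    # Store the clock with the value so results from different clocks
    # are never compared by accident.
    return {"clock": clock, "unit": "second", "loops": loops, "value": elapsed / loops}


result = measure(lambda: sum(range(100)), clock="cputime")
print(json.dumps(result))
```

Keeping the clock name in the stored record means a comparison tool can refuse to compare a wall clock result against a CPU time result.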

Nowadays, the number of instructions executed per CPU cycle is not a constant: it depends on code placement, cache efficiency, and various timings, so I personally prefer wall clock time. I designed pyperf to give users an idea of the performance that they will see on their own machine, not the performance on a server dedicated to benchmarks.

Well, in practice, pyperf system tune disables Turbo Boost, whereas applications using a single CPU can run faster with it enabled. But the important part for me is not the absolute value, but the ratio when comparing the performance of a change against a reference point.
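To make the "ratio against a reference point" concrete, here is a small sketch that compares two sets of measurements by the ratio of their means and reports the relative spread of each set. The `compare` helper is hypothetical and the sample values are invented for illustration; they are not real benchmark results.

```python
import statistics


def compare(reference, changed):
    """Return the ratio of means (changed / reference) and each set's relative spread."""
    ref_mean = statistics.mean(reference)
    new_mean = statistics.mean(changed)
    return {
        "ratio": new_mean / ref_mean,
        "ref_rel_stdev": statistics.stdev(reference) / ref_mean,
        "new_rel_stdev": statistics.stdev(changed) / new_mean,
    }


# Invented timings in seconds, for illustration only.
reference = [1.00, 1.02, 0.98, 1.01, 0.99]
changed = [1.10, 1.12, 1.08, 1.11, 1.09]

summary = compare(reference, changed)
print(summary)
```

A ratio is only meaningful when it is clearly larger than the relative spread of the measurements, which is why less variable measurements make small changes easier to detect.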

I personally prefer wall clock time.

Outputting instruction counts doesn't prevent you using wall clock timings.

If someone proposes a PR, I will review it and likely merge it ;-) As I wrote, pyperf's design allows storing any number with a unit, whether an integer or a floating point number.