How to programmatically get the output for timeit() or bench_func()?
Closed this issue · 9 comments
Hi @vstinner, thanks for this PyPerf project. Presumably because of its sophisticated architecture, a command line such as pyperf timeit -s "...define my_func..." "my_func()" is able to print human-readable output to the terminal, such as "Mean +- std dev: 38.3 us +- 2.8 us". And I love that its std dev is noticeably smaller than that of some other benchmark tools, and its mean is more consistent.
Now, how do I programmatically get that reliable mean value? I tried the following experiments, but could not get what I want.
- Intuitively/pythonically, I thought PyPerf's timeit() would mimic Python's same-name function timeit() to return the time elapsed, preferably a mean, but that is not the case. PyPerf's timeit() returns None.
- The alternative bench_func() would return a benchmark object, but the following attempt does not work.
import pyperf
runner = pyperf.Runner()
return_value = runner.timeit("Times a function", stmt="locals()")
print(return_value) # This is always None
benchmark = runner.bench_func("bench_func", locals)
print(benchmark)
if benchmark: # This check is somehow necessary, probably due to the multiprocess architecture
    print(benchmark.get_values())
    # It is still unclear how to get benchmark.mean()
    # It throws exception: statistics.StatisticsError: mean requires at least one data point
BTW, I suspect #165 was for my same use case.
I ran more experiments, which got me further but still ended in a dead end.
import pyperf

runner = pyperf.Runner()  # "Only once instance of Runner must be created. Use the same instance to run all benchmarks."

def timeit(stmt, *args):
    """It will spawn 20+ subprocesses. The main process returns the median time.
    Otherwise subprocesses return None. Do NOT run any workload on None code path.
    :param Callable stmt: stmt can be a callable.
    :param args: Positional arguments for the stmt callable.
    """
    # TODO: The str() could end up with different addresses for the same
    # function in different subprocesses, though
    name = getattr(stmt, "__name__", str(stmt))
    benchmark = runner.bench_func(name + str(args), stmt, *args)
    if benchmark and benchmark.get_nrun() > 1:  # Then all sub-processes finished
        # PyPerf will already show the mean and stdev on stdout
        return benchmark.median()  # Or we could return mean()
    # Unfortunately, sub-processes will still expose None results. Caller needs to somehow ignore them.

if __name__ == "__main__":
    print("Expensive setup")
    result = timeit(globals)
    if result:
        print(result)
In the snippet above, I can get the time for the test subject (globals() in my case).
But the line "Expensive setup" is also printed 20+ times. This makes the approach useless in a bigger project that needs expensive setup.
@vstinner, is PyPerf meant to support those programmatic use cases?
I was wondering the same and found a solution.
Note that the pyperf architecture seems to be based on re-spawning the script multiple times for the worker processes. This can be seen by printing sys.argv in the script: you will see that the whole script gets executed many times, and that the --worker and --worker-task=<index> arguments are how pyperf decides what exactly to do in each invocation.
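To make the re-spawning behaviour concrete, here is a toy model of it. This is not pyperf's implementation, only a sketch under the assumption that the main invocation re-runs the same script once per worker; the names TOY_SCRIPT and run_toy are hypothetical, and only the flag names are borrowed from the observation above.

```python
import subprocess
import sys
import tempfile
import textwrap
from pathlib import Path

# A stand-in for a benchmark script. The main invocation re-runs the very
# same file once per "worker", so module-level code executes every time.
TOY_SCRIPT = textwrap.dedent('''
    import subprocess
    import sys

    print("module-level side effect, argv =", sys.argv[1:], flush=True)

    if "--worker" not in sys.argv:
        for task in range(3):
            subprocess.run(
                [sys.executable, __file__, "--worker", f"--worker-task={task}"],
                check=True,
            )
''')


def run_toy():
    """Run the toy script once and return every line it (and its workers) printed."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "toy_bench.py"
        script.write_text(TOY_SCRIPT)
        done = subprocess.run(
            [sys.executable, str(script)],
            capture_output=True,
            text=True,
            check=True,
        )
        return done.stdout.splitlines()


if __name__ == "__main__":
    for line in run_toy():
        print(line)
```

A single launch prints the side-effect line once for the main invocation and once more for each of the three workers, which is the same effect as the repeated "Expensive setup" line reported earlier in this thread.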
For this reason I would avoid trying to do anything inside the benchmark script itself, because every side effect (like printing) will be executed many times. Instead I would run the entire benchmark via subprocess.call and pass in -o some_dump_path. This allows you to then load the written dump file in your main process, which itself isn't subject to re-running.
To illustrate, I'm using something like this in my actual benchmark suite:
import subprocess
from pathlib import Path

import pyperf

def main():
    # In my case I have a bunch of benchmark "script snippets" in another folder,
    # which contain the actual benchmark code, e.g., some call like:
    # pyperf.Runner().bench_time_func(name, func)
    bench_files = list((Path(__file__).parent / "benchmarks").glob("*.py"))
    for bench_file in bench_files:
        name = bench_file.stem
        print(f"Benchmarking: {name}")
        dump_path = Path(f"/tmp/bench_results/{name}.json")
        dump_path.parent.mkdir(exist_ok=True, parents=True)
        # remove stale results from a previous run
        dump_path.unlink(missing_ok=True)
        subprocess.check_call(
            ["python", bench_file, "-o", dump_path], cwd=bench_file.parent
        )
        with dump_path.open() as f:
            benchmarks = pyperf.BenchmarkSuite.load(f).get_benchmarks()
        # now you can programmatically read the benchmark results here...
Once you have a BenchmarkSuite object, you can use the documented API:
- BenchmarkSuite: https://pyperf.readthedocs.io/en/latest/api.html#benchmarksuite-class
- Benchmark: https://pyperf.readthedocs.io/en/latest/api.html#benchmark-class
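As a rough sketch of what reading the results could look like, the helper below collects the mean and standard deviation per benchmark. The function names summarize and load_summary are hypothetical; the only pyperf calls used are BenchmarkSuite.load(), get_benchmarks(), get_name(), mean() and stdev(). pyperf is imported lazily so that summarize() can also be exercised without it.

```python
def summarize(benchmarks):
    """Map each benchmark name to its (mean, stdev) pair."""
    return {b.get_name(): (b.mean(), b.stdev()) for b in benchmarks}


def load_summary(dump_path):
    """Load a JSON file written with -o and summarize every benchmark in it."""
    import pyperf  # third-party; imported here so summarize() stays standalone

    suite = pyperf.BenchmarkSuite.load(str(dump_path))
    return summarize(suite.get_benchmarks())
```

Called from the driver script rather than from a benchmark script, load_summary() runs exactly once per dump file, so printing or asserting on its result does not get repeated by the workers.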
Would you mind elaborating on your question?
Examples of code loading JSON files: https://pyperf.readthedocs.io/en/latest/examples.html#hist-scipy-script
> Would you mind elaborating on your question?
It seems that PyPerf's multi-process architecture determines that PyPerf's usage pattern is command line as input, JSON file as output. This means that if we want to programmatically run multiple test cases and analyze their results, it cannot be done inside a benchmark script. Kudos to @bluenote10, who found a feasible approach to organizing such a multi-benchmark project around one main driver script. Overall, this seems difficult to incorporate into an existing Pytest-powered test suite.
Shameless advertising: I ended up developing perf_baseline, which is a thin wrapper built on top of Python's timeit, whose accuracy is adequate. I also added some handy behaviors that I needed for my "perf regression detection" project, and it fits my needs well.
I proposed multiple times to have an option to disable fork. Results may be less reliable, but apparently using fork causes trouble, and pyperf cannot be used in some cases. But so far, nobody has really asked for that feature, so it wasn't implemented.
> It seems that PyPerf's multi-process architecture determines that PyPerf's usage pattern is command line as input, JSON file as output. This means that if we want to programmatically run multiple test cases and analyze their results, it cannot be done inside a benchmark script.
You can write a second script which runs the benchmark suite and analyzes the results.
Fair enough. Closing this issue, because we have a workaround (and I have an alternative).
As I wrote, I would be fine with an option to not spawn worker processes, but run all benchmarks in a single process.
The main process, which runs all the benchmark worker processes, gets Benchmark objects; that's already part of the API ;-)
> I proposed multiple times to have an option to disable fork. Results may be less reliable, but apparently using fork causes trouble, and pyperf cannot be used in some cases. But so far, nobody has really asked for that feature, so it wasn't implemented.
To your point, I suppose we do not need to change PyPerf's multi-process (i.e. fork) nature, especially when that architecture is considered the reason it is more reliable.
What some people needed, at least initially, is an old-school, function-style, timeit-like API, such as output = func(input). So, perhaps, PyPerf can provide a higher-level API that wraps all those subprocess.call() invocations and returns the content of the JSON output.
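A minimal sketch of what such a function-style wrapper might look like is shown below. It is not an existing pyperf API: the names build_timeit_command and timeit_mean are hypothetical. It assumes pyperf is installed for the interpreter in sys.executable, and it relies only on the documented "python -m pyperf timeit" command with its -s and -o options plus BenchmarkSuite.load().

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def build_timeit_command(stmt, setup, output):
    """Build the `python -m pyperf timeit` command line for one statement."""
    cmd = [sys.executable, "-m", "pyperf", "timeit", "-o", str(output)]
    if setup:
        cmd += ["-s", setup]
    cmd.append(stmt)
    return cmd


def timeit_mean(stmt, setup=""):
    """Run pyperf in a child process and return (mean, stdev) for stmt."""
    import pyperf  # third-party; only needed to read the JSON result

    with tempfile.TemporaryDirectory() as tmp:
        output = Path(tmp) / "result.json"
        # The child process does all the worker spawning; this process only
        # waits for it and then reads the file it wrote.
        subprocess.run(
            build_timeit_command(stmt, setup, output),
            check=True,
            stdout=subprocess.DEVNULL,
        )
        bench = pyperf.BenchmarkSuite.load(str(output)).get_benchmarks()[0]
        return bench.mean(), bench.stdev()
```

With this shape, a caller could write something like mean, stdev = timeit_mean("my_func()", setup="...define my_func...") from ordinary code, because the calling process itself is never re-executed.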
> To your point, I suppose we do not need to change PyPerf's multi-process (i.e. fork) nature, especially when that architecture is considered the reason it is more reliable.
In terms of API, maybe pyperf can provide an API which runs a process that acts as the main process, and that one spawns the worker processes. The API should just return the objects directly, so as to hide the inner complexity.
But here I'm talking about an API which does everything in a single process.
I'm not sure if it always matters to spawn worker processes. "It depends" :-) It's the beauty of benchmarking.