Find a replacement for pyperf
jaraco opened this issue · 5 comments
In #294, I introduced pyperf, mainly for its ability to run `timeit`, record the results in a file, and then compare the results from another run (see the sketch after the feature list below). pyperf has some really nice features:
- Performance benchmarks build on `timeit`, so anything one can run in an interpreter is fair game for evaluation.
- Appends results to a file, allowing multiple independent tests to be run.
- Ability to give each benchmark a meaningful name.
- Reporting tool automatically excludes insignificant variance (highlights significant variance).
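
For context, here's a minimal sketch of how such a benchmark is declared with pyperf's `Runner` API; the benchmark name and statement are illustrative, not the actual benchmarks from #294:

```python
# bench_example.py -- illustrative only; the benchmarked statement is made up
import pyperf

runner = pyperf.Runner()
# Each benchmark gets a meaningful name; the statement/setup are plain timeit inputs.
runner.timeit(
    "version lookup",
    stmt="importlib_metadata.version('pip')",
    setup="import importlib_metadata",
)
```

Running the script with `--append results.json` accumulates results from separate runs into one file, and `python -m pyperf compare_to baseline.json results.json` produces the report that plays down insignificant variance.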
Unfortunately, it also has some drawbacks:
- In psf/pyperf#106, I describe an issue where the measurements are jittery. I haven't had the time to investigate the issue, but given that the raw (minimum) `timeit` values were an effective measurement of peak performance, I'd like something that provides similar stability.
- pyperf still requires orchestration (such as the two tox environments that need to be run in order). Ideally, one would be able to declare the tests in a list and some tooling would orchestrate the setup, execution, comparison, and reporting.
- pyperf has no pytest integration. Ideally, the tests could be run through a pytest plugin and thus gain the benefits of selection or exclusion (`-k perf` or `-k 'not perf'`) and other advantages of integration.
From the above, you can infer my wish list for a performance testing framework for this and other Python projects.
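To make that wish list concrete, here is a purely hypothetical sketch of the kind of declaration I'd like to write; the case names, statements, and test structure are invented for illustration and aren't real pytest-perf API:

```python
# test_perf.py -- hypothetical shape of the desired declarations
import timeit

import pytest

PERF_CASES = [
    # (name, setup statement, timed statement) -- ordinary timeit inputs
    ("version lookup", "import importlib_metadata", "importlib_metadata.version('pip')"),
    ("entry points", "import importlib_metadata", "importlib_metadata.entry_points()"),
]


@pytest.mark.parametrize("name,setup,stmt", PERF_CASES, ids=[case[0] for case in PERF_CASES])
def test_perf(name, setup, stmt):
    """Selectable with -k perf (or excluded with -k 'not perf')."""
    # A real plugin would also measure a baseline (the published release) and
    # own the comparison/reporting; only the local measurement is sketched here.
    local = min(timeit.repeat(stmt, setup=setup, number=1000, repeat=5))
    assert local >= 0  # placeholder assertion; reporting belongs to the plugin
```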
In https://github.com/jaraco/pytest-perf, I've started work on a library to achieve the above. As of this initial 0.1 release, it includes a `BenchmarkRunner` that utilizes `pip-run` to create a baseline measurement for a given command to compare against a local measurement.
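The core idea is roughly the following (a simplified sketch of the approach, not the actual pytest-perf implementation; the package name and statements are placeholders):

```python
# Sketch: measure the same timeit statement against the published release
# (installed on the fly by pip-run) and against the local environment.
import subprocess
import sys

SETUP = "import importlib_metadata"
STMT = "importlib_metadata.version('pip')"

timeit_args = ["-m", "timeit", "-s", SETUP, STMT]

# Baseline: pip-run installs the released package into a throwaway environment
# and forwards everything after '--' to the Python interpreter.
baseline = subprocess.run(
    ["pip-run", "importlib_metadata", "--", *timeit_args],
    capture_output=True, text=True, check=True,
).stdout

# Local: the same statement, run against whatever is importable here.
local = subprocess.run(
    [sys.executable, *timeit_args],
    capture_output=True, text=True, check=True,
).stdout

print("baseline:", baseline.strip())
print("local:   ", local.strip())
```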
In working on pytest-perf, I'm now at a stage where I wish to parse the output from `timeit`, but as best I can tell, there's no good parser for timedeltas, so I'm working on enhancing the parser in tempora. One big complication is that `timeit`
returns results with nsec resolution, but Python's datetime.timedelta class only has microsecond resolution. I looked for a suitable container with nanosecond resolution, but didn't find one without importing pandas and numpy. I considered implementing a whole new timedelta object that honors nanoseconds, but I'm not sure I want to go down that road just for pytest-perf. I'm thinking that microsecond resolution is probably adequate, but I still want to account for nanoseconds when they round to a full microsecond.
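For reference, here's a minimal sketch of the parsing and rounding I have in mind, assuming the standard `python -m timeit` report format ("... best of 5: 219 nsec per loop"); this is not tempora's parser, just an illustration of the microsecond-rounding behavior:

```python
import re
from datetime import timedelta

# timeit reports its best time using one of these unit abbreviations
_UNITS = {"nsec": 1e-9, "usec": 1e-6, "msec": 1e-3, "sec": 1.0}


def parse_timeit_report(output: str) -> timedelta:
    """Parse e.g. '5000000 loops, best of 5: 219 nsec per loop'."""
    match = re.search(r"([\d.]+) (nsec|usec|msec|sec) per loop", output)
    value, unit = float(match.group(1)), match.group(2)
    microseconds = value * _UNITS[unit] * 1e6
    # timedelta only carries microseconds, so round rather than truncate:
    # sub-microsecond values that round up still register as a full microsecond.
    return timedelta(microseconds=round(microseconds))


print(parse_timeit_report("5000000 loops, best of 5: 219 nsec per loop"))
# -> 0:00:00 (219 nsec rounds down to 0 usec -- exactly the resolution problem above)
print(parse_timeit_report("500000 loops, best of 5: 0.72 usec per loop"))
# -> 0:00:00.000001
```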
In the pytest-perf branch, I've started work on adopting pytest-perf to perform the tests, but there are a few problems:
- Something about including pytest-perf is causing a few tests to fail (perhaps causing excess distributions to be discovered).
- The tests rely on 'extras' in the install (causing the whole test suite to include `ipython` and its dependencies).
- The baseline (control) has to clone code from the remote, even though the code is available in the local git checkout.
I feel generally uneasy about the interaction of the performance tests with other tests... I'll need to do some more exploration before it's viable.
You may possibly want to have a look at https://asv.readthedocs.io/en/stable/. I realize that it has its own rather peculiar approach to writing benchmarks, but the summaries (e.g. https://pv.github.io/numpy-bench/) are pretty nice...
That seems non-trivial to plug into this project due to the lack of pytest integration.