FR: Ignore failed run measurements when computing statistics or put failures into a separate bucket
Currently, hyperfine seems to either abort as soon as a command fails once, or to treat failed runs the same as successful ones.
I'd like an option for either:
- ignoring the failed runs (report their number, but otherwise compute statistics as though they never happened). Some possible names for such an option: `--omit-failed-runs`, `--forget-failed-runs`, or `--skip-failed-runs`. This is slightly confusing in the presence of the existing `--ignore-failed-runs`, which I think should be renamed, see #828.
- putting the failures in a different "bucket", reporting the statistics for the successful runs and the failed runs separately (a post-processing sketch of both variants follows this list).
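For completeness, here is a rough post-processing workaround that approximates both variants today. It assumes the benchmark is run with the existing ignore-failures flag (`-i`) so hyperfine doesn't abort, plus `--export-json`, and that the JSON export contains per-run `times` and `exit_codes` arrays (it does in recent versions, as far as I can tell); the script and file names are arbitrary:

```python
#!/usr/bin/env python3
"""Recompute hyperfine statistics with failed runs split into their own bucket.

Assumes the benchmark was run along the lines of
    hyperfine -i --export-json results.json "timeout 5m cargo nextest run"
so hyperfine keeps going on failures and records one exit code per run.
"""
import json
import statistics
import sys


def summarize(times):
    if not times:
        return "no runs"
    mean = statistics.mean(times)
    stdev = statistics.stdev(times) if len(times) > 1 else 0.0
    return f"{len(times)} runs, mean {mean:.3f} s ± {stdev:.3f} s"


def main(path):
    with open(path) as f:
        data = json.load(f)

    for result in data["results"]:
        times = result["times"]
        exit_codes = result["exit_codes"]  # parallel to `times`
        ok = [t for t, c in zip(times, exit_codes) if c == 0]
        failed = [t for t, c in zip(times, exit_codes) if c != 0]

        print(result["command"])
        print("  successful:", summarize(ok))
        print("  failed:    ", summarize(failed))


if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else "results.json")
```

The point is just that the data needed for either option is already in the export; having hyperfine do this natively (and in its summary output) is what this request is about.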
Out of scope (as far as I'm concerned), but maybe worth discussing.
There is also a potential feature of bucketing results based on other data, say the exact exit code, or whether a run takes more than X seconds, but that's less important to me right now.
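Purely to make that concrete, a toy grouping function (the key choice and the threshold are made-up examples, not proposed defaults):

```python
from collections import defaultdict


def bucket_runs(times, exit_codes, slow_threshold_s=240.0):
    """Group per-run times by (exit code, ran-longer-than-threshold).

    `slow_threshold_s` is an arbitrary illustrative value.
    """
    buckets = defaultdict(list)
    for t, code in zip(times, exit_codes):
        buckets[(code, t > slow_threshold_s)].append(t)
    return dict(buckets)
```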
Another possibility that I'd consider out of scope is automatically finding the buckets, e.g. by fitting a mixture of Gaussian distributions to the measurements instead of a single Gaussian.
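For what it's worth, a toy illustration of that idea using scikit-learn's `GaussianMixture` (the library choice and the two-component assumption are mine, just to show the shape of it, not a proposal for hyperfine itself):

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def split_by_mixture(times, n_components=2):
    """Label each run time by the mixture component it most likely came from."""
    X = np.asarray(times, dtype=float).reshape(-1, 1)
    gmm = GaussianMixture(n_components=n_components, random_state=0).fit(X)
    labels = gmm.predict(X)
    return [
        [t for t, label in zip(times, labels) if label == k]
        for k in range(n_components)
    ]
```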
My use-case
I am trying to benchmark a test run that includes a test that is flaky and sometimes deadlocks (around 10% of the time). Successful runs take about 3 minutes, unsuccessful ones take forever. So, I do:
`hyperfine --warmup 1 --min-runs 10 -- "timeout 5m cargo nextest run"`
However, when the test does deadlock in any of the 10 runs, the whole benchmark is wasted.