FR: Ignore failed run measurements when computing statistics or put failures into a separate bucket
Currently, hyperfine seems to either abort as soon as a command fails once, or to treat failed runs the same as successful ones.
I'd like an option for either:
- ignoring the failed runs (report their number, but otherwise compute statistics as though they never happened). Some possible names for such an option: `--omit-failed-runs`, `--forget-failed-runs`, or `--skip-failed-runs`. This is slightly confusing in the presence of the existing `--ignore-failed-runs`, which I think should be renamed, see #828.
- putting the failures in a different "bucket", reporting the statistics for the successful runs and the failed runs separately (a post-processing sketch of both variants follows this list).
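For completeness, here is a rough post-processing workaround that approximates both variants today. It assumes the benchmark is run with the existing ignore-failures flag (`-i`) so hyperfine doesn't abort, plus `--export-json`, and that the JSON export contains per-run `times` and `exit_codes` arrays (it does in recent versions, as far as I can tell); the script and file names are arbitrary:

```python
#!/usr/bin/env python3
"""Recompute hyperfine statistics with failed runs split into their own bucket.

Assumes the benchmark was run along the lines of
    hyperfine -i --export-json results.json "timeout 5m cargo nextest run"
so hyperfine keeps going on failures and records one exit code per run.
"""
import json
import statistics
import sys


def summarize(times):
    if not times:
        return "no runs"
    mean = statistics.mean(times)
    stdev = statistics.stdev(times) if len(times) > 1 else 0.0
    return f"{len(times)} runs, mean {mean:.3f} s ± {stdev:.3f} s"


def main(path):
    with open(path) as f:
        data = json.load(f)

    for result in data["results"]:
        times = result["times"]
        exit_codes = result["exit_codes"]  # parallel to `times`
        ok = [t for t, c in zip(times, exit_codes) if c == 0]
        failed = [t for t, c in zip(times, exit_codes) if c != 0]

        print(result["command"])
        print("  successful:", summarize(ok))
        print("  failed:    ", summarize(failed))


if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else "results.json")
```

The point is just that the data needed for either option is already in the export; having hyperfine do this natively (and in its summary output) is what this request is about.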
Out of scope (as far as I'm concerned), but maybe worth discussing.
There is also a potential feature of bucketing results based on other data, say the exact exit code, or whether a run takes more than X seconds, but that's less important to me right now.
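Purely to make that concrete, a toy grouping function (the key choice and the threshold are made-up examples, not proposed defaults):

```python
from collections import defaultdict


def bucket_runs(times, exit_codes, slow_threshold_s=240.0):
    """Group per-run times by (exit code, ran-longer-than-threshold).

    `slow_threshold_s` is an arbitrary illustrative value.
    """
    buckets = defaultdict(list)
    for t, code in zip(times, exit_codes):
        buckets[(code, t > slow_threshold_s)].append(t)
    return dict(buckets)
```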
Another possibility that I'd consider out of scope is automatically finding the buckets, e.g. by fitting a mixture of Gaussian distributions to the measurements instead of a single Gaussian.
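For what it's worth, a toy illustration of that idea using scikit-learn's `GaussianMixture` (the library choice and the two-component assumption are mine, just to show the shape of it, not a proposal for hyperfine itself):

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def split_by_mixture(times, n_components=2):
    """Label each run time by the mixture component it most likely came from."""
    X = np.asarray(times, dtype=float).reshape(-1, 1)
    gmm = GaussianMixture(n_components=n_components, random_state=0).fit(X)
    labels = gmm.predict(X)
    return [
        [t for t, label in zip(times, labels) if label == k]
        for k in range(n_components)
    ]
```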
My use-case
I am trying to benchmark a test run that includes a test that is flaky and sometimes deadlocks (around 10% of the time). Successful runs take about 3 minutes, unsuccessful ones take forever. So, I do:
`hyperfine --warmup 1 --min-runs 10 -- "timeout 5m cargo nextest run"`
However, when the test does deadlock in any of the 10 runs, the whole benchmark is wasted.