nschloe/perfplot

[REQUEST] Support for Replicates

ELC opened this issue · 4 comments

ELC commented


How would you improve perfplot?

Add support for replicates, running the same kernel multiple times and returning an aggregated estimator (mean, median, etc).

What problem does it solve for you?

Usually when running perfplot, there are other programs running on the same system that may bias one or more kernel executions. To mitigate this, several runs could be performed and then aggregated to provide a more accurate representation of the execution time.

This is also in line with plots such as seaborn.lineplot, which can produce confidence intervals from such a dataframe.

If the kernels differ by orders of magnitude, this may seem excessive, but if they are close enough, the higher resolution provided by multiple runs could be highly beneficial.

At the moment I am achieving this with a for-loop around perfplot.bench; however, it would be nice for bench to have replicates and estimator parameters so that this process happens seamlessly.
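For illustration, the proposed call could look something like this (replicates and estimator are hypothetical parameters that do not exist in perfplot today; the setup and kernels are just placeholders):

import numpy as np
import perfplot

# Hypothetical: neither `replicates` nor `estimator` exists in perfplot yet.
out = perfplot.bench(
    setup=lambda n: np.random.rand(n),
    kernels=[sum, np.sum],
    n_range=[2**k for k in range(15)],
    replicates=10,        # proposed: repeat the whole benchmark 10 times
    estimator=np.median,  # proposed: aggregate the replicates per (kernel, n)
)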

My code at the moment looks like this (no aggregation here):

import pandas as pd
import perfplot


def run_benchmark(setup, kernels, n_range, replicates):
    perfplot_params = dict(
        setup=setup,
        kernels=kernels,
        labels=["Pure Python", "Numpy"],
        n_range=n_range,
        xlabel="Array Size",
    )

    data_dicts = []

    # Run the full benchmark `replicates` times and collect one row of
    # lists per (replicate, kernel) combination.
    for _ in range(replicates):
        out = perfplot.bench(**perfplot_params)

        for t, label in zip(out.timings_s, out.labels):
            data = {"x": out.n_range, "values": t, "label": [label] * len(out.n_range)}
            data_dicts.append(data)

    # Explode the list-valued columns into one long-format row per
    # (replicate, kernel, n) measurement; multi-column explode needs
    # pandas >= 1.3.
    return pd.DataFrame(data_dicts).explode(["x", "values", "label"])
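For context, a sketch of how the resulting long-format frame could then be fed to seaborn.lineplot, which aggregates the replicates and shades a confidence band itself (assuming the run_benchmark above and some setup, kernels, and n_range in scope):

import seaborn as sns
import matplotlib.pyplot as plt

df = run_benchmark(setup, kernels, n_range, replicates=10)
df = df.astype({"x": int, "values": float})  # explode leaves object dtype

# lineplot aggregates the replicates per (x, label) and draws the mean
# with a confidence band by default.
ax = sns.lineplot(data=df, x="x", y="values", hue="label")
ax.set(xscale="log", yscale="log", xlabel="Array Size", ylabel="Runtime [s]")
plt.show()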

nschloe commented

> running the same kernel multiple times and returning an aggregated estimator (mean, median, etc).

perfplot already runs the same kernel multiple times. As you correctly point out, multiple runs will have different run times because of the differing background machine load. The smallest runtime is the fastest your computer could run the task; everything slower just measures how much the background tasks affected the computation. That's usually not something you're interested in, which is why serious performance tools always return the minimum runtime. Everything else, including mean and median, only tells you something about all the tasks except the one you're interested in.
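As a minimal standard-library sketch of the same idea (the timeit docs make the same recommendation for timeit.repeat):

import timeit

# Time the same snippet 10 times; each entry is the total runtime of
# 1000 executions of the statement.
times = timeit.repeat("sum(range(10_000))", repeat=10, number=1000)

# The minimum estimates how fast the machine *can* run the code;
# mean/median are inflated by whatever else the machine was doing.
print(f"min:  {min(times):.4f} s")
print(f"mean: {sum(times) / len(times):.4f} s")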

ELC commented

@nschloe That's a sensible explanation. Do you have some literature for further reading on the topic? I would like to have something quotable for the "keep the minimum instead of the mean/median" recommendation.

nschloe commented

No idea where this is written down. To me, it seems basic enough to be stated without citation.