`perftester`: Lightweight performance testing of Python functions

Installation

Install using pip:

pip install perftester

The package has three external dependencies: memory_profiler (repo), easycheck (repo), and rounder (repo).

perftester is still under heavy testing. If you find anything that does not work as intended, please let me know via nyggus <at> gmail.com.

Pre-introduction: TL;DR

At the most basic level, using perftester is simple. It offers you two functions for benchmarking (one for execution time and one for memory), and two functions for performance testing (likewise). Read below for a very short introduction of them. If you want to learn more, however, do not stop there, but read on.

Benchmarking

You have time_benchmark() and memory_benchmark() functions:

import perftester as pt
def foo(x, n): return [x] * n
pt.time_benchmark(foo, x=129, n=100)

and this will print the results of the time benchmark, with raw results similar to those that timeit.repeat() returns, but unlike it, pt.time_benchmark() returns mean raw time per function run, not overall; in additional, you will see some summaries of the results.

The above call did actually run timeit.repeat() function, with the default configuration of Number=100_000 and Repeat=5. If you want to change any of these, you can use arguments Number and Repeat, correspondigly:

pt.time_benchmark(foo, x=129, n=100, Number=1000)
pt.time_benchmark(foo, x=129, n=100, Repeat=2)
pt.time_benchmark(foo, x=129, n=100, Number=1000, Repeat=2)

These calls do not change the default settings so you use the arguments' values on the fly. Later you will learn how to change the default settings and the settings for a particular function.

Some of you may wonder why the Number and Repeat arguments violate what we can call the Pythonic style, by using a capital first letter for function arguments. The reason is simple: I wanted to minimize a risk of conflicts that would happen when benchmarking (or testing) a function with any of the arguments Number or Repeat (or both). A chance that a Python function will have a Number or a Repeat argument is rather small. If that happens, however, you can use functools.partial() to overcome the problem:

from functools import partial

def bar(Number, Repeat): return [Number] * Repeat
bar_p = partial(bar, Number=129, Repeat=100)
pt.time_benchmark(bar_p, Number=100, Repeat=2)

Benchmarking RAM usage is similar:

pt.memory_usage_benchmark(foo, x=129, n=100)

It uses the memory_profiler.memory_usage() function, which runs the function just once to measure its memory use. Almost always, there is no need to repeat it, unless there is great randomness in memory usage by the function. If you have good reasons to change this behavior (e.g., in the case of such randomness), you can request several calls of the function, using the Repeat argument:

pt.memory_usage_benchmark(foo, x=129, n=100, Repeat=100)

You can learn more in the detailed description of the package below.

Testing

The API of perftester testinf functions is similar to that of benchmarking functions, the only difference being limits you need to provide. You can determine those limits using the above benchmark functions. Here are examples of several performance tests using perftester:

>>> import perftester as pt
>>> def foo(x, n): return [x] * n

# A raw test
>>> pt.time_test(foo, raw_limit=9.e-07, x=129, n=100)

# A relative test
>>> pt.time_test(foo, relative_limit=7, x=129, n=100)

# A raw test
>>> pt.memory_usage_test(foo, raw_limit=25, x=129, n=100)

# A relative test
>>> pt.memory_usage_test(foo, relative_limit=1.2, x=129, n=100)

You can, certainly, use Repeat and Number:

>>> pt.time_test(foo, relative_limit=7, x=129, n=100, Repeat=3, Number=1000)

Raw tests work with raw executation time. Relative tests work with relative time against a call of an empty function; that way, the test should be more or less independent of the machine you run the test on; so, a quick machine should provide more or less similar relative results as a slow machine.

Relative results, however, can differ between different operating systems.

You can use these testing functions in pytest, or in dedicated doctest files. You can, however, use perftester as a separate performance testing framework. Read on to learn more about that. What's more, perftester offers more functionalities, and a config object that offers you much more control of testing.

That's all in this short introduction. If you're interested in more advanced use of perftester, read on to read a far more detailed introduction. In addition, files in the docs folder explain in detail particular functionalities that perftester offers.

Introduction

perftester is a lightweight package for simple performance testing in Python. Here, performance refers to execution time and memory usage, so performance testing means testing if a function performs quickly enough and does not use too much RAM. In addition, the module offers you simple functions for straightforward benchmarking, in terms of both execution time and memory.

Under the hood, perftester is a wrapper around two functions from other modules:

perftester.time_benchmark() and perftester.time_test() use timeit.repeat()
perftester.memory_usage_benchmark() and perftester.memory_usage_test() use memory_profiler.memory_usage()

What perftester offers is a testing framework with as simple syntax as possible.

You can use perftester in three main ways:

in an interactive session, for simple benchmarking of functions;
as part of another testing framework, like doctest or pytests; and
as an independent testing framework.

The first way is a different type of use from the other two. I use it to learn the behavior of functions (interms of execution time and memory use) I am working on right now, so not for actual testing.

When it comes to actual testing, it's difficult to say which of the last two ways is better or more convinient: it may depend on how many performance tests you have, and how much time they take. If the tests do not take more than a couple of seconds, then you can combine them with unit tests. But if they take much time, you should likely make them independent of unit tests, and run them from time to time.

Using `perftester`

Use it as a separate testing framework

To use perftester that way,

Collect tests in Python modules whose names start with "perftester_"; for instance, "perftester_module1.py", perftester_module2.py" and the like.
Inside these modules, collect testing functions that start with "perftester_"; for instance, def perftester_func_1(), def perftester_func_2(), and the like (note how similar this approach is to that which pytest uses);
You can create a config_perftester.py file, in which you can change any configuration you want, using the perftester.config object. The file should be located in the folder from which you will run the CLI command perftester. If this file is not there, perftester will use its default configuration. Note that cofig_perftester.py is a Python module,so perftester configuration is done in actual Python code.
Now you can run performance tests using perftester in your shell. You can do it in three ways:
- $ perftester recursively collects all perftester modules from the directory in which the command was run, and from all its subdirectories; then it runs all the collected perftester tests;
- $ perftester path_to_dir recursively collects all perftester modules from path_to_dir/ and runs all perftesters located in them.
- $ perftester path_to_file.py runs all perftesters from the module given in the path.

Read more about using perftester that way here.

It does make a difference how you do that. When you run the perftester command with each testing file independently, each file will be tested in a separated session, so with a new instance of the pt.config object. When you run the command for a directory, all the functions will be tested in one session. And when you run a bare perftester command, all your tests will be run in one session.

There is no best approach, but remember to choose one that suits your needs.

Use `perftester` inside `pytest`

This is a very simple approach, perhaps the simplest one: When you use pytest, you can simply add perftester testing functions to pytest testing functions, and that way both frameworks will be combined, or rather the pytest framework will run perftester tests. The amount of additional work is minimal.

For instance, you can write the following test function:

import perftester as pt
from my_module import f1 # assume that f1 takes two arguments, a string (x) and a float (y)
def test_performance_of_f1():
    pt.time_test(
      f1,
      raw_limit=10, relative_limit=None,
      x="whatever string", y=10.002)

This will use either the settings for this particular function (if you set them in pt.config) or the default settings (also from pt.config). However, you can also use Number and Repeat arguments, in order to overwrite these settings (passed to timeit.repeat() as number and repeat, respectively) for this particular function call:

import perftester as pt
from my_module import f1 # assume that f1 takes two arguments, a string (x) and a float (y)
def test_performance_of_f1():
    pt.time_test(
      f1,
      raw_limit=10, relative_limit=None,
      x="whatever string", y=10.002
      Number=1_000_000, Repeat=20)

If you now run pytest and the test passes, nothing will happen — just like with a regular pytest test. If the test fails, however, a perftester.TimeTestError exception will be thrown, with some additional information.

perftester's default behavior is to significantly shorten traceback, but only during testing (so when you run pt.time_test() and pt.memory_usage_test()). You can extend this behavior to other situations, with just one command: pt.config.cut_traceback(); to reverse, use pt.config.full_traceback() — but do remember that this will not mean the full traceback will be used during perftesting.

This is the easiest way to use perftester. Its only drawback is that if the performance tests take much time, pytest will also take much time, something usually to be avoided. You can then do some pytest tricks to not run perftester tests, and run them only when you want — or you can simply use the above-described command-line perftester framework for performance testing.

Use `perftester` inside `doctest`

In the same way, you can use perftester in doctest. You will find plenty of examples in the documentation here, and in the tests/ folder.

A great fan of doctesting, I do not recommend using perftester in docstrings. For me, doctests in docstrings should clarify things and explain how functions work, and adding a performance test to a function's docstring would decrease readability.

The best way, thus, is to write performance tests as separate doctest files, dedicated to performance testing. You can collect such files in a shell script that runs performance tests.

Basic use of `perftester`

Simple benchmarking

To create a performance test for a function, you likely need to know how it behaves. You can run two simple benchmarking functions, pt.memory_usage_benchmark() and pt.time_benchmark(), which will run time and memory benchmarks, respectively. First, we will decrease number (passed to timeit.repeat), in order to shorten the benchmarks (which here serve as doctests):

>>> import perftester as pt
>>> def f(n): return sum(map(lambda i: i**0.5, range(n)))
>>> pt.config.set(f, "time", Number=1000)
>>> b_100_time = pt.time_benchmark(f, n=100)
>>> b_100_memory = pt.memory_usage_benchmark(f, n=100)
>>> b_1000_time = pt.time_benchmark(f, n=1000)
>>> b_1000_memory = pt.memory_usage_benchmark(f, n=1000)

Remember also about the possibility of overwriting (for this single benchmark) the settings from pt.config.settings, which you can do using Number (only for time testing) and Repeat (for both): pt.time_benchmark(f, n=100, Number=1_000_000, Repeat=20) and pt.memory_usage_benchmark(f, n=1000, Repeat=10).

And this is it. You can use pt.pp() function to pretty-print the results. In my machine, I got the following results (here, for b_100):

# pt.pp(b_100_time)
{'max': 16.66,
'max_relative': 1.004,
'max_result_per_run': [16.66],
'max_result_per_run_relative': [1.004],
'mean': 16.66,
'mean_result_per_run': [16.66],
'raw_results': [[16.66, 16.66, 16.66]],
'relative_results': [[1.004, 1.004, 1.004]]}

# pt.pp(b_100_memory)
{'max': 1.389e-05,
'mean': 1.303e-05,
'min': 1.168e-05,
'min_relative': 129.5,
'raw_times': [1.168e-05, 1.263e-05, 1.349e-05, 1.346e-05, 1.389e-05],
'raw_times_relative': [129.5, 140.0, 149.5, 149.2, 154.0]}

For memory testing, the main result is max while for time testing, it is min. For relative testing, we would look at max_relative and min_relative, respectively.

Surely, we should expect that the function with n=100 be quicker than with n=1000:

>>> b_100_time["min"] < b_1000_time["min"]
True

but memory use will be more or less the same:

>>> import math
>>> math.isclose(b_100_memory["max"], b_1000_memory["max"], rel_tol=.01)
True

Time testing

For time tests, we have the pt.time_test() function. First, a raw time test:

>>> pt.time_test(f, raw_limit=2e-05, n=100)

raw_limit, relative_limit, Number and Repeat are keyword-only arguments.

Like before, we can use Number and Repeat arguments:

>>> pt.time_test(func=f, raw_limit=3e-05, n=100, Number=10)

Now, let's define a relative time test:

>>> pt.time_test(f, relative_limit=230, n=100)

We also can combine both:

>>> pt.time_test(f, raw_limit=2e-05, relative_limit=230, n=100)

You can read about relative testing below, in section.

Memory testing

Memory tests use pt.memory_usage_test() function, which is used in the same way as pt.time_test():

>>> pt.memory_usage_test(f, raw_limit=27, n=100) # test on raw memory
>>> pt.memory_usage_test(f, relative_limit=1.2, n=100) # relative time test
>>> pt.memory_usage_test(f, raw_limit=27, relative_limit=1.2, n=100) # both

In a memory usage test, a function is called only once. You can change that — but do that only if you have solid reasons — using, for example, pt.config.set(f, "time", "repeat", 2), which will set this setting for the function in the configuration (so it will be used for all next calls for function f()). You can also do it just once (so, without saving the setting in pt.config.settings), using the Repeat argument:

>>> pt.memory_usage_test(f, raw_limit=27, relative_limit=1.2, n=100, Repeat=100)

(There is little sence in repeating this particular function, as you will get almost the same results in each repetition.)

Of course, memory tests do not have to be very useful for functions that do not have to allocate too much memory, but as you will see in other documentation files in perftester, some function do use a lot of memory, and such tests do make quite a lot sense for them.

Configuration: `pt.config`

The whole configuration is stored in the pt.config object, which you can easily change. Here's a short example of how you can use it:

>>> def f(n): return list(range(n))
>>> pt.config.set(f, "time", Number=10_000, Repeat=1)

but you can change much more using it. You can read in detail about using pt.config here.

When you use perftester as a command-line tool, you can modify pt.config in the settings_perftester.py module, for instance:

# settings_perftester.py
import perftester as pt

# shorten the tests
pt.config.set_defaults("time", Number=10_000, Repeat=3) 

# log the results to file (they will be printed in the console anyway)
pt.config.log_to_file = True
pt.config.log_file = "./perftester.log"

# increase the digits for printing floating numbers
pt.config.digits_for_printing = 7

# Use regular traceback
pt.config.full_traceback()

and so on. You can also change settings in each testing file itself, preferably in perftester_ functions.

When you use perftester in an interactive session, you update pt.config in a normal way, in the session. And when you use perftester inside pytest, you can do it in conftest.py and in each testing function.

Output

If a test fails, you will see something like this:

# for time test
TimeTestError in perftester_for_testing.perftester_f
Time test not passed for function f:
raw_limit = 0.011
minimum run time = 0.1007

# for memory test
MemoryTestError in perftester_for_testing.perftester_f2_time_and_memory
Memory test not passed for function f2:
memory_limit = 20
maximum memory usage = 20.04

Let's analyze what we see in this output:

Whether it's an error from a time test (TimeTestError) or a memory test (MemoryTestError).
perftester_for_testing.perftester_f provides the testing module (perftester_for_testing) and the perftester function (perftester_f2_time_and_memory).
Memory test not passed for function f2:: Here you see for which tested (not perftester_) function the test failed (here, f2()).
raw_limit and memory_limit: these are the raw limits you provided; these could be also relative_limit and relative_memory_limit, for relative tests.
minimum run time and maximum memory usage are the actual results from testing, and they were too high (higher than the limits set inside the testing function).

You can locate where a particular test failed, using the module, perftester_ function, and the tested function. If a perftester_ function combines more tests, then you can find the failed test using the limits.

Like in pytest, a recommended approach is to use one performance test per perftester_ function. This can save you some time and trouble, but also this will ensure that all tests will be run.

Summary output

At the end, you will see a simple summary of the results, something like this:

Out of 8 tests, 5 has passed and 3 has failed.

Passed tests:
perftester_for_testing.perftester_f2
perftester_for_testing.perftester_f2_2
perftester_for_testing.perftester_f2_3
perftester_for_testing.perftester_f3
perftester_for_testing_2.perftester_f

Failed tests:
perftester_for_testing.perftester_f
perftester_for_testing.perftester_f2_time_and_memory
perftester_for_testing.perftester_f_2

Relative tests against another function

In the basic use, when you choose a relative benchmark, you compare the performance of your function with that of a built-in (empty) function pt.config.benchmark_function(). In most cases, this is what you need. Sometimes, however, you may wish to benchmark against another function. For instance, you may want to build your own function that does the same thing as a Python built-in function, and you want to test (and show) that your function performs better. There are two ways of achieving this:

you can use a simple trick; see here;
you can overwrite the built-in benchmark functions; see here.

Raw and relative performance testing

Surely, any performance tests are strongly environment-dependent, so you need to remember that when writing and conducting any performance tests. perftester, however, offers a solution to this: You can define tests based on

raw values: raw execution time and raw memory usage, and
relative values: relative execution time and relative memory usage

Above, relative means benchmarking against a built-in (into perftester) simple function, which is actually an empty function (so it represents the overhead of running a function). Thus, you can, for instance, test whether your function is two times slower than this function. The benchmarking function itself does not matter, as it is just a benchmark. What matters is that, usually, your function should relatively to this benchmarking function behave the same way between different machines. So, if it works two times slower than the benchmarking function on your machine, then it should work in a similar way on another machine, even if this other machine is much faster than yours. Of course, this assumes linearity (so, two times slower here means two times slower everywhere), which does not have to be always true. Anyway, such tests will almost always be more representative, and more precise, than those based on raw times.

This does not mean, however, that raw tests are useless. In fact, in a production environment, you may wish to use raw tests. Imagine a client expects that an app never takes longer than an hour to perform a particular task (note that this strongly depends on what other processes are run in the production environment). You can create an automated test for that using perftester, in a very simple way - just several lines of code.

You can of course combine both types of tests, and you can do it in a very simple way. Then, the test is run once, but the results are checked with raw limits and relative limits.

Warning! Relative results can be different between operating systems.

Other tools

Of course, Python comes with various powerful tools for profiling, benchmarking and testing. Here are some of them:

cProfile and profile, the built-in powerful tools for deterministic profiling
the built-in timeit module, for benchmarking
memory_profiler, a powerful memory profiler (memory_profiler is utilized by perftester)

In fact, perftester is just a simple wrapper around timeit and memory_profiler, since perftester itself does not come with its own solutions. It simply uses these functions and offers an easy-to-use API to benchmark and test memory and time performance.

Manipulating the traceback

The default behavior of perftester is to not include the full traceback when a test does not pass. This is because when running performance tests, you're not interested in finding bugs, and this is what traceback is for. Instead, you want to see which test did not pass and how.

This behavior will not affect any other function than the two perftester testing functions: pt.time_test() and pt.memory_usage_test(). If you want to use this behavior for other functions, too, you can use pt.config.cut_traceback(); to reverse, use pt.config.full_traceback().

Tracing full memory usage → Moved to `tracemem`

Since the 0.5.* versions, perftester contained a beta version of a memory tracer that could be used to trace full memory usage of a Python session.

Since perftester requires some memory to load, it over-measured session memory. In order to avoid this, this feature was moved to a separate Python package, called tracemem. You can install it from PyPi, and you will find its Git repository here.

Caveats

perftester does not work with multiple threads or processes.
perftester is still in a beta version and so is still under testing.
Watch out when you're running the same test in different operating systems. Even relative tests can differ from OS to OS.

Operating systems

The package is developed in Linux (actually, under WSL) and checked in Windows 10, so it works in both these environments.

Support

Any contribution will be welcome. You can submit an issue in the repository. You can also create your own pull requests.

nyggus/perftester

perftester: Lightweight performance testing of Python functions