pydata/numexpr

Computing entropy

assaft opened this issue · 2 comments

Hi,

I'm trying to compute the entropy of vectors in a 2d array using
ne.evaluate('sum(where(a>0, a * log(a), 0), axis=1)')

I expect it to be faster than NumPy's alternatives, since it should compute the log and perform the multiplication only where the value is positive. In addition, it should avoid intermediate arrays.

When running with no constraint on the number of threads, I do get better performance from numexpr. But when running with a single thread, which is my use case, NumPy is faster.

This is a script for performing time measurements and comparing several alternatives:

import timeit
import numexpr as ne
import numpy as np
from scipy import special


ne.set_num_threads(1)


def simple(arg):
    # Plain NumPy: replace non-positive entries by 1 so that log2(1) = 0
    # and those entries contribute nothing to the sum.
    a, log_a = arg
    return np.sum(a * np.log2(np.where(a > 0, a, 1), out=log_a), axis=1)


def xlogy(arg):
    # scipy.special.xlogy returns 0 where a == 0; clamp negatives first.
    # Note: this mutates the input array in place.
    a, log_a = arg
    a[a < 0] = 0
    return np.sum(special.xlogy(a, a), axis=1) * (1 / np.log(2))


def matmul(arg):
    # Compute log2 only where a > 0, then take the row-wise dot product
    # of a and log_a via batched matmul.
    a, log_a = arg
    log_a.fill(0)
    np.log2(a, where=a > 0, out=log_a)
    return (a[:, None, :] @ log_a[..., None]).ravel()


def numexpr(arg):
    # numexpr evaluates the whole expression, including the sum reduction,
    # in a single call.
    a, _ = arg
    return ne.evaluate('sum(where(a>0, a * log(a), 0), axis=1)') * (1 / np.log(2))


def setup():
    a = np.random.rand(20, 1000) - 0.1  # roughly 10% of the values are negative
    log = np.empty_like(a)
    return a, log


setup_code = """
from __main__ import matmul, numexpr, simple, xlogy, setup
data = setup()
"""
func_code = "numexpr(data)"

print(timeit.timeit(func_code, setup=setup_code, number=100000))
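
To time all four variants in one run rather than just the numexpr one named in func_code, a loop over the function names works as well (a small sketch on top of the script above):

# Sketch: time every variant; each timeit call gets fresh data from setup_code.
# Caveat: xlogy clamps negatives in place on its first call, so the remaining
# calls within its run operate on already-clamped data.
for name in ("simple", "xlogy", "matmul", "numexpr"):
    t = timeit.timeit(f"{name}(data)", setup=setup_code, number=100000)
    print(f"{name}: {t:.2f} s")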

I know this is a late response and you probably have solved your issue by now, but this might help somebody anyway.

since it computes the log and performs the multiplication only where the value is positive.

This is not true. where() in numexpr evaluates both branches for every element and only then selects the result, so the log and the multiplication are computed everywhere.
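
You can convince yourself of this with a quick check (a rough sketch; if the log branch were skipped for non-positive inputs, the all-negative case would run much faster, but you should see roughly the same time for both):

>>> neg = np.full((20, 1000), -0.5)  # every element takes the 0 branch
>>> pos = np.abs(neg)                # every element takes the a * log(a) branch
>>> %timeit ne.evaluate('where(neg > 0, neg * log(neg), 0)')
>>> %timeit ne.evaluate('where(pos > 0, pos * log(pos), 0)')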

In addition, it should avoid intermediate arrays.

This is true though: numexpr processes the expression in blocks, so it never materializes full-size temporaries.

Note that it is possible to make numexpr go slightly faster than numpy in this case, even when single threaded:

First, the reference (numpy) code:

>>> a = np.random.rand(20, 1000) - 0.1
>>> log = np.empty_like(a)

>>> def simple(arg):
...     a, log_a = arg
...     return np.sum(a * np.log2(np.where(a > 0, a, 1), out=log_a), axis=1)

>>> %timeit simple((a, log))
243 µs ± 6.4 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

And the baseline (your numexpr code):

>>> ne.set_num_threads(1)
8
>>> def numexpr(arg):
...     a, _ = arg
...     return ne.evaluate('sum(where(a>0, a * log(a), 0), axis=1)') * (1/np.log(2))
>>> %timeit numexpr((a, log))
339 µs ± 2.06 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

First, using sum() inside numexpr is not very efficient; you are better off doing the reduction with plain numpy:

>>> def numexpr(arg):
...     a, _ = arg
...     return np.sum(ne.evaluate('where(a > 0, a * log(a), 0)'), axis=1) * (1 / np.log(2))
    
>>> %timeit numexpr((a, log))
302 µs ± 1.72 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

Then, if you mirror your original numpy expression (taking the log of where(a > 0, a, 1) instead of wrapping the whole product in where), this goes much faster. Part of the gain comes from dropping the extra "* (1 / np.log(2))", but it is faster even with that factor included:

>>> def numexpr(arg):
...     a, _ = arg
...     tmp = ne.evaluate('a * log(where(a > 0, a, 1))')
...     return np.sum(tmp, axis=1)
    
>>> %timeit numexpr((a, log))
214 µs ± 865 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

If you want the last drop of performance, you can precompile the expression with ne.NumExpr and use the out argument (which also exists for ne.evaluate) to avoid the memory allocation:

>>> expr = ne.NumExpr('a * log(where(a > 0, a, 1))')

>>> def numexpr(arg):
...     a, log_a = arg
...     expr(a, out=log_a, ex_uses_vml=False)
...     return np.sum(log_a, axis=1)

>>> %timeit numexpr((a, log))
204 µs ± 2.54 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
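
Putting it all together, a final helper that restores the log2 scaling from the original question might look like this (a sketch; the function name and the module-level constants are just illustrative):

import numexpr as ne
import numpy as np

ne.set_num_threads(1)

# Precompiled expression: log() only sees safe values (1 where a <= 0).
expr = ne.NumExpr('a * log(where(a > 0, a, 1))')
LOG2_SCALE = 1 / np.log(2)  # converts natural log to log2

def entropy_sums(a, out):
    # out: preallocated buffer with the same shape/dtype as a, reused
    # across calls to avoid allocating a temporary on every evaluation.
    expr(a, out=out, ex_uses_vml=False)
    return np.sum(out, axis=1) * LOG2_SCALE

a = np.random.rand(20, 1000) - 0.1
buf = np.empty_like(a)
result = entropy_sums(a, buf)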
