find_index and numpy.searchsorted
giumas opened this issue · 2 comments
The function find_index
(https://github.com/AllenDowney/ThinkDSP/blob/master/code/thinkdsp.py#L143) does the same as numpy.searchsorted
(http://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.searchsorted.html) but in a less efficient way.
I cannot find a reason to maintain find_index
. Do you?
find_index is constant time; searchsorted is logarithmic. Big enough
difference to be worth maintaining? Maybe not.
On Fri, Feb 5, 2016 at 10:01 AM, giumas notifications@github.com wrote:
The function find_index (
https://github.com/AllenDowney/ThinkDSP/blob/master/code/thinkdsp.py#L143)
does the same as numpy.searchsorted (
http://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.searchsorted.html)
but in a less efficient way.I cannot find a reason to maintain find_index. Do you?
—
Reply to this email directly or view it on GitHub
#18.
Did you try to time it?
In [1]:
import numpy as np
def find_index(x, xs):
"""Find the index corresponding to a given value in an array."""
n = len(xs)
start = xs[0]
end = xs[-1]
i = round((n-1) * (x - start) / (end - start))
return int(i)
test_array = np.array(range(99999))/1.0
print(test_array)
[ 0.00000000e+00 1.00000000e+00 2.00000000e+00 ..., 9.99960000e+04
9.99970000e+04 9.99980000e+04]
In [2]:
%timeit find_index(50000, test_array)
The slowest run took 8.06 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 1.16 µs per loop
In [3]:
%timeit test_array.searchsorted(50000)
The slowest run took 25.84 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 502 ns per loop
In [4]:
%timeit find_index(0, test_array)
The slowest run took 9.03 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 1.14 µs per loop
In [5]:
%timeit test_array.searchsorted(0)
The slowest run took 19.14 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 505 ns per loop
In [6]:
%timeit find_index(1000000, test_array)
The slowest run took 9.18 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 1.18 µs per loop
In [7]:
%timeit test_array.searchsorted(1000000)
The slowest run took 16.96 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 516 ns per loop
There could be some optimization in numpy.searchsorted and less overhead.. just a guess.