nschloe/perfplot

[BUG] Wrong results for `iter`

Closed · 12 comments

Describe the bug

I ran some benchmarks on the iter function, and the results don't align with what is expected. As with lists, searching for an item in an iterator should take linear time, O(n). However, perfplot shows that it is O(1). Only once the list gets large enough does it show the expected O(n) behavior.

[Figure_10: perfplot output for the benchmark below]

To Reproduce

import random
import perfplot

def setup(n):
    data = list(range(n))
    random.shuffle(data)
    dict_array = {x:i for i, x in enumerate(data)}
    return data, set(data), iter(data), dict_array

def list_in(*args):
    return -99 in args[0]

def set_in(*args):
    return -99 in args[1]

def iter_in(*args):
    return -99 in args[2]

def dict_in(*args):
    return -99 in args[3]

def convert_list_to_set(*args):
    return set(args[0])

def convert_list_to_iter(*args):
    return iter(args[0])

def convert_list_to_dict(*args):
    return {x:i for i, x in enumerate(args[0])}


b = perfplot.bench(
    setup=setup,
    kernels=[
        list_in,
        set_in,
        iter_in,
        dict_in,
        convert_list_to_set,
        convert_list_to_iter,
        convert_list_to_dict,
    ],
    n_range=[2 ** k for k in range(1, 25)],
    xlabel="length of the list.",
    equality_check= None
)

b.show()

Diagnosis

perfplot==0.10.2

At first, I thought the problem could be the equality_check=None parameter. So I removed it and benchmarked the iter function alone. The result shows the same behavior.

[Figure_12: perfplot output for the iter benchmark alone]

list and iter should give the same O(n) runtime. Perhaps list is slightly faster.

What makes you think this is wrong? Can you confirm with some timeit times?

In theory, the only way to find an element in an iterator is by consuming its elements and checking each one, i.e. invoking __next__ on the iterator and comparing the value, which means a linear runtime, O(n). That's my understanding of how iterators work, unless there is some wizard logic that can tell whether an element exists in an iterator in O(1) the way a hash map can. But I don't think that's the case.
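For reference, a minimal sketch of that reasoning (just an illustration, not perfplot or CPython code) of what membership testing over a plain iterator amounts to:

def iter_contains(iterator, target):
    """Rough Python-level equivalent of `target in iterator` for a plain iterator."""
    while True:
        try:
            item = next(iterator)   # may raise StopIteration once exhausted
        except StopIteration:
            return False            # consumed everything, no match
        if item == target:
            return True             # found it after a linear scan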

However, running timeit shows the following results for the first 20 iterations:

#  Obj  Size      Time             Obj  Size      Time
0. list 10        3.155e-07    |   iter 10        7.538e-08    |   
1. list 20        6.094e-07    |   iter 20        6.584e-08    |   
2. list 40        1.117e-06    |   iter 40        7.547e-08    |   
3. list 80        2.448e-06    |   iter 80        8.883e-08    |   
4. list 160       5.606e-06    |   iter 160       8.822e-08    |   
5. list 320       1.121e-05    |   iter 320       8.999e-08    |   
6. list 640       2.836e-05    |   iter 640       1.032e-07    |   
7. list 1280      5.062e-05    |   iter 1280      9.029e-08    |   
8. list 2560      8.867e-05    |   iter 2560      9.425e-08    |   
9. list 5120      0.0001777    |   iter 5120      1.003e-07    |   
10 list 10240     0.000392     |   iter 10240     3.073e-07    |   
11 list 20480     0.0007514    |   iter 20480     3.129e-07    |   
12 list 40960     0.001523     |   iter 40960     9.238e-07    |   
13 list 81920     0.004166     |   iter 81920     1.902e-06    |   
14 list 163840    0.007661     |   iter 163840    0.0001409    |   
15 list 327680    0.01217      |   iter 327680    6.748e-05    |   
16 list 655360    0.02328      |   iter 655360    0.0001344    |   
17 list 1310720   0.04645      |   iter 1310720   0.002688     |   
18 list 2621440   0.09363      |   iter 2621440   0.005507     |   
19 list 5242880   0.1998       |   iter 5242880   0.0113       |   

Both list and iter show an O(n)-like growth according to timeit, unlike what perfplot shows. This confirms my hypothesis that there is a glitch in the simulation; I mean, a bug in perfplot.

Code:

from timeit import Timer

print("#  Obj  Size      Time             Obj  Size      Time")
for iteration in range(25):
    size = 10 * 2 ** iteration
    # Use fewer timing runs as the input gets larger.
    if size < 1000:
        number = 200_000
    elif size < 10_000:
        number = 20_000
    elif size < 100_000:
        number = 2000
    elif size < 1_000_000:
        number = 200
    else:
        number = 20
    print(f"{iteration:.<2}", end=' ')
    for name, snippet in [("list", "-99 in list_"), ("iter", "-99 in iter_")]:
        # Both list_ and iter_ are created in the setup string, outside the timed statement.
        setup = f"list_ = list(range({size})); iter_ = iter(list_)"
        T = Timer(snippet, setup)
        best = min(T.repeat(number=number, repeat=7))
        print(f"{name} {size:<8}  {(best/number):1.4g}", end='    |   ')
    print()

Thanks for the tool, BTW. It allows me to do some crazy scientific stuff and share my results on Twitter, like this one:

Efficient Python: Day-05

Did you know that using a simple plus operator is the fastest approach to concatenate two strings of equal length?

concat_strings = string1 + string2 #python #100DaysOfCode #Python3

— Mahmoud (@Harmouch101) June 3, 2022

I got this:

In [24]: it = iter(range(10))

In [25]: %timeit (-1 in it)
35.6 ns ± 0.0944 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

In [26]: it = iter(range(100))

In [27]: %timeit (-1 in it)
35.2 ns ± 0.0736 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

In [28]: it = iter(range(1000))

In [29]: %timeit (-1 in it)
35.2 ns ± 0.0409 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

In [30]: it = iter(range(10000))

In [31]: %timeit (-1 in it)
35.5 ns ± 0.223 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

In [32]: it = iter(range(100000))

In [33]: %timeit (-1 in it)
35.4 ns ± 0.5 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

In [34]: it = iter(range(1000000))

In [35]: %timeit (-1 in it)
35.3 ns ± 0.118 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

In [36]: it = iter(range(10000000))

In [37]: %timeit (-1 in it)
235 ns ± 305 ns per loop (mean ± std. dev. of 7 runs, 1 loop each)

I think you're measuring the iter() construction, which in my example is in the setup.
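A minimal sketch of the two placements being compared here (the size and number of runs are arbitrary; variant A includes the iter() construction in the timed statement, variant B keeps it in the setup):

import timeit

n = 10**5  # arbitrary size for illustration

# Variant A: iter() is constructed inside the timed statement, so its
# construction (and a full scan of a fresh iterator) is part of every run.
a = timeit.timeit("-99 in iter(list_)",
                  setup=f"list_ = list(range({n}))",
                  number=1000)

# Variant B: the iterator is built in the setup, so the timed statement
# only performs the lookup.
b = timeit.timeit("-99 in iter_",
                  setup=f"iter_ = iter(list(range({n})))",
                  number=1000)

print(f"iter() in statement: {a / 1000:.3g} s per run")
print(f"iter() in setup:     {b / 1000:.3g} s per run")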

You are absolutely right. I have updated the code above. Now it shows a constant lookup time for lists that contain ~10**6 elements, which raises the question: how in the world is that even possible with iter? How is it implemented at a low level? This proves that the fastest approach to finding an item within a list is converting it into an iterator. Interesting! If you have any explanation whatsoever for this sort of behavior, let me know. To me, it seems like some kind of Egyptian magic.

I am going to close this issue. I am looking forward to your explanation of this strange behavior.

I'm looking forward to yours!

Hahahahaha. I have no idea at the moment. Presumably, I will investigate this case in one of my Python internals series. In any case, it is good to know that iter is blazingly fast.

I'd post on a Python mailing list.

I have figured out the reason. timeit was mostly measuring lookups on an already-exhausted iterator: after the first lookup consumes it, every subsequent -99 in iter_ just hits StopIteration immediately, which is negligible compared to the first full pass.

We can write a handy script to help us understand what timeit was actually measuring:

import time

list_ = list(range(10**3))
iter_ = iter(list(range(10**3)))
list_time_samples = []
iter_time_samples = []

# Time 100 consecutive membership tests on the same list.
for _ in range(100):
    start = time.perf_counter_ns()
    _ = -99 in list_
    end = time.perf_counter_ns()
    list_time_samples.append(end - start)

# Time 100 consecutive membership tests on the same iterator; only the first
# one actually walks the elements, after which the iterator is exhausted.
for _ in range(100):
    start = time.perf_counter_ns()
    _ = -99 in iter_
    end = time.perf_counter_ns()
    iter_time_samples.append(end - start)

print(sum(list_time_samples) / 100, sum(iter_time_samples) / 100, sep="\n")
print(list_time_samples)
print(iter_time_samples)

Output:

34764.78
1168.4

[35515, 34760, 34533, 34492, 34469, 34580, 34513, 34432, 34577, 34493, 34635, 34612, 34484, 34422, 34440, 34499, 34469, 34536, 34446, 34487, 34469, 34470, 34482, 34476, 34680, 34476, 34475, 34430, 34489, 34436, 34480, 34482, 34480, 34483, 34489, 34429, 34440, 34436, 34475, 34583, 34426, 34476, 34473, 34480, 34423, 34526, 34593, 34642, 34492, 34497, 34476, 34567, 39331, 34473, 34690, 34486, 34467, 34475, 34484, 34513, 34577, 34463, 34530, 34479, 34422, 34480, 34472, 34435, 34473, 34432, 34476, 55074, 34596, 34489, 34500, 34623, 34482, 34459, 34483, 34496, 34697, 34459, 34436, 34443, 34489, 34483, 34495, 34600, 34670, 34477, 34479, 34480, 34482, 34470, 34489, 34484, 34439, 34485, 34487, 34449]

[55691, 831, 611, 642, 644, 605, 618, 612, 609, 602, 604, 645, 608, 604, 608, 646, 618, 614, 605, 597, 621, 612, 607, 614, 612, 601, 608, 615, 608, 609, 611, 615, 602, 619, 601, 607, 606, 668, 652, 618, 588, 608, 606, 619, 608, 609, 614, 642, 638, 612, 601, 611, 612, 602, 608, 612, 615, 601, 602, 611, 648, 604, 678, 612, 604, 604, 609, 604, 621, 605, 618, 632, 608, 602, 611, 652, 621, 571, 605, 607, 652, 602, 614, 612, 612, 601, 601, 656, 614, 615, 627, 609, 605, 691, 604, 629, 614, 607, 612, 608]

So we can only compare the results for the first iteration, because after that one the iterator is exhausted: every later membership test immediately gets a StopIteration from __next__, which makes the measured lookup time ~zero.
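A quick way to see this directly (any list works; the size here is arbitrary):

data = list(range(10**6))
it = iter(data)

print(-99 in it)  # False: this pass walks every element (O(n)) and exhausts the iterator
print(-99 in it)  # False again, but almost instantly: the first __next__() call
                  # now raises StopIteration right away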

Key takeaway:

  • A list is slightly faster than an iterator for a single full scan.
  • We can't average the lookup time over a single iterator, since only the first lookup does any work. The only way to get a per-lookup average is to create a new iterator for each measurement, which also includes the list-to-iter conversion (see the sketch below).
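For example, a minimal sketch of that fresh-iterator approach (sizes and run count are arbitrary); iter() on a list is an O(1) operation, so the conversion adds very little to each measurement:

import timeit

for n in (10**3, 10**4, 10**5):
    # Building a fresh iterator inside the timed statement guarantees that
    # every run performs a real scan; the cheap iter() construction is
    # included in the measured time.
    t = timeit.timeit("-99 in iter(list_)",
                      setup=f"list_ = list(range({n}))",
                      number=1000)
    print(n, t / 1000)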

I see! Thanks for digging this up.