pylint-dev/pylint

Performance issue R1730 consider-using-min-builtin, R1731 consider-using-max-builtin

PanderMusubi opened this issue · 2 comments

Current problem

Following advice of R1730 consider-using-min-builtin, R1731 consider-using-max-builtin inside a for or a while loop will degrade performance of factor more than 3.

This is illustrated by the following example:

from random import randint
from sys import maxsize
from time import time

print('preparing\n')
values = []
for _ in range(2000000):
    values.append(randint(1243, 2001245))

min_a = maxsize
min_b = maxsize
max_a = 0
max_b = 0
max_c = 0
tim_amin = 0.0
tim_amax = 0.0
tim_bmin = 0.0
tim_bmax = 0.0

print('type\ttotal time needed\tvalue')

start = time()
for i in values:
    if i < min_a:
        min_a = i
tim_amin = time()-start
print(f'min_a\t{tim_amin}\t{min_a}')

start = time()
for i in values:
    if i > max_a:
        max_a = i
tim_amax = time()-start
print(f'max_a\t{tim_amax}\t{max_a}')

start = time()
for i in values:
    min_b = min(i, min_b)
tim_bmin = time()-start
print(f'min_b\t{tim_bmin}\t{min_b}')

start = time()
for i in values:
    max_b = max(max_b, i)
tim_bmax = time()-start
print(f'max_b\t{tim_bmax}\t{max_b}')

print(f'\nspeedups {tim_bmin/tim_amin} and {tim_bmax/tim_amax}')

the output is

preparing

type	total time needed	value
min_a	0.0997917652130127	1243
max_a	0.09552979469299316	2001245
min_b	0.30522990226745605	1243
max_b	0.3029053211212158	2001245

speedups 3.0586682339561877 and 3.1707942228356223

Desired solution

The following suggestions to implement:

a) do not give this suggestion for assigning min() or max() to one of its arguments in a loop

or

b) this suggestion is given but with a performance warning

and additionally

  1. performance warning is givven when assigning min() or max() to one of its arguments in a loop

Additional context

No response

Thank you for the detailed issue and benchmark.

Here is an improved test

Case a is loop with if < or if >, case b is loop with min() or max(). However, extra case c is running min() or max() directly on the entire list and is the fastest way(4~5 times faster dan case a).

So the advice is correct, but only if the developer also places the min() or max() outside of the loop on the entire list.

Below is an improvement to the benchmark.

from random import randint
from sys import maxsize
from time import time


def bench(values):
    """Bla bla."""
    min_a = maxsize
    min_b = maxsize
    max_a = 0
    max_b = 0
    start = time()
    min_c = min(values)
    tim_cmin = time()-start
    print(f'c) min(values)\t{tim_cmin}\t{min_c}')
    start = time()
    max_c = max(values)
    tim_cmax = time()-start
    print(f'c) max(values)\t{tim_cmax}\t{max_c}')
    start = time()
    for i in values:
        if i < min_a:
            min_a = i
    tim_amin = time()-start
    print(f'a) loop if <\t{tim_amin}\t{min_a}')
    start = time()
    for i in values:
        if i > max_a:
            max_a = i
    tim_amax = time()-start
    print(f'a) loop if >\t{tim_amax}\t{max_a}')
    start = time()
    for i in values:
        min_b = min(i, min_b)
    tim_bmin = time()-start
    print(f'b) loop min(i)\t{tim_bmin}\t{min_b}')
    start = time()
    for i in values:
        max_b = max(max_b, i)
    tim_bmax = time()-start
    print(f'b) loop max(i)\t{tim_bmax}\t{max_b}')


print('LIST')
values = []
for _ in range(2000000):
    values.append(randint(1243, 2001245))
bench(values)

print('SET')
values = set()
for _ in range(2000000):
    values.add(randint(1243, 2001245))
bench(values)

output

LIST
c) min(values)	0.022024869918823242	1243
c) max(values)	0.02367424964904785	2001245
a) loop if <	0.050209999084472656	1243
a) loop if >	0.05236554145812988	2001245
b) loop min(i)	0.20847249031066895	1243
b) loop max(i)	0.2126467227935791	2001245
SET
c) min(values)	0.09105372428894043	1243
c) max(values)	0.09903478622436523	2001242
a) loop if <	0.17780447006225586	1243
a) loop if >	0.18820405006408691	2001242
b) loop min(i)	0.3824942111968994	1243
b) loop max(i)	0.3974308967590332	2001242