Performance issue R1730 consider-using-min-builtin, R1731 consider-using-max-builtin
PanderMusubi opened this issue · 2 comments
Current problem
Following advice of R1730 consider-using-min-builtin, R1731 consider-using-max-builtin inside a for or a while loop will degrade performance of factor more than 3.
This is illustrated by the following example:
from random import randint
from sys import maxsize
from time import time
print('preparing\n')
values = []
for _ in range(2000000):
values.append(randint(1243, 2001245))
min_a = maxsize
min_b = maxsize
max_a = 0
max_b = 0
max_c = 0
tim_amin = 0.0
tim_amax = 0.0
tim_bmin = 0.0
tim_bmax = 0.0
print('type\ttotal time needed\tvalue')
start = time()
for i in values:
if i < min_a:
min_a = i
tim_amin = time()-start
print(f'min_a\t{tim_amin}\t{min_a}')
start = time()
for i in values:
if i > max_a:
max_a = i
tim_amax = time()-start
print(f'max_a\t{tim_amax}\t{max_a}')
start = time()
for i in values:
min_b = min(i, min_b)
tim_bmin = time()-start
print(f'min_b\t{tim_bmin}\t{min_b}')
start = time()
for i in values:
max_b = max(max_b, i)
tim_bmax = time()-start
print(f'max_b\t{tim_bmax}\t{max_b}')
print(f'\nspeedups {tim_bmin/tim_amin} and {tim_bmax/tim_amax}')
the output is
preparing
type total time needed value
min_a 0.0997917652130127 1243
max_a 0.09552979469299316 2001245
min_b 0.30522990226745605 1243
max_b 0.3029053211212158 2001245
speedups 3.0586682339561877 and 3.1707942228356223
Desired solution
The following suggestions to implement:
a) do not give this suggestion for assigning min() or max() to one of its arguments in a loop
or
b) this suggestion is given but with a performance warning
and additionally
- performance warning is givven when assigning min() or max() to one of its arguments in a loop
Additional context
No response
Thank you for the detailed issue and benchmark.
Here is an improved test
Case a is loop with if <
or if >
, case b is loop with min()
or max()
. However, extra case c is running min()
or max()
directly on the entire list and is the fastest way(4~5 times faster dan case a).
So the advice is correct, but only if the developer also places the min()
or max()
outside of the loop on the entire list.
Below is an improvement to the benchmark.
from random import randint
from sys import maxsize
from time import time
def bench(values):
"""Bla bla."""
min_a = maxsize
min_b = maxsize
max_a = 0
max_b = 0
start = time()
min_c = min(values)
tim_cmin = time()-start
print(f'c) min(values)\t{tim_cmin}\t{min_c}')
start = time()
max_c = max(values)
tim_cmax = time()-start
print(f'c) max(values)\t{tim_cmax}\t{max_c}')
start = time()
for i in values:
if i < min_a:
min_a = i
tim_amin = time()-start
print(f'a) loop if <\t{tim_amin}\t{min_a}')
start = time()
for i in values:
if i > max_a:
max_a = i
tim_amax = time()-start
print(f'a) loop if >\t{tim_amax}\t{max_a}')
start = time()
for i in values:
min_b = min(i, min_b)
tim_bmin = time()-start
print(f'b) loop min(i)\t{tim_bmin}\t{min_b}')
start = time()
for i in values:
max_b = max(max_b, i)
tim_bmax = time()-start
print(f'b) loop max(i)\t{tim_bmax}\t{max_b}')
print('LIST')
values = []
for _ in range(2000000):
values.append(randint(1243, 2001245))
bench(values)
print('SET')
values = set()
for _ in range(2000000):
values.add(randint(1243, 2001245))
bench(values)
output
LIST
c) min(values) 0.022024869918823242 1243
c) max(values) 0.02367424964904785 2001245
a) loop if < 0.050209999084472656 1243
a) loop if > 0.05236554145812988 2001245
b) loop min(i) 0.20847249031066895 1243
b) loop max(i) 0.2126467227935791 2001245
SET
c) min(values) 0.09105372428894043 1243
c) max(values) 0.09903478622436523 2001242
a) loop if < 0.17780447006225586 1243
a) loop if > 0.18820405006408691 2001242
b) loop min(i) 0.3824942111968994 1243
b) loop max(i) 0.3974308967590332 2001242