[Bug]: Unsafe FURB108 when lazy evaluation is required
Opened this issue · 1 comments
Has your issue already been fixed?
- Have you checked to see if your issue still exists on the
master
branch? See the docs for instructions on how to setup a local build of Refurb. - Have you looked at the open/closed issues to see if anyone has already reported your issue?
- If reporting a false positive/incorrect suggestion, have you double checked that the suggested fix changes the code semantics?
The Bug
The following code:
from bisect import bisect
from collections.abc import Sequence
def f(events: Sequence[int], threshold: int) -> None:
cutoff = bisect(events, threshold)
if cutoff == 0 or events[cutoff - 1] == 0:
pass
Emits the following error:
$ refurb file.py
file.py:6:8 [FURB108]: Replace `x == y or z == y` with `y in (x, z)`
However, this suggestion is unsafe because if events
is empty, cutoff - 1
is not a valid index. With or
this is fine, as lazy evaluation halts after it finds that cutoff == 0
is true, but the tuple notation always evaluates both expressions.
I don't expect Refurb to understand the details of bisect()
, but it could perhaps detect that cutoff
is checked in the left hand side and used in the right hand side of or
and as a precaution not issue FURB108.
I'm also not sure the suggested change (de-duplicating the 0
) would actually make the code more elegant, but that is subjective, while the unsafe behavior is an objective problem.
Version Info
Refurb: v2.0.0
Mypy: v1.13.0
Python Version
Python 3.12.7
Config File
# N/A
Extra Info
The double check:
from bisect import bisect
from collections.abc import Sequence
def f1(events: Sequence[int], threshold: int) -> None:
cutoff = bisect(events, threshold)
if cutoff == 0 or events[cutoff - 1] == 0:
pass
def f2(events: Sequence[int], threshold: int) -> None:
cutoff = bisect(events, threshold)
if 0 in (cutoff, events[cutoff - 1]):
pass
f1([], 123)
f2([], 123)
Here, the f1()
call returns, while the f2()
call raises IndexError
.
Thank you @mthuurne for opening this! There is a note in the FURB108 explainer that touches on this:
$ refurb --explain FURB108
...
Note: This should not be used if the operands depend on boolean short
circuiting, since the operands will be eagerly evaluated. This is primarily
useful for comparing against a range of constant values.
Ideally Refurb would be able to detect these situations, but for now it can't. While we can make a special case for this, it is hard to detect the general case: For example, is f() == 0 or g() == 0
safe, or unsafe? We have to decide whether it is better to assume everything is safe or unsafe. With the or
expression, we have to decide: Are they trying to simply compare values, or are they taking advantage of lazy evaluation? In the case of FURB108, if you compare two things using == X
, we assume it's most likely a comparison.
So, some potential solutions:
- Make FURB108 disabled by default. I think this is too harsh, because it is a good check, but can't detect certain situations
- Only allow "basic" comparisons. This would "solve" the problem, but would make it harder to find complex expressions where it would be desirable to re-write it as an tuple comparison
- Have a mix of both: Provide a way to enable FURB108 by default, but disable FURB108 in potentially unsafe situations. Ruff has a similar notion of "safe"/"stable" vs "unstable" checks, so something similar would be good here.
- Improve Refurb to better detect "safe" vs "unsafe" code. This would be a much better long-term solution, but involves lots of time and dedication to get right.
With that in mind, the best path forward IMO is:
- Disable FURB108 by default to prevent bad/buggy suggestions
- Add the ability to detect "simple" vs "complex" expressions in FURB108
- Re-enable FURB108, and add an option to FURB108 (depends on #100) that allows specifying whether or not complex expressions should be detected/reported
- Implement an "unsafe code detector" system, and once the detection gets good enough, enable this again by default.