netromdk/vermin

Incorrect detection of self-documenting f-strings

GjjvdBurg opened this issue ยท 11 comments

Describe the bug

False-positives are returned for self-documenting f-strings.

To Reproduce

$ cat /tmp/test.py
a = 1
print(f"Hello world={a}")
$ vermin /tmp/test.py
Minimum required versions: 3.8
Incompatible versions:     2

Expected behavior

Expected the minimum required version for this code to be 3.6. A self-documenting f-string has the equals sign on the other side of the variable (i.e. {a=}).

Environment (please complete the following information):

  • Vermin version 0.9.2

Thanks in advance!

Thanks for noticing, @GjjvdBurg! Looking at it now. :)
I can deduce that you were running Python 3.8 to execute Vermin to achieve this result, right?

There were some cases I hadn't noticed the first time around. Thanks again!

Actually there are more cases (tested on Python 3.8 on your patch just now):

Case 1
f'{ a = }'
Expected: detected as 3.8+
Actual: detected as 3.6+
Notes: ASCII whitespaces ([ \t\n\r\f\v]) are allowed before and after the =, and are preserved in the output.

Case 2
f'{a+b=}', f'{a+1=}'
Expected: detected as 3.8+
Actual: detected as 3.6+
Notes: However f'{1+a=}' and f'{1+1=}' are detected correctly.

Case 3
f'{(x:=1)}'
Expected: detected as 3.8+
Actual: detected as 3.6+
Notes: This is not a "self-documenting f-string", this is an f-string containing an assignment expression (which is new in 3.8).

Unfortunately, "self-documenting f-string" or "f-string debugging" is poorly documented in the official documentation (I can only find it in whatsnew and changelog). The PR introducing this change contains some useful information (especially the test cases), along with its corresponding issue (contains relevant discussion when introducing this feature, might be a bit noisy because some previous proposals mentioned were rejected).

Thanks, @gousaiyang :) Looking into it.

I already implemented support for named expressions actually, but the sub node wasn't getting visited. So now f'{(x:=1)}' will correctly yield:

!2, 3.8      /tmp/fstr_named_expr.py
  f-strings require 3.6+
  named expressions require 3.8+

Nice test cases. Have to include those, too.

Okay, that should be all the remaining cases. Phew..

I just realized that the false positives happen because the ast module returns an optimized AST (= related information is lost). Which means that f'foo{name=}' is parsed into ast.Constant with value = fooname= (constant string merged) and ast.FormattedValue containing ast.Name name.

Unfortunately, this means that f'{a=}' and f'a={a!r}' will be parsed into the same AST, except for the end_col_offset attribute:

In [26]: ast.parse("f'{a=}'").body[0].value.values[0].__dict__
Out[26]:
{'value': 'a=',
 'kind': None,
 'lineno': 1,
 'col_offset': 0,
 'end_lineno': 1,
 'end_col_offset': 7}

In [27]: ast.parse("f'{a=}'").body[0].value.values[1].__dict__
Out[27]:
{'value': <_ast.Name at 0x29b0cc78c70>,
 'conversion': 114,
 'format_spec': None,
 'lineno': 1,
 'col_offset': 0,
 'end_lineno': 1,
 'end_col_offset': 7}

In [28]: ast.parse("f'a={a!r}'").body[0].value.values[0].__dict__
Out[28]:
{'value': 'a=',
 'kind': None,
 'lineno': 1,
 'col_offset': 0,
 'end_lineno': 1,
 'end_col_offset': 10}

In [29]: ast.parse("f'a={a!r}'").body[0].value.values[1].__dict__
Out[29]:
{'value': <_ast.Name at 0x29b0cc737f0>,
 'conversion': 114,
 'format_spec': None,
 'lineno': 1,
 'col_offset': 0,
 'end_lineno': 1,
 'end_col_offset': 10}

So it seems impossible to differentiate f'{a=}' and f'a={a!r}' with the standard ast module without using the end_col_offset attribute or doing extra parsing of the original source code. Using the end_col_offset attribute might help (deduce from the length of code), but it will be somewhat tricky because the f-string could be very complex like f'foo{f"nested"=}bar{no}baz{another=}'.

Yes, you are exactly right. I've run into this a number of times for different features, and as also mentioned in #35, I've thought about maybe utilizing an additional parser. But it would still be required to function with Python 2.7/3.0, and I still prefer the standard ast parser for maintainability and always being up-to-date.

The source visitor class is given the parsed AST (because errors need to be handled beforehand due to the concurrent behavior), but it could perhaps be given the entire source in string form as well if any extra parsing is necessary - like parsing the [(lineno, col_offset); (end_lineno, end_col_offset)] part for this case. Hm.

There are still trivial cases tripping this so please re-open. E.g:

$ cat tmp.py
a = 0
b = 0
print(f'a={a} b={b}')

$ vermin -vv --no-tips tmp.py
Detecting python files..
Analyzing using 4 processes..
!2, 3.8      /home/mark/tmp.py
  f-strings require 3.6+
  print(expr) requires 2+ or 3+
  self-documenting fstrings require 3.8+

Minimum required versions: 3.8
Incompatible versions:     2

Sorry for the slow reply.

The problem, as already described above, is that ast optimizes away valuable information. The AST cannot distinguish f'{a=}' from f'a={a}'. This is why your "trivial" case is seen as a self-doc f-string.

With this in mind, I think it would make more sense to disable the entire self-doc f-string detection (but keeping regular f-string detection, of course).