Zac-HD/shed

Proposal: shed should use ast.parse to validate code on errors

DRMacIver opened this issue · 4 comments

Very often shed will claim to have a bug when given invalid Python that libcst fails to reject. This is at best confusing to the end user.

libcst would ideally never do this, but that is not the world we live in, so I propose the following fix: When shed would show the "please report this as a bug" error, it should instead first parse the original source code with ast.parse and check if it's actually valid Python, and show a different error message if it's not.

We already check that our input is syntactically valid:

msg = f"Could not parse {_location}\n {type(err).__qualname__}: {err}"
for pattern, blocktype in _SUGGESTIONS:
if re.search(pattern, source_code, flags=re.MULTILINE):
msg += f"\n Perhaps you should use a {blocktype!r} block instead?"
try:
compile(source_code, "<string>", "exec")
except SyntaxError:
pass
else:
msg += (
f"\n The syntax is valid for Python {sys.version_info.major}"
f".{sys.version_info.minor}, so please report this as a bug."
)

...but per #92, the problem is actually libcst failing to parse valid code; we can and should check for that and give a more specific error message in such cases.

(I've also occasionally thought about adding minimization, via e.g. https://pypi.org/project/pysource-minimize/ or at worst https://github.com/Zac-HD/mwe)

If so something is going wrong with this check, because here's another example I ran into:

from collections import defaultdict

VALUES = defaultdict(
    int, (i, i) for i in range(10)
)

This isn't valid Python:

  File "/Users/drmaciver/scratch/shedbug.py", line 4
    int, (i, i) for i in range(10)
         ^^^^^^^^^^^^^^^^^^^^^^^^^
SyntaxError: Generator expression must be parenthesized

But the following is the error I get when I call shed --refactor on it:

Internal error formatting 'shedbug.py': ParserSyntaxError: Syntax Error @ 3:1.
parser error: error at 3:37: expected one of !=, %, &, (, *, **, +, ,, -, ., /, //, <, <<, <=, ==, >, >=, >>, @, [, ^, and, if, in, is, not, or, |

VALUES = defaultdict(int, (i, i) for i in range(10))
^
    Please report this to https://github.com/Zac-HD/shed/issues

Right, I did a bit of investigation. The problem scenario is when we have input code which satisfies the following criteria:

  1. It is not syntactically correct Python
  2. lib2to3 happily parses it
  3. So does libcst with its default parser (actually I'm not sure this one matters! But it is happening, and was misleading me about the underlying problem)
  4. libcst with the native parser correctly rejects it.

The reason why this is firing with shed --refactor turns out to have nothing to do with any changes shed is making to the code, but is because when running codemods shed forces the use of the native parser: https://github.com/Zac-HD/shed/blob/master/src/shed/_codemods.py#L62

So all that happens is that the code gets to the point of trying to parse the code with libcst's native parser, correctly gets a syntax error, and reports a confusing error message.

Fixed in 23.5.1 / baf7705 by... skipping the libcst logic when it (correctly) rejects invalid code. Potentially a bit confusing if you expected refactorings to happen, but all the other options I could think of seemed worse 😅