Proposal: shed should use ast.parse to validate code on errors
DRMacIver opened this issue · 4 comments
Very often shed will claim to have a bug when given invalid Python that libcst fails to reject. This is at best confusing to the end user.
libcst would ideally never do this, but that is not the world we live in, so I propose the following fix: When shed would show the "please report this as a bug" error, it should instead first parse the original source code with ast.parse
and check if it's actually valid Python, and show a different error message if it's not.
We already check that our input is syntactically valid:
Lines 81 to 93 in 71520a9
...but per #92, the problem is actually libcst
failing to parse valid code; we can and should check for that and give a more specific error message in such cases.
(I've also occasionally thought about adding minimization, via e.g. https://pypi.org/project/pysource-minimize/ or at worst https://github.com/Zac-HD/mwe)
If so something is going wrong with this check, because here's another example I ran into:
from collections import defaultdict
VALUES = defaultdict(
int, (i, i) for i in range(10)
)
This isn't valid Python:
File "/Users/drmaciver/scratch/shedbug.py", line 4
int, (i, i) for i in range(10)
^^^^^^^^^^^^^^^^^^^^^^^^^
SyntaxError: Generator expression must be parenthesized
But the following is the error I get when I call shed --refactor
on it:
Internal error formatting 'shedbug.py': ParserSyntaxError: Syntax Error @ 3:1.
parser error: error at 3:37: expected one of !=, %, &, (, *, **, +, ,, -, ., /, //, <, <<, <=, ==, >, >=, >>, @, [, ^, and, if, in, is, not, or, |
VALUES = defaultdict(int, (i, i) for i in range(10))
^
Please report this to https://github.com/Zac-HD/shed/issues
Right, I did a bit of investigation. The problem scenario is when we have input code which satisfies the following criteria:
- It is not syntactically correct Python
- lib2to3 happily parses it
- So does libcst with its default parser (actually I'm not sure this one matters! But it is happening, and was misleading me about the underlying problem)
- libcst with the native parser correctly rejects it.
The reason why this is firing with shed --refactor
turns out to have nothing to do with any changes shed is making to the code, but is because when running codemods shed forces the use of the native parser: https://github.com/Zac-HD/shed/blob/master/src/shed/_codemods.py#L62
So all that happens is that the code gets to the point of trying to parse the code with libcst's native parser, correctly gets a syntax error, and reports a confusing error message.