Make `SyntaxWarning` for invalid escape sequences better reflect their intended deprecation
umarbutler opened this issue · 9 comments
Feature or enhancement
Proposal:
I would like to propose that the `SyntaxWarning` raised when an invalid escape sequence is used be updated to better reflect the fact that support for invalid escape sequences is intended to be removed in a future Python release.
At present, if one runs the code `path = 'C:\Windows'` in Python, they get this warning: `SyntaxWarning: invalid escape sequence '\W'`.
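The behaviour described above can be reproduced without a source file by compiling the one-line example from a string and recording the warnings. This is a minimal sketch: the `<example>` filename is a placeholder, and the warning category and exact message text depend on the Python version (older releases are assumed to report a `DeprecationWarning` rather than a `SyntaxWarning`).

```python
import warnings

# Hold the source text in a raw string so the backslash reaches the compiler intact.
source = r"path = 'C:\Windows'"

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is not filtered out
    code = compile(source, "<example>", "exec")

namespace = {}
exec(code, namespace)

# Show whatever the running interpreter reported.
for w in caught:
    print(w.category.__name__, "-", w.message)

# The code still runs today: the backslash is kept as a literal character.
print(namespace["path"])
```

The last line shows why the warning is easy to ignore: the resulting string is exactly what the author wanted.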
I argue that this warning is not as semantically meaningful as it could be. It does not immediately convey to the untrained and/or uninitiated that `path = 'C:\Windows'` will in fact break in a future Python release.
What is a better way of communicating that? How about: `SyntaxWarning: '\W' is currently an invalid escape sequence. In the future, invalid escape sequences will raise a SyntaxError. Did you mean '\\W'?`
That message is a little bit longer but it immediately tells the user, without the need for heading to Stack Overflow or Python documentation, that:
- Although the code runs today, it won't run soon!
- You can fix the code easily, just add an extra backslash.
Whereas all that `SyntaxWarning: invalid escape sequence '\W'` tells me is, at best, "hey, there's something wrong here, but you've gotta figure out what that is." Someone could easily read that message and think that Python is just trying to be helpful by making them double-check that they didn't actually mean to type a valid escape sequence like `\f` or `\n`.
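The "extra backslash" fix mentioned above can be checked directly. The short sketch below compares the doubled-backslash spelling with a raw string literal; the variable names are illustrative only.

```python
escaped = 'C:\\Windows'   # doubled backslash: one literal backslash in the result
raw = r'C:\Windows'       # raw string literal: backslashes are taken as written

print(escaped == raw)  # True
print(len(escaped))    # 10
```

Both spellings produce the same ten-character string and neither triggers the warning.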
A message like `SyntaxWarning: '\W' is currently an invalid escape sequence. In the future, invalid escape sequences will raise a SyntaxError. Did you mean '\\W'?` makes it much more difficult to come away thinking that the Python developers might not like what I've just done, but it works and it'll keep working forever.
Update: The proposed message has been revised to `"\W" is an invalid escape sequence. Such sequences will not work in the future. Did you mean "\\W"?` following consultation.
Has this already been discussed elsewhere?
I have already discussed this feature proposal on Discourse
Links to previous discussion of this feature:
@umarbutler, are you comfortable creating a PR for this?
@ethanfurman Yes, I'm going to have a crack at it. I'm not a C programmer but the change doesn't seem too difficult.
Just an update: I have revised the proposed wording to `"\W" is an invalid escape sequence. Such sequences will not work in the future. Did you mean "\\W"?` following community consultation. This wording is more concise and direct, and remains accurate.
We can remove "Such sequences":
"\W" is an invalid escape sequence and will not work in the future. Did you mean "\\W"?
Let's also mention raw strings, as in many cases raw strings are a cleaner and easier solution.
In the thread you say:
In practice, a raw string is typically best, but categorically advising the use of raw strings could have unintended consequences, whereas a double backslash is much less likely to have that effect, though it's a little more verbose.
This is a suggestion, it doesn't have to cover all the edge cases. We can add more detail to the docs.
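Two of the edge cases alluded to here can be illustrated briefly. The sketch below is not exhaustive and the variable names are illustrative: it shows that a raw prefix also disables escapes that are recognised, and that a raw string literal cannot end in a single backslash, whereas the doubled-backslash form can.

```python
# 1. In a raw string, recognised escapes such as \n are no longer interpreted.
print(len('\n'), len(r'\n'))  # 1 2

# 2. A raw string literal cannot end in a single backslash.
bad_source = r"path = r'C:\Windows\'"
try:
    compile(bad_source, "<example>", "exec")
    raw_ok = True
except SyntaxError:
    raw_ok = False
print(raw_ok)  # False

# The doubled-backslash spelling handles a trailing backslash without trouble.
good_source = r"path = 'C:\\Windows\\'"
namespace = {}
exec(compile(good_source, "<example>", "exec"), namespace)
print(namespace["path"].endswith("\\"))  # True
```

So blindly adding an `r` prefix can change the meaning of a string that mixes valid and invalid escapes, which is the kind of unintended consequence the quoted remark refers to.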
We can remove "Such sequences":
@hugovk The reason I included 'Such sequences' is that this warning could be raised for `\e` and then read as `"\e" is an invalid escape sequence and will not work in the future`, which may not be true, as I believe the intention is in fact to add `\e` as a new escape. Thus, it is not that `\e` won't work in the future; it is that, specifically, invalid escape sequences (which `\e` currently belongs to but may not always) will not. As I mentioned in the thread, "there is a possibility where we jump from `\e` raising a warning to `\e` working perfectly."
It's a bit pedantic, but the current construction is slightly less likely to ingrain into developers the idea that `\e` or `\W` or whatever else is always invalid.
With all that said, if you still feel that it's better to run with `is an invalid escape sequence and will not work in the future`, I'm happy to go with that :)
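Whether a given sequence is currently recognised depends on the interpreter, which is the point being made about `\e`. A rough way to ask the running Python is sketched below. `is_recognised_escape` is a hypothetical helper written for this illustration, and it only handles simple single-character escapes (sequences that take arguments, such as `\x` or `\N`, would raise a `SyntaxError` here instead).

```python
import warnings

def is_recognised_escape(char):
    """Return True if a backslash followed by `char` compiles without a warning."""
    source = "'\\" + char + "'"  # the source text: '\<char>'
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        compile(source, "<example>", "eval")
    return not caught

for char in "nWe":
    print(char, is_recognised_escape(char))
```

The result for `e` is deliberately not stated: it reflects whatever the interpreter running the snippet accepts, and could differ in a future release.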
Let's also mention raw strings, as in many cases raw strings are a cleaner and easier solution.
Hmm... We could do `"\W" is an invalid escape sequence. Such sequences will not work in the future. Did you mean "\\W"? A raw string is also an option.`?
That way the user gets an immediate suggested fix that is guaranteed to work (in line with the broader reasoning for why `Did you mean "=="?` was added, despite the fact that if you run `1 is int` you probably want to be using `isinstance()`), and we also let them know raw strings are an option. Anyone familiar with raw strings will then immediately know which they really want to be using, and anyone not familiar with raw strings will realise they're worth learning more about.
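The `1 is int` precedent can be observed the same way as the escape-sequence warning. This sketch assumes Python 3.8 or later, where comparing a literal with `is` emits a `SyntaxWarning`. The wording before the suggestion differs between versions, so it is printed rather than quoted.

```python
import warnings

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    compile("1 is int", "<example>", "eval")

# Show the message the running interpreter produces for the comparison.
for w in caught:
    print(w.category.__name__, "-", w.message)

# What the author of such code most likely wanted instead:
print(isinstance(1, int))  # True
```

The suggested `==` is a safe, immediate fix even though `isinstance()` is probably the better answer, which mirrors suggesting `"\\W"` while also mentioning raw strings.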
This is looking more and more verbose. That level of detail is better suited to the regular docs than to the error message, which should (IMO) just explain what went wrong.
without the need for heading to Stack Overflow or Python documentation
Why are the Python docs the wrong place for such things? I would expect current deprecations to be listed in the What's New.
Bad news: to read about this deprecation you have to navigate to past release notes. I would expect to see it somewhere in the latest Pending Removal* sections, but it isn't there. Perhaps it's a documentation issue.
@skirpichev Please see the discussions here and here.
I understand that it might seem obvious to you, but to me, on its own, the message `SyntaxWarning: invalid escape sequence '\W'` does not really communicate that the code will be broken in the future and that you can fix the breakage by simply adding an extra backslash.
A couple of extra characters added to a warning message that should ideally not affect anyone familiar with the problem anyway (as they would hopefully understand that they now need to use raw strings and/or escaped backslashes) will, in my view at least, provide new and/or inexperienced users far more value than the harm of making it a little more verbose.
Not everyone reads the What's New, and new users shouldn't be expected to do so in order to understand this warning. It really should speak for itself.
How about this wording:
"\W" is not a valid escape sequence and it may change meaning or cause an error
in the future. Use "\\W" to mean a backslash followed by W, or consider using a
raw string literal instead.
Changes:
- "invalid" -> "not valid" is IMO a bit softer
- less specific about future deprecation, so that it can be generally true for all escape sequences
- Imperative advice rather than a "did you mean" (I'm not a fan of the latter in general; but right now it cues having misspelled an identifier rather than a syntax issue, so it shouldn't become overloaded)
- tries to confirm what the user actually wants the string to mean rather than offering the suggestion purely from a guess
- Mentions the possibility of raw strings while not pretending that it's a universal, trivial fix
Yes, it's longer, but I think this kind of error text is an important "Explicit is better than implicit" case. (And for experienced users who did just make a typo, the important part is still front-loaded.)
@zahlman I think the current wording already adequately captures the two key pieces of information it needs to capture:
- The current syntax will break in the future.
- You can fix it by double escaping your backslash or using a raw string.
Those are the two key pieces of information that need to be conveyed.
Your proposed wording is a bit too long and may overload the user with information, so that it takes them longer to get the core message.
'invalid escape sequences' is the term currently used by existing Python documentation so it's probably best to avoid changing it. It is also used by other programming languages and so has its own meaning.
I think a lot of the information you've proposed adding is information that users can easily find online on Stack Overflow or Python documentation. The purpose of the revised wording is really to give them a quick fix and a little heads up that their code will break.
Also, the wording 'Did you mean?' is already used in the `SyntaxWarning` for `1 is int` and is used in other warnings.
IMO the current wording is sufficiently concise and clear.