Convert a warning about flags not at the start of the regular expression into error
serhiy-storchaka opened this issue · 8 comments
BPO | 47066 |
---|---|
Nosy | @ezio-melotti, @serhiy-storchaka |
PRs |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
GitHub fields:
assignee = None
closed_at = <Date 2022-03-19.14:12:52.270>
created_at = <Date 2022-03-19.12:09:48.176>
labels = ['expert-regex', 'type-bug', 'library', '3.11']
title = 'Convert a warning about flags not at the start of the regular expression into error'
updated_at = <Date 2022-03-19.14:12:52.269>
user = 'https://github.com/serhiy-storchaka'
bugs.python.org fields:
activity = <Date 2022-03-19.14:12:52.269>
actor = 'serhiy.storchaka'
assignee = 'none'
closed = True
closed_date = <Date 2022-03-19.14:12:52.270>
closer = 'serhiy.storchaka'
components = ['Library (Lib)', 'Regular Expressions']
creation = <Date 2022-03-19.12:09:48.176>
creator = 'serhiy.storchaka'
dependencies = []
files = []
hgrepos = []
issue_num = 47066
keywords = ['patch']
message_count = 2.0
messages = ['415544', '415552']
nosy_count = 3.0
nosy_names = ['ezio.melotti', 'mrabarnett', 'serhiy.storchaka']
pr_nums = ['31994']
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue47066'
versions = ['Python 3.11']
This warning was introduced in 3.6. The reason is that in most other regular expression implementations, global inline flags in the middle of the expression have different semantics: they affect only the part of the expression after the flag. But in Python they affect the whole expression. This caused confusion and was a source of bugs.
After 5 releases it is time to convert this warning into an error. In the future we can allow global inline flags in the middle of the expression with different semantics. It is safer if one or more intermediate versions raise an error.
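To see how a particular interpreter treats such a pattern, a minimal sketch like the one below may help. `compile_status` is a hypothetical helper, not part of `re`; it assumes the interpreter either emits a warning (3.6 to 3.10) or raises `re.error` (3.11 and later) for a flag that is not at the start.

```python
import re
import warnings


def compile_status(pattern):
    """Return 'ok', 'warning', or 'error' for compiling `pattern`.

    Hypothetical helper for illustration only.
    """
    re.purge()  # clear the compile cache so the pattern is parsed again
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure no warning is suppressed
        try:
            re.compile(pattern)
        except re.error:
            return "error"
    return "warning" if caught else "ok"


# A global flag at the very start is accepted on every version.
print(compile_status(r"(?i)abc"))
# A global flag in the middle warns on 3.6-3.10 and fails on 3.11+.
print(compile_status(r"abc(?i)def"))
```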
For the record here's some brokenness this caused:
https://bz.mercurial-scm.org/show_bug.cgi?id=6759
https://bugzilla.mozilla.org/show_bug.cgi?id=1799982
@jrmuizel Could you report this in a new issue mentioning this issue, please? I suspect almost nobody reads already closed entries.
Is it worth filing a new issue even though both of those uses have already been fixed upstream?
My bad. I thought you had linked to additional issues similar to this one, found by third-party projects that use Python.
This broke our code in production too. Say what you will about python 2.7, at least it no longer breaks prod code with low-value changes like this. The exact regular expression that we had previously used was:
r'^(?is)(ships from|dispatched from)'
Conversion of this to the "correct" form was trivial, but now I need to spend time combing our code for other examples of this behavior. Ever heard "if it ain't broke, don't fix it?" I guess that doesn't apply in modern software development...
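For readers hitting the same error, here is a sketch of two likely forms of the fix for the pattern quoted above: moving the flags to the very start, or passing them as the `flags` argument. The sample string is made up for illustration.

```python
import re

# Option 1: keep the inline flags, but put them at the very start.
fixed_inline = re.compile(r'(?is)^(ships from|dispatched from)')

# Option 2: drop the inline flags and pass them as an argument instead.
fixed_arg = re.compile(r'^(ships from|dispatched from)',
                       re.IGNORECASE | re.DOTALL)

sample = "Ships From Example Warehouse"  # made-up input
print(bool(fixed_inline.match(sample)))
print(bool(fixed_arg.match(sample)))
```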
#TIL if the intent was to only apply the flag to a subset of the regex, e.g.:
r"name-((?i)CASE-INSENSITIVE)"
You can use the following:
r"name-(?i:CASE-INSENSITIVE)"
See the (?aiLmsux-imsx:...) section under https://docs.python.org/3/library/re.html#regular-expression-syntax
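A quick sketch of how the scoped form behaves, using the pattern from this comment: only the text inside the `(?i:...)` group is matched case-insensitively, while the `name-` prefix stays case-sensitive.

```python
import re

scoped = re.compile(r"name-(?i:CASE-INSENSITIVE)")

# The scoped part matches regardless of case.
print(bool(scoped.fullmatch("name-case-insensitive")))
# The prefix outside the group is still case-sensitive.
print(bool(scoped.fullmatch("NAME-CASE-INSENSITIVE")))
```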
It would be great if this was mentioned on https://docs.python.org/3/whatsnew/3.11.html#porting-to-python-3-11 as I have "migrated" dozens of files running into this issue where the original intent was to only apply the flag to a subset (especially when you concatenate regexes together), but I only learned about this today. (I hope this comment can at least help some people when they click the bpo-47066 link there and get redirected here.)
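For the concatenation case, a sketch under the assumption that each fragment should carry its own flags: wrapping a fragment in a scoped group keeps the flag local, so the pieces can be joined in any order. The fragment names and sample strings are made up for illustration.

```python
import re

# Hypothetical fragments; only `word` should ignore case.
prefix = r"id-"
word = r"(?i:beta)"  # flag applies inside this group only
suffix = r"-\d+"

combined = re.compile(prefix + word + suffix)

print(bool(combined.fullmatch("id-BETA-42")))  # case differs inside the group
print(bool(combined.fullmatch("ID-beta-42")))  # case differs in the prefix
```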
Also, there is no mention of the word "global" on https://docs.python.org/3/library/re.html; ideally that page would distinguish "global" vs. "local" inline flags.