Can we switch `re` to `regex`?
kthyng opened this issue · 1 comments
I have a limited understanding of the differences between the two regular expression packages, but `re` (as of Python 3.11) no longer allows patterns in which "global flags" like `(?i)` appear anywhere other than the beginning of the pattern, whereas `regex` does. I have been setting up my custom vocabularies such that a flag like that might end up later in a pattern, because patterns can be linked together with `|`.
For example,
import cf_xarray as cfx
import xarray as xr
vocab = {"sea_ice_u": {"name": "(?i)^(?!.*(qc|status))(?=.*sea)(?=.*ice)(?=.*u)|(?i)^(?!.*(qc|status))(?=.*sea)(?=.*ice)(?=.*x)(?=.*vel)"}}
ds = xr.Dataset()
ds["sea_ice_velocity_x"] = [0,1,2]
with cfx.set_options(custom_criteria=vocab):
    ds.cf["sea_ice_u"]
This currently raises:
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
File "/Users/kthyng/miniconda3/envs/omsa3/lib/python3.11/site-packages/cf_xarray/accessor.py", line 2034, in __getitem__
return _getitem(self, key)
^^^^^^^^^^^^^^^^^^^
File "/Users/kthyng/miniconda3/envs/omsa3/lib/python3.11/site-packages/cf_xarray/accessor.py", line 685, in _getitem
names = _get_all(obj, k)
^^^^^^^^^^^^^^^^
File "/Users/kthyng/miniconda3/envs/omsa3/lib/python3.11/site-packages/cf_xarray/accessor.py", line 385, in _get_all
results = apply_mapper(all_mappers, obj, key, error=False, default=None)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/kthyng/miniconda3/envs/omsa3/lib/python3.11/site-packages/cf_xarray/accessor.py", line 117, in apply_mapper
results.append(_apply_single_mapper(mapper))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/kthyng/miniconda3/envs/omsa3/lib/python3.11/site-packages/cf_xarray/accessor.py", line 101, in _apply_single_mapper
results = mapper(obj, key)
^^^^^^^^^^^^^^^^
File "/Users/kthyng/miniconda3/envs/omsa3/lib/python3.11/site-packages/cf_xarray/accessor.py", line 214, in _get_custom_criteria
if re.match(patterns, obj[var].attrs.get(criterion, "")):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/kthyng/miniconda3/envs/omsa3/lib/python3.11/re/__init__.py", line 166, in match
return _compile(pattern, flags).match(string)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/kthyng/miniconda3/envs/omsa3/lib/python3.11/re/__init__.py", line 294, in _compile
p = _compiler.compile(pattern, flags)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/kthyng/miniconda3/envs/omsa3/lib/python3.11/re/_compiler.py", line 743, in compile
p = _parser.parse(p, flags)
^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/kthyng/miniconda3/envs/omsa3/lib/python3.11/re/_parser.py", line 980, in parse
p = _parse_sub(source, state, flags & SRE_FLAG_VERBOSE, 0)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/kthyng/miniconda3/envs/omsa3/lib/python3.11/re/_parser.py", line 455, in _parse_sub
itemsappend(_parse(source, state, verbose, nested + 1,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/kthyng/miniconda3/envs/omsa3/lib/python3.11/re/_parser.py", line 841, in _parse
raise source.error('global flags not at the start '
re.error: global flags not at the start of the expression at position 48
But if I replace `re` with `regex` (and do some renaming, since the variable holding regular expressions in accessor.py is also called "regex"), I get back:
<xarray.DataArray 'sea_ice_velocity_x' (sea_ice_velocity_x: 3)>
array([0, 1, 2])
Coordinates:
* sea_ice_velocity_x (sea_ice_velocity_x) int64 0 1 2
I suppose there is a reason that `re` doesn't allow this anymore, but I would prefer to be able to do so! What do others think? @dcherian, you might be the other person who has used custom vocabularies?
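For reference, a possible workaround that stays within the standard library is sketched below. It assumes each vocabulary fragment starts with its own `(?i)` and rewrites that into a scoped group `(?i:...)`, which `re` accepts anywhere in a pattern, before joining the fragments with `|`. The helper `scope_flag` is hypothetical and not part of cf_xarray; this is only an illustration of the idea, not a proposal for how the library should handle it.

```python
import re

# Fragments as they might appear in a custom vocabulary,
# each carrying its own leading (?i) global flag.
fragments = [
    "(?i)^(?!.*(qc|status))(?=.*sea)(?=.*ice)(?=.*u)",
    "(?i)^(?!.*(qc|status))(?=.*sea)(?=.*ice)(?=.*x)(?=.*vel)",
]


def scope_flag(fragment: str, flag: str = "(?i)") -> str:
    """Hypothetical helper: turn a leading global flag such as (?i)
    into a scoped group (?i:...), leaving other fragments unchanged."""
    if fragment.startswith(flag):
        return flag[:-1] + ":" + fragment[len(flag):] + ")"
    return fragment


# Scope every fragment first, then link them together with |.
combined = "|".join(scope_flag(f) for f in fragments)

# The combined pattern compiles with the standard-library re module.
print(re.match(combined, "sea_ice_velocity_x") is not None)
```

This only covers fragments whose flag is at the very start; flags buried elsewhere in a fragment would still need `regex` or manual editing.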
I don't know the differences, but since `regex` is backwards compatible, we could optionally use it if available.
So:
try:
    from regex import match
except ImportError:
    from re import match
We can add `regex` to the optional-deps environment for testing.
PR welcome!
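A slightly fuller sketch of the optional import, in case it helps whoever picks this up. It imports the chosen module under an alias, which might also sidestep the clash with the existing variable named "regex" in accessor.py. The names `re_backend`, `BACKEND`, and `matches` are made up for illustration and do not exist in cf_xarray.

```python
# Prefer the third-party regex package when it is installed,
# otherwise fall back to the standard-library re module.
try:
    import regex as re_backend  # optional dependency

    BACKEND = "regex"
except ImportError:
    import re as re_backend

    BACKEND = "re"


def matches(pattern: str, text: str) -> bool:
    """Hypothetical wrapper: True if pattern matches at the start of text."""
    return re_backend.match(pattern, text) is not None


# Patterns with the flag at the start behave the same under either backend.
print(BACKEND, matches("(?i)^sea", "SEA_ice_velocity_x"))
```

Patterns with a global flag in the middle would presumably still fail when only `re` is available, so the tests for that case would need to be skipped unless `regex` is installed.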