[BUG]: Sanitizing regex does not exclude string literals
taldcroft opened this issue · 3 comments
taldcroft commented
4b2d89c introduces a regression when an expression includes a string literal with any of the new forbidden characters. This is breaking our production code when we upgrade numexpr to 2.8.7.
Example:
```python
>>> import numexpr as ne
>>> ne.__version__
'2.8.7'
>>> import numpy as np
>>> x = np.array(['a', 'b'], dtype=bytes)
>>> ne.evaluate("x == 'b'")
array([False, True])
>>> ne.evaluate("x == 'b:'")
Traceback (most recent call last):
  Cell In[6], line 1
    ne.evaluate("x == 'b:'")
  File ~/miniconda3/envs/numexpr/lib/python3.10/site-packages/numexpr/necompiler.py:975 in evaluate
    raise e
  File ~/miniconda3/envs/numexpr/lib/python3.10/site-packages/numexpr/necompiler.py:872 in validate
    _names_cache[expr_key] = getExprNames(ex, context, sanitize=sanitize)
  File ~/miniconda3/envs/numexpr/lib/python3.10/site-packages/numexpr/necompiler.py:721 in getExprNames
    ex = stringToExpression(text, {}, context, sanitize)
  File ~/miniconda3/envs/numexpr/lib/python3.10/site-packages/numexpr/necompiler.py:281 in stringToExpression
    raise ValueError(f'Expression {s} has forbidden control characters.')
ValueError: Expression x == 'b:' has forbidden control characters.
```
27rabbitlt commented
This could be fixed by first replacing the content within quotes before matching against the blacklist. I will fix this and add some tests.
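The approach described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not numexpr's actual code: the blacklist pattern and the helper names `naive_is_forbidden` and `is_forbidden` are hypothetical, and the literal-matching regex assumes simple single- or double-quoted strings without escape sequences.

```python
import re

# Hypothetical blacklist for illustration only; numexpr's real forbidden set may differ.
_BLACKLIST = re.compile(r"[;:\[\]`]|__")

# Simple quoted string literals; escaped quotes inside a literal are NOT handled (assumption).
_STRING_LITERAL = re.compile(r"'[^']*'|\"[^\"]*\"")


def naive_is_forbidden(expr: str) -> bool:
    """Scan the raw expression, including text inside quotes (the behavior reported above)."""
    return _BLACKLIST.search(expr) is not None


def is_forbidden(expr: str) -> bool:
    """Blank out string literals first, then scan only what remains."""
    # Each literal becomes an empty '' so the rest of the expression keeps its shape.
    stripped = _STRING_LITERAL.sub("''", expr)
    return _BLACKLIST.search(stripped) is not None


if __name__ == "__main__":
    print(naive_is_forbidden("x == 'b:'"))  # True: the ':' inside the literal is flagged
    print(is_forbidden("x == 'b:'"))        # False: the literal is ignored
    print(is_forbidden("x[0] == 'b'"))      # True: '[' appears outside any literal
```

The stripped copy is used only for the check; the original expression, literal included, would still be what gets compiled. A sturdier variant could use Python's `tokenize` module to identify string tokens instead of a regex.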
taldcroft commented on Jan 24, 2024
Thanks, looking forward to the next release! Looks like this can be closed now?
27rabbitlt commented
Yes ^-^