intel/hyperscan

Numbered repeat doesn't work if the lower number is omitted

dchenz opened this issue · 1 comment

dchenz commented

Steps to reproduce:

import re

import hyperscan

# Python's re treats {,3} as {0,3}, so this matches 'baaa' as expected.
re.search(r"ba{,3}", "baaa", flags=re.DOTALL | re.MULTILINE)

# Compile the same pattern with Hyperscan: scanning the same string yields no matches.
db = hyperscan.Database(mode=hyperscan.HS_MODE_STREAM)
db.compile(
    expressions=[b"ba{,3}"],
    ids=[1234],
    flags=[hyperscan.HS_FLAG_MULTILINE | hyperscan.HS_FLAG_DOTALL],
    elements=1,
)
  • In the above example, Hyperscan does return a match if the pattern is written as ba{0,3} (with the zero included).
  • If the scanned string is r"aaaba{,3}aaa", Hyperscan matches the literal text ba{,3}. It seems to treat {,3} as literal characters rather than as regex repeat syntax.
  • Other repeats such as {1,} work correctly.
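
Since ba{0,3} works, one possible workaround is to add the explicit zero before handing patterns to Hyperscan. The sketch below is only an illustration: add_explicit_lower_bound is a hypothetical helper, not part of Hyperscan or python-hyperscan. It is deliberately naive: it skips a brace preceded by a backslash, but it does not understand character classes, \Q...\E quoting, or an escaped backslash directly before the brace.

```python
import re

# Hypothetical helper, not part of hyperscan or python-hyperscan.
# Matches "{,n}" unless the opening brace is preceded by a backslash.
_OPEN_LOWER_BOUND = re.compile(rb"(?<!\\)\{,(\d+)\}")


def add_explicit_lower_bound(pattern: bytes) -> bytes:
    """Rewrite every {,n} repeat in `pattern` as {0,n}.

    Naive by design: character classes and \\Q...\\E sections are not
    recognised, so review the output before using it on complex patterns.
    """
    return _OPEN_LOWER_BOUND.sub(rb"{0,\1}", pattern)


rewritten = add_explicit_lower_bound(b"ba{,3}")
print(rewritten)
```

The rewritten bytes can then be passed in the expressions list of db.compile() in place of the original pattern.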

Hyperscan version: 5.6.1
Python hyperscan version: 0.4.0
Python version: 3.9.17

dchenz commented

Found in the docs that bounded repeat quantifiers such as {n}, {m,n}, {n,} are supported with limitations. The {,n} form is not among those listed, so I think this is expected, unsupported behavior.