darvid/python-hyperscan

Can I get scan results in a non-callback manner?

SergeyPiskunov opened this issue · 3 comments

Hi! I'm curious: is there any way to get scan results either synchronously, without passing a callback, or as an "awaitable" object on an asyncio event loop?

Bear in mind that when db.scan returns in Python, all the callbacks for any potential matches are guaranteed to have been invoked. So you can make calls to persist the results somewhere (global state, database, future, whatever) and just pull them immediately after scanning.
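
As a rough sketch of that idea (not part of the library), a small wrapper can hide the callback and hand back a plain list, plus an awaitable variant that runs the blocking scan in a worker thread. The names `collect_matches` and `collect_matches_async` are hypothetical, and the only assumption about `scan` is that it accepts `match_event_handler` and `context` keyword arguments the way `db.scan` does:

```python
import asyncio
import typing

# One match as reported to the callback: (id, start, end, flags).
Match = typing.Tuple[int, int, int, int]


def collect_matches(
    scan: typing.Callable[..., typing.Any], data: bytes
) -> typing.List[Match]:
    """Run a callback-style scan and return its matches as a list."""
    matches: typing.List[Match] = []

    def on_match(id, start, end, flags, context):
        # The list passed as context receives every reported match.
        context.append((id, start, end, flags))
        # Falling through returns None (falsy), so scanning is not halted.

    # Every callback has already run by the time scan() returns.
    scan(data, match_event_handler=on_match, context=matches)
    return matches


async def collect_matches_async(
    scan: typing.Callable[..., typing.Any], data: bytes
) -> typing.List[Match]:
    """Awaitable variant: run the blocking scan in the default executor."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, collect_matches, scan, data)
```

With a compiled database this would be called as `collect_matches(db.scan, b'foobar')`, or `await collect_matches_async(db.scan, b'foobar')` inside a coroutine. The async version only moves the scan off the event loop thread; it does not make the scan itself any faster.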

With that said, I think using context to store results is useful in this scenario, assuming you don't have a huge number of patterns or expected matches. For example:

import collections
import typing

import hyperscan

HsPattern = collections.namedtuple('HsPattern', ['pattern', 'id', 'flags'])
HsResult = collections.namedtuple('HsResult', ['id', 'start', 'end', 'flags'])

PATTERNS = (
    HsPattern(br'fo+', 0, 0),
    HsPattern(br'^foobar$', 1, hyperscan.HS_FLAG_CASELESS),
    HsPattern(
        br'BAR',
        2,
        hyperscan.HS_FLAG_CASELESS | hyperscan.HS_FLAG_SOM_LEFTMOST,
    ),
)


def on_match(
    id: int,
    start: int,
    end: int,
    flags: int,
    context: typing.Optional[typing.Any] = None,
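) -> typing.Optional[bool]:
    # Record each match in the caller-supplied context.
    context['results'].append(HsResult(id, start, end, flags))
    # A falsy return value lets the scan continue.
    return False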


def create_database(patterns: typing.Tuple[HsPattern, ...]) -> hyperscan.Database:
    db = hyperscan.Database()

    expressions, ids, flags = zip(*patterns)
    db.compile(
        expressions=expressions, ids=ids, elements=len(patterns), flags=flags
    )
    return db


def main() -> None:
    db = create_database(PATTERNS)
    context = {'results': []}
    db.scan(b'foobar', match_event_handler=on_match, context=context)
    for result in context['results']:
        print(result)


if __name__ == '__main__':
    main()

$ python context_results.py
HsResult(id=0, start=0, end=2, flags=0)
HsResult(id=0, start=0, end=3, flags=0)
HsResult(id=2, start=3, end=6, flags=0)
HsResult(id=1, start=0, end=6, flags=0)

Yeah.. Seems that passing "context" will be sufficient for me. Thanks a lot!

@darvid is there any way to return the matched substring?
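
One possible approach, as a sketch rather than a confirmed library feature: the callback already receives `start` and `end` offsets, so the matched bytes can be sliced out of the buffer that was scanned. The helper name `matched_bytes` is hypothetical. Note that Hyperscan only reports a meaningful start offset for patterns compiled with `HS_FLAG_SOM_LEFTMOST`; otherwise `start` is 0, which in the sample output above happens to coincide with the true start for patterns 0 and 1.

```python
def matched_bytes(data: bytes, start: int, end: int) -> bytes:
    """Return the slice of the scanned buffer covered by one match."""
    return data[start:end]


data = b'foobar'
# (id, start, end) triples copied from the sample output above.
results = [(0, 0, 2), (0, 0, 3), (2, 3, 6), (1, 0, 6)]
for id_, start, end in results:
    print(id_, matched_bytes(data, start, end))
```

In the callback-based example this would amount to keeping a reference to the scanned buffer (for instance in `context`) and slicing it with each result's `start` and `end`.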