mudge/re2

RE2: invalid startpos, endpos pair

sk- opened this issue · 4 comments

sk- commented

When running a script that parses a lot of data, I'm getting a few errors like the following:

re2/re2.cc:562: RE2: invalid startpos, endpos pair. [startpos: 0, endpos: 36, text size: 34]
re2/re2.cc:562: RE2: invalid startpos, endpos pair. [startpos: 0, endpos: 51, text size: 49]
re2/re2.cc:562: RE2: invalid startpos, endpos pair. [startpos: 0, endpos: 51, text size: 50]

Unfortunately, I don't know which string or pattern is the offending one.

mudge commented

Is it possible to provide a minimal failing test case?

This seems to be coming from the underlying re2 library itself but it'd be useful to have the regular expression and some text to test with.

sk- commented

Unfortunately, I got these errors when running against a huge dataset, so it's not easy to know which patterns generated them.

The problem seems to be that the string somehow gets re-encoded when it is passed to re2, and its length changes in the process. That would explain why endpos is larger than the reported text size in the errors above.
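One way to narrow this down without knowing the pattern might be to scan the dataset for strings whose byte length would change on conversion to UTF-8, or whose bytes are not valid in their declared encoding. The sketch below is plain Ruby and does not call re2 at all; the helper name `suspicious_encoding?` is hypothetical, and the assumption that a UTF-8 conversion is what changes the length is only a guess at the cause.

```ruby
# Hypothetical helper, not part of the re2 gem.
# Returns true when a string is likely to change size if it is
# re-encoded as UTF-8, or cannot be re-encoded at all.
def suspicious_encoding?(text)
  # Bytes that are invalid in the declared encoding are always suspect.
  return true unless text.valid_encoding?

  # Compare the byte length before and after conversion to UTF-8.
  utf8 = text.encode(Encoding::UTF_8)
  utf8.bytesize != text.bytesize
rescue EncodingError
  # Conversion failed (e.g. binary data with no UTF-8 mapping).
  true
end

# Example inputs (made up for illustration).
samples = [
  "plain ascii",
  "caf\xE9".b.force_encoding(Encoding::ISO_8859_1), # Latin-1 bytes
  "\xFF".b.force_encoding(Encoding::UTF_8)          # invalid UTF-8
]

# Collect the candidates worth testing against the regular expressions.
candidates = samples.select { |s| suspicious_encoding?(s) }
candidates.each { |s| puts "#{s.encoding}: #{s.bytes.inspect}" }
```

Any string this flags would be a reasonable candidate for a minimal failing test case.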

mudge commented

Can you at least share the code you're using (including the regular expression) so I can see if this is coming from match or consume, etc.?

Regarding your test data, is there anything I should know about its encoding, etc.?

mudge commented

Without any further information (e.g. which classes and methods you're using, or the regular expression), I won't be able to investigate this, as I'm not sure where to start.