hyperpape/needle

Large regexes are poorly handled


Figure out how to handle very large regexes: the strategy we're currently using generates very large class files.

Commit a04107b substantially increased the size of the classes generated for regexes. That made the problem worse and required commenting out a previously used test case.

The problem isn't gone, but recent work to enable the use of byteclasses is probably helping. The regex Holmes.{0,25}Watson|Watson.{0,25}Holmes now generates a 340 KB class instead of a 2 MB one.
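One plausible reason byteclasses help: a DFA only needs to distinguish bytes that some transition actually tells apart, so bytes can be grouped into equivalence classes and each state's transition table needs one entry per class instead of one per byte value (256). The sketch below is a generic illustration of that technique, not needle's implementation. The names `ByteClassesSketch`, `Range`, `computeClasses`, and `countClasses` are hypothetical, and it assumes byte-level semantics where `.` matches any byte except `\n`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of byte equivalence classes ("byteclasses"); not needle's actual code.
public final class ByteClassesSketch {

    /** An inclusive byte range [lo, hi], both in 0..255, tested by some transition. */
    public record Range(int lo, int hi) {}

    /** Maps each byte value 0..255 to a class id; bytes sharing an id are never distinguished. */
    public static int[] computeClasses(List<Range> ranges) {
        // boundary[b] == true means an equivalence class ends at byte b.
        boolean[] boundary = new boolean[256];
        for (Range r : ranges) {
            if (r.lo() > 0) {
                boundary[r.lo() - 1] = true; // a class ends just before the range starts
            }
            boundary[r.hi()] = true;         // and another ends where the range ends
        }
        int[] classes = new int[256];
        int id = 0;
        for (int b = 0; b < 256; b++) {
            classes[b] = id;
            if (boundary[b]) {
                id++; // the next byte starts a new class
            }
        }
        return classes;
    }

    /** Class ids are assigned in increasing order, so the last byte holds the highest id. */
    public static int countClasses(int[] classes) {
        return classes[255] + 1;
    }

    public static void main(String[] args) {
        List<Range> ranges = new ArrayList<>();
        // Assumption: '.' matches any byte except '\n'.
        ranges.add(new Range(0, '\n' - 1));
        ranges.add(new Range('\n' + 1, 255));
        // Literal bytes from Holmes.{0,25}Watson|Watson.{0,25}Holmes.
        for (char c : "HolmesWatson".toCharArray()) {
            ranges.add(new Range(c, c));
        }
        int[] classes = computeClasses(ranges);
        // Prints: 19 byte classes instead of 256 byte values
        System.out.println(countClasses(classes) + " byte classes instead of 256 byte values");
    }
}
```

Under these assumptions, each state's transition table for this regex shrinks from 256 columns to 19, which is the kind of reduction that would shrink the generated bytecode. How needle actually encodes transitions is not described in this issue.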

Subsequent updates to the encoding reduce the class generated for Holmes.{0,25}Watson|Watson.{0,25}Holmes to 245,181 bytes (roughly 245 KB).