mudge/re2

Benchmarking

igrigorik opened this issue · 2 comments

Paul, this is not so much a bug as a question. I tried wrapping re2 in a simple benchmark and am not seeing much difference between the implementations:

http://gist.github.com/502400

ruby 1.8.7 (2009-06-12 patchlevel 174) [i686-darwin10]

Q: This could be a contrived example on my part. Are there any specific edge cases where RE2 really shines in your experience?

Hi Ilya,

At the moment, you are correct: re2 does not seriously outperform Ruby's native Regexp library, but I believe this is due to my implementation (which is currently rather fast and loose with creating strings) rather than a problem inherent in the RE2 library itself.

For further reading, see Russ Cox's Regular Expression Matching in the Wild, particularly the Performance section which discusses RE2's strengths: "RE2 is competitive with PCRE on small searches and faster on large ones."
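To make the "faster on large ones" point concrete, here is a minimal sketch of a benchmark aimed at the kind of input where a backtracking engine can struggle: the pattern `a?` repeated n times followed by `a` repeated n times, matched against n copies of `a`. The helper names (`pathological_case`, `time_matches`, `compare`) are hypothetical and not part of the gem, and the script assumes a Ruby recent enough to have `Regexp#match?`. It is not the benchmark from the gist above. Recent Ruby versions have their own mitigations for this kind of pattern, so the gap may be small or absent depending on the interpreter.

```ruby
require 'benchmark'

# The re2 gem is optional here: without it, only the native engine is timed.
begin
  require 're2'
  HAVE_RE2 = true
rescue LoadError
  HAVE_RE2 = false
end

# Hypothetical helper: builds the pattern a?{n}a{n} (written out literally)
# and an input of n "a" characters, which forces a backtracking matcher
# to try many ways of assigning the optional a's.
def pathological_case(n)
  pattern = ('a?' * n) + ('a' * n)
  text = 'a' * n
  [pattern, text]
end

# Hypothetical helper: wall-clock seconds for running the block `iterations` times.
def time_matches(iterations)
  Benchmark.realtime { iterations.times { yield } }
end

# Hypothetical helper: times the same pattern and input on each available engine.
def compare(n, iterations = 10)
  pattern, text = pathological_case(n)

  native = Regexp.new(pattern)
  results = { native: time_matches(iterations) { native.match?(text) } }

  if HAVE_RE2
    re2 = RE2::Regexp.new(pattern)
    results[:re2] = time_matches(iterations) { re2.match?(text) }
  end

  results
end

# Keep n modest: on a purely backtracking engine the cost can grow quickly.
[5, 10, 15].each do |n|
  compare(n).each do |name, seconds|
    puts format('n=%-3d %-6s %.6fs', n, name, seconds)
  end
end
```

Both regexps are compiled outside the timed block, so the numbers reflect matching only. `match?` is used on both sides so that neither engine pays for building a match object.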

Performance is something I want to address, but at the moment I am just finalising the interface (which currently aims to replicate Ruby's own Regexp and MatchData classes).

I will leave this ticket open for the time being, until we get some decent benchmarks in (perhaps a port of RE2's own http://code.google.com/p/re2/source/browse/re2/testing/regexp_benchmark.cc).

Awesome, thanks Paul. I will monitor this ticket for any updates. I have a few applications that would benefit greatly from faster regex evaluation.