ronsavage/Regexp-Assemble

Regexps returned from ->source() aren't always eq to the ones that were ->add()ed


I'm using the return values from ->source() to lookup in a hash which regexp was matched, however some of the returned regexps contain extra backslash characters that weren't present in the input regexps.

For example here I'm adding the regexp '^/hi\z' but the regexp returned from ->source() is ^\/hi\z (a backslash character was added before the /).

perl -MRegexp::Assemble -E 'my $ra = Regexp::Assemble->new->track(1); $ra->add(q{^/hi\z}); my $re = $ra->re; ("/hi" =~ $re) || die; say $ra->source($^R)'
^\/hi\z

For now I'm working around it by substituting / with \/ in my patterns before I ->add() them, but I think the interface would be nicer if the returned regexps were always eq to the input regexps.
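A minimal sketch of that workaround, in case it helps anyone else. The helper name escape_slashes and the hash %original_for are made up for illustration and are not part of Regexp::Assemble. It assumes that the only difference between the ->add()ed pattern and the ->source() result is the backslash before each bare /, which is all I have observed so far.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Regexp::Assemble): put a backslash in
# front of every "/" that is not already escaped. Existing escape pairs
# such as "\z" or "\/" are matched first and copied through unchanged.
sub escape_slashes {
    my ($pattern) = @_;
    (my $escaped = $pattern) =~ s{(\\.)|/}{ defined $1 ? $1 : '\\/' }gse;
    return $escaped;
}

# Map the escaped form (assumed to be what ->source() hands back)
# to the pattern as originally written.
my @patterns = (q{^/hi\z}, q{^/bye/now\z});
my %original_for = map { escape_slashes($_) => $_ } @patterns;

print "$_ => $original_for{$_}\n" for sort keys %original_for;
```

The keys of %original_for are what I pass to ->add(), and after a match I look up $ra->source($^R) in the same hash to get back the pattern I originally wrote. Running a pattern through the helper twice leaves it unchanged, so already-escaped patterns are safe.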

Another idea: users could keep track of the order they ->add regexps, and $^R could correspond to this order (possibly by a mapping function?). I thought this might be the case at first but it didn't seem to match up in my testing...

Maybe there is a simple way to solve this that I don't know about?

Thanks!

Doug

PS.

$ perl -E 'say $^V'
v5.18.2
$ cpanm Regexp::Assemble
Regexp::Assemble is up to date. (0.36)

Thanx for the report. As you probably know I didn't write this module. I just offered to take over maintenance in order to clean up the distro and release it. I've looked at the code before, in response to other requests, and do not believe I can make any effective patches to it. Sorry.

I am, however, just about to start a new module based (probably) on Marpa::R2, which will parse regexps in a vastly cleaner way. Any such new module would necessarily not have the problem you refer to. Just for the record, my plan is to store the result of the parse in a tree, from which various outputs can be constructed.

In the short term, I recommend you persist with your work-around.

Hi Ron,

I'm really looking forward to checking out your module. I might try to use it in my module https://metacpan.org/pod/Bio::Regexp . I wrote a very simple Regexp::Grammars-based regexp parser for a small subset of regexps so that I could reverse them.

Having looked at the internals of Regexp::Assemble, I very much sympathise with not wanting to (or being able to) work on it.

Yes, people do some... "innovative" things to parse regexps. Here's a funny/horrifying example:

https://metacpan.org/source/CJFIELDS/BioPerl-1.6.924/Bio/Tools/SeqPattern.pm#L489

The code wants to reverse a regexp, so it textually reverses the regexp string and then tries to "fix it up" afterwards with things like tr/[]/][/

:)

It's sad that people have to resort to such code (which I do not for a moment pretend to understand), but the fact that Perl makes it, if not easy, then doable, speaks volumes about Perl. But what it says about the problem the author has to fight, I dare not think :-).