ronsavage/Regexp-Assemble

Incorrect processing when lookaheads are enabled


I've identified two failing cases so far; hopefully this is enough information for you to reproduce the problem.

#!/usr/bin/env perl

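use strict;
use warnings;

use Regexp::Assemble;

# Assemble the given parts (each followed by \d*) with lookaheads enabled,
# print the resulting pattern, then list every part it fails to match.
sub err {
    my @parts = @_;
    my $re = Regexp::Assemble->new(lookahead => 1);
    $re->add($_.'\d*') for @parts;
    my $r = $re->re();
    print $re->as_string() . "\n";
    print "  Didn't match: $_\n\n" for grep {$_ !~ $r} @parts;
}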



# The 0 never makes it into the lookahead probably because it's falsey?
err qw/
    a0b
    a1
/;



# Everything after the single b should be optional, but it isn't.
#
# If you remove the 'ab' or the trailing '\d*' in the err func, this works as expected.
err qw/
    ab
    b
    b1
    bc
    bc1
/;

Output:

a(?=1)(?:0b|1)\d*
  Didn't match: a0b

(?=[ab])(?:b(?=[1c])(?:c)?(?:1)?|ab)\d*
  Didn't match: b
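
For anyone who wants to confirm this without installing the module, here is a small sketch that tests the two emitted patterns directly. The `$fixed_*` patterns are hand-written guesses at what a correct lookahead might look like (my assumption, not output from Regexp::Assemble), and `misses` is just a helper name made up for this example. The last part illustrates the "falsey 0" guess from the script comment in general Perl terms; it is not taken from the module's source.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Patterns copied verbatim from the output above.
my $emitted_1 = qr/a(?=1)(?:0b|1)\d*/;
my $emitted_2 = qr/(?=[ab])(?:b(?=[1c])(?:c)?(?:1)?|ab)\d*/;

# Hand-written corrections (assumed intent, NOT produced by the module):
#   1. the lookahead after 'a' should admit both 0 and 1
#   2. the inner lookahead is dropped, since nothing after 'b' is required
my $fixed_1 = qr/a(?=[01])(?:0b|1)\d*/;
my $fixed_2 = qr/(?=[ab])(?:b(?:c)?(?:1)?|ab)\d*/;

# Hypothetical helper: return the inputs that a pattern fails to match.
sub misses {
    my ($re, @inputs) = @_;
    return grep { $_ !~ $re } @inputs;
}

my @case_1 = qw/a0b a1/;
my @case_2 = qw/ab b b1 bc bc1/;

print "emitted 1 misses: @{[ misses($emitted_1, @case_1) ]}\n";
print "emitted 2 misses: @{[ misses($emitted_2, @case_2) ]}\n";
print "fixed 1 misses:   @{[ misses($fixed_1,   @case_1) ]}\n";
print "fixed 2 misses:   @{[ misses($fixed_2,   @case_2) ]}\n";

# Illustration of the "falsey 0" guess: in Perl the string '0' is false,
# so a plain truth test silently discards it while a length test keeps it.
my @first_chars = ('0', '1');
my @by_truth    = grep { $_ } @first_chars;      # drops '0'
my @by_length   = grep { length } @first_chars;  # keeps '0' and '1'

print "truth test keeps:  @by_truth\n";
print "length test keeps: @by_length\n";
```

If the module collects lookahead characters with a truth test somewhere, that would explain why only `1` shows up in `(?=1)`, but that is speculation until someone traces it in the source.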

I'm using Regexp::Assemble 0.36 & here's my perl build info:

perl -V
Summary of my perl5 (revision 5 version 22 subversion 0) configuration:

  Platform:
    osname=darwin, osvers=14.5.0, archname=darwin-thread-multi-ld-2level
    uname='darwin 34363bc7dc9c 14.5.0 darwin kernel version 14.5.0: wed jul 29 02:26:53 pdt 2015; root:xnu-2782.40.9~1release_x86_64 x86_64 i386 macbookpro11,3 darwin '
    config_args='-de -Dprefix=/Users/{USERID}/perl5/perlbrew/perls/perl-5.22.0 -Dcc=clang -Duse64bitint -Duse64bitall -Duselongdouble -Dusethreads -Dusemultiplicity -Aeval:scriptdir=/Users/{USERID}/perl5/perlbrew/perls/perl-5.22.0/bin'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=define, usemultiplicity=define
    use64bitint=define, use64bitall=define, uselongdouble=define
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='clang', ccflags ='-fno-common -DPERL_DARWIN -no-cpp-precomp -arch x86_64 -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include',
    optimize='-O3',
    cppflags='-no-cpp-precomp -arch x86_64 -fno-common -DPERL_DARWIN -no-cpp-precomp -arch x86_64 -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include'
    ccversion='', gccversion='4.2.1 Compatible Apple LLVM 6.1.0 (clang-602.0.53)', gccosandvers=''
    intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678, doublekind=3
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16, longdblkind=3
    ivtype='long', ivsize=8, nvtype='long double', nvsize=16, Off_t='off_t', lseeksize=8
    alignbytes=16, prototype=define
  Linker and Libraries:
    ld='env MACOSX_DEPLOYMENT_TARGET=10.3 clang -arch x86_64', ldflags =' -arch x86_64 -fstack-protector-strong -L/usr/local/lib'
    libpth=/usr/local/lib /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.1.0/lib /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib /usr/lib
    libs=-lpthread -lgdbm -ldbm -ldl -lm -lutil -lc
    perllibs=-lpthread -ldl -lm -lutil -lc
    libc=, so=dylib, useshrplib=false, libperl=libperl.a
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=bundle, d_dlsymun=undef, ccdlflags=' '
    cccdlflags=' ', lddlflags=' -bundle -undefined dynamic_lookup -L/usr/local/lib -fstack-protector-strong'


Characteristics of this binary (from libperl):
  Compile-time options: HAS_TIMES MULTIPLICITY PERLIO_LAYERS
                        PERL_DONT_CREATE_GVSV
                        PERL_HASH_FUNC_ONE_AT_A_TIME_HARD
                        PERL_IMPLICIT_CONTEXT PERL_MALLOC_WRAP
                        PERL_NEW_COPY_ON_WRITE PERL_PRESERVE_IVUV
                        USE_64_BIT_ALL USE_64_BIT_INT USE_ITHREADS
                        USE_LARGE_FILES USE_LOCALE USE_LOCALE_COLLATE
                        USE_LOCALE_CTYPE USE_LOCALE_NUMERIC USE_LOCALE_TIME
                        USE_LONG_DOUBLE USE_PERLIO USE_PERL_ATOF
                        USE_REENTRANT_API
  Locally applied patches:
    Devel::PatchPerl 1.32
  Built under darwin
  Compiled at Sep  8 2015 14:44:27
  %ENV:
    PERLBREW_BASHRC_VERSION="0.73"
    PERLBREW_HOME="/Users/{USERID}/.perlbrew"
    PERLBREW_MANPATH="/Users/{USERID}/perl5/perlbrew/perls/perl-5.22.0/man"
    PERLBREW_PATH="/Users/{USERID}/perl5/perlbrew/bin:/Users/{USERID}/perl5/perlbrew/perls/perl-5.22.0/bin"
    PERLBREW_PERL="perl-5.22.0"
    PERLBREW_ROOT="/Users/{USERID}/perl5/perlbrew"
    PERLBREW_VERSION="0.73"
  @INC:
    /Users/{USERID}/perl5/perlbrew/perls/perl-5.22.0/lib/site_perl/5.22.0/darwin-thread-multi-ld-2level
    /Users/{USERID}/perl5/perlbrew/perls/perl-5.22.0/lib/site_perl/5.22.0
    /Users/{USERID}/perl5/perlbrew/perls/perl-5.22.0/lib/5.22.0/darwin-thread-multi-ld-2level
    /Users/{USERID}/perl5/perlbrew/perls/perl-5.22.0/lib/5.22.0
    .

I had a quick look at the code, and cannot see what to patch. I suspect I could spend a great deal of time studying this code before daring to patch anything.
As for your particular problem, note what the TODO says: 24. Lookahead assertions contain serious bugs....
My plan is at Xmas to write a Marpa-based parser of regexps, which outputs a tree. From that it may be possible to replicate some of the features of this module. I have not studied the problem yet, but feel that will work.
I'm sorry not to be able to offer anything more than that.

Np I only came across this because I was just curious if I would get speedups from the lookaheads since the generated regex is like ~800k chars long haha (without lookaheads). Then when I was testing I noticed I was getting different outputs so I looked into it and just figured I would submit a bug report.

Thanks for looking into it. Not sure if you want this bug to stay open for future validation or whatever... you can just close it if you want.

Hi Adam

On 15/12/15 07:31, Adam Lesperance wrote:

> Np I only came across this because I was just curious if I would get
> speedups from the lookaheads since the generated regex is like ~800k
> chars long haha (without lookaheads). Then when I was testing I noticed
> I was getting different outputs so I looked into it and just figured I
> would submit a bug report.

Ahhh. Good to know it's not a real inconvenience.

> Thanks for looking into it. Not sure if you want this bug to stay open
> for future validation or whatever... you can just close it if you want.

I'll keep it open. These are the sorts of things any new code I write
has to take into account.

And it's a warning to potential - and current - users of the module of
problems they will definitely encounter.

Ron Savage - savage.net.au