Incorrect processing when lookaheads are enabled
I've identified two failing cases so far; hopefully this is enough information for you to reproduce the problem.
#!/usr/bin/env perl
use Regexp::Assemble;
sub err {
    my @parts = @_;
    my $re = Regexp::Assemble->new(lookahead => 1);
    $re->add($_.'\d*') for @parts;
    my $r = $re->re();
    print $re->as_string() . "\n";
    print " Didn't match: $_\n\n" for grep {$_ !~ $r} @parts;
}
# The 0 never makes it into the lookahead probably because it's falsey?
err qw/
a0b
a1
/;
# everything after the single b should be optional but it isn't
#
# if you remove the 'ab' or the trailing '\d*' in the err func this works as expected
err qw/
ab
b
b1
bc
bc1
/;
Output:
a(?=1)(?:0b|1)\d*
Didn't match: a0b
(?=[ab])(?:b(?=[1c])(?:c)?(?:1)?|ab)\d*
Didn't match: b
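The two failures can be checked without the module by pasting the emitted patterns into plain Perl. The sketch below does that. The helper name `unmatched` is made up for illustration, and the third pattern, with `0` added to the lookahead class, is only a guess at what the module was presumably meant to produce, not output from any real version of it.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Return the inputs that the given regex fails to match.
# Matching is unanchored, as in the err() function of the report.
sub unmatched {
    my ($re, @inputs) = @_;
    return grep { $_ !~ $re } @inputs;
}

my @cases = (
    # Patterns copied from the Output section of the report.
    [ 'case 1 (as emitted)', qr/a(?=1)(?:0b|1)\d*/,
      [qw(a0b a1)] ],
    [ 'case 2 (as emitted)', qr/(?=[ab])(?:b(?=[1c])(?:c)?(?:1)?|ab)\d*/,
      [qw(ab b b1 bc bc1)] ],
    # ASSUMPTION: hand-written guess at the intended lookahead for case 1.
    [ 'case 1 (0 added by hand)', qr/a(?=[01])(?:0b|1)\d*/,
      [qw(a0b a1)] ],
);

for my $case (@cases) {
    my ($label, $re, $inputs) = @$case;
    my @missed = unmatched($re, @$inputs);
    print "$label: ", (@missed ? "missed @missed" : "all matched"), "\n";
}
```

In the first case the lookahead `(?=1)` rejects `a0b` before the alternation is tried. In the second, `(?=[1c])` demands a following character, so a bare `b` cannot match.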
I'm using Regexp::Assemble 0.36, and here's my Perl build info:
perl -V
Summary of my perl5 (revision 5 version 22 subversion 0) configuration:
Platform:
osname=darwin, osvers=14.5.0, archname=darwin-thread-multi-ld-2level
uname='darwin 34363bc7dc9c 14.5.0 darwin kernel version 14.5.0: wed jul 29 02:26:53 pdt 2015; root:xnu-2782.40.9~1release_x86_64 x86_64 i386 macbookpro11,3 darwin '
config_args='-de -Dprefix=/Users/{USERID}/perl5/perlbrew/perls/perl-5.22.0 -Dcc=clang -Duse64bitint -Duse64bitall -Duselongdouble -Dusethreads -Dusemultiplicity -Aeval:scriptdir=/Users/{USERID}/perl5/perlbrew/perls/perl-5.22.0/bin'
hint=recommended, useposix=true, d_sigaction=define
useithreads=define, usemultiplicity=define
use64bitint=define, use64bitall=define, uselongdouble=define
usemymalloc=n, bincompat5005=undef
Compiler:
cc='clang', ccflags ='-fno-common -DPERL_DARWIN -no-cpp-precomp -arch x86_64 -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include',
optimize='-O3',
cppflags='-no-cpp-precomp -arch x86_64 -fno-common -DPERL_DARWIN -no-cpp-precomp -arch x86_64 -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include'
ccversion='', gccversion='4.2.1 Compatible Apple LLVM 6.1.0 (clang-602.0.53)', gccosandvers=''
intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678, doublekind=3
d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16, longdblkind=3
ivtype='long', ivsize=8, nvtype='long double', nvsize=16, Off_t='off_t', lseeksize=8
alignbytes=16, prototype=define
Linker and Libraries:
ld='env MACOSX_DEPLOYMENT_TARGET=10.3 clang -arch x86_64', ldflags =' -arch x86_64 -fstack-protector-strong -L/usr/local/lib'
libpth=/usr/local/lib /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.1.0/lib /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib /usr/lib
libs=-lpthread -lgdbm -ldbm -ldl -lm -lutil -lc
perllibs=-lpthread -ldl -lm -lutil -lc
libc=, so=dylib, useshrplib=false, libperl=libperl.a
gnulibc_version=''
Dynamic Linking:
dlsrc=dl_dlopen.xs, dlext=bundle, d_dlsymun=undef, ccdlflags=' '
cccdlflags=' ', lddlflags=' -bundle -undefined dynamic_lookup -L/usr/local/lib -fstack-protector-strong'
Characteristics of this binary (from libperl):
Compile-time options: HAS_TIMES MULTIPLICITY PERLIO_LAYERS
PERL_DONT_CREATE_GVSV
PERL_HASH_FUNC_ONE_AT_A_TIME_HARD
PERL_IMPLICIT_CONTEXT PERL_MALLOC_WRAP
PERL_NEW_COPY_ON_WRITE PERL_PRESERVE_IVUV
USE_64_BIT_ALL USE_64_BIT_INT USE_ITHREADS
USE_LARGE_FILES USE_LOCALE USE_LOCALE_COLLATE
USE_LOCALE_CTYPE USE_LOCALE_NUMERIC USE_LOCALE_TIME
USE_LONG_DOUBLE USE_PERLIO USE_PERL_ATOF
USE_REENTRANT_API
Locally applied patches:
Devel::PatchPerl 1.32
Built under darwin
Compiled at Sep 8 2015 14:44:27
%ENV:
PERLBREW_BASHRC_VERSION="0.73"
PERLBREW_HOME="/Users/{USERID}/.perlbrew"
PERLBREW_MANPATH="/Users/{USERID}/perl5/perlbrew/perls/perl-5.22.0/man"
PERLBREW_PATH="/Users/{USERID}/perl5/perlbrew/bin:/Users/{USERID}/perl5/perlbrew/perls/perl-5.22.0/bin"
PERLBREW_PERL="perl-5.22.0"
PERLBREW_ROOT="/Users/{USERID}/perl5/perlbrew"
PERLBREW_VERSION="0.73"
@INC:
/Users/{USERID}/perl5/perlbrew/perls/perl-5.22.0/lib/site_perl/5.22.0/darwin-thread-multi-ld-2level
/Users/{USERID}/perl5/perlbrew/perls/perl-5.22.0/lib/site_perl/5.22.0
/Users/{USERID}/perl5/perlbrew/perls/perl-5.22.0/lib/5.22.0/darwin-thread-multi-ld-2level
/Users/{USERID}/perl5/perlbrew/perls/perl-5.22.0/lib/5.22.0
.
I had a quick look at the code, and cannot see what to patch. I suspect I could spend a great deal of time studying this code before daring to patch anything.
As for your particular problem, note what the TODO says: 24. Lookahead assertions contain serious bugs....
My plan is to write, at Xmas, a Marpa-based parser for regexps which outputs a tree. From that it may be possible to replicate some of the features of this module. I have not studied the problem yet, but feel that will work.
I'm sorry not to be able to offer anything more than that.
Np. I only came across this because I was curious whether I would get speedups from the lookaheads, since the generated regex is ~800k chars long (without lookaheads), haha. Then, while testing, I noticed I was getting different outputs, so I looked into it and figured I would submit a bug report.
Thanks for looking into it. Not sure if you want this bug to stay open for future validation or whatever... you can just close it if you want.
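For anyone who still wants to try lookaheads for speed, one cautious approach is to check each assembled regex against its own inputs and fall back to the lookahead-free version when any input is missed. This is a minimal sketch of that idea, not something the module provides: `first_safe_regex` is a hypothetical helper, and the usage part is skipped when Regexp::Assemble is not installed.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Return the first candidate regex that matches every input,
# or nothing (undef in scalar context) if no candidate does.
# Matching is unanchored, as in the original report.
sub first_safe_regex {
    my ($inputs, @candidates) = @_;
    for my $re (@candidates) {
        my @missed = grep { $_ !~ $re } @$inputs;
        return $re unless @missed;
    }
    return;
}

# Usage sketch: prefer the lookahead build, fall back to the plain one.
if (eval { require Regexp::Assemble; 1 }) {
    my @parts = qw(ab b b1 bc bc1);
    my @candidates;
    for my $lookahead (1, 0) {
        my $ra = Regexp::Assemble->new(lookahead => $lookahead);
        $ra->add($_ . '\d*') for @parts;
        push @candidates, $ra->re;
    }
    my $re = first_safe_regex(\@parts, @candidates);
    print defined $re
        ? "using: $re\n"
        : "no candidate matched every input\n";
}
```

This only shows that the chosen regex accepts the strings it was built from. It says nothing about whether the regex also accepts strings it should reject.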
Hi Adam
On 15/12/15 07:31, Adam Lesperance wrote:
> Np I only came across this because I was just curious if I would get
> speedups from the lookaheads since the generated regex is like ~800k
> chars long haha (without lookaheads). Then when I was testing I noticed
> I was getting different outputs so I looked into it and just figured I
> would submit a bug report.
Ahhh. Good to know it's not a real inconvenience.
> Thanks for looking into it. Not sure if you want this bug to stay open
> for future validation or whatever... you can just close it if you want.
I'll keep it open. These are the sorts of things any new code I write
has to take into account.
And it's a warning to potential - and current - users of the module of
problems they will definitely encounter.
Ron Savage - savage.net.au