ikrabbe/plan9front

regexps: basically sound

GoogleCodeExporter opened this issue · 4 comments

sam, acme, and awk have bugs in their regexp implementations. i did not find 
any bugs in grep or sed.

= escaping chars within [] ==========

in plan 9 regexps, entering a literal '-', '[', or ']' inside a [] requires 
escaping it with '\'. this doesn't work in acme, sam, or awk. for instance, the 
following matches '+' but not '-': [+\-]

[\[\]] was also tested; it works fine in grep and sed but not in acme, sam, or awk.

note: in grep, '-' as the first character of a [] is treated as a literal '-', 
just as in gnu grep and perl. in acme it is not. regexp(6) does not document 
this.
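
a minimal way to check both cases from a shell, sketched with awk because it is 
easy to script. this assumes a posix sh (not rc) and whatever awk comes first in 
the path; the variable names are made up for illustration. an implementation that 
honours '\' inside [] should keep both the '+' and the '-' line in the first test, 
and one that treats a leading '-' as literal should do the same in the second.

```shell
# assumption: posix sh, and an awk that accepts '\-' inside []
# feed three one-character lines and keep the ones the class matches
escaped=$(printf '%s\n' + - x | awk '/[+\-]/')
# same input, with '-' as the first character of the class
leading=$(printf '%s\n' + - x | awk '/[-+]/')

printf 'escaped:\n%s\n' "$escaped"
printf 'leading:\n%s\n' "$leading"
```

if either result is missing the '-' line, that class is not being parsed the way 
described above.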

= the + operator ==========

the + operator usually doesn't work. i nearly always have to replace 'x+' with 
'xx*' in acme, which gets ridiculous for non-trivial expressions. 
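
the rewrite relies on 'x+' and 'xx*' matching exactly the same strings, so a 
working + can be checked by comparing the two. a small sketch, assuming a posix 
sh (not rc) and an awk with gsub(); the input string and variable names are 
invented for the example.

```shell
# assumption: posix sh and any awk that provides gsub()
line='axxbxc'
# wrap every match in <> so the matched runs are visible
plus=$(printf '%s\n' "$line" | awk '{ gsub(/x+/, "<&>"); print }')
star=$(printf '%s\n' "$line" | awk '{ gsub(/xx*/, "<&>"); print }')

echo "x+  -> $plus"
echo "xx* -> $star"
```

with a correct + both lines show a<xx>b<x>c. the same pair of expressions can be 
tried by hand in acme or sam to see whether they disagree there.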

on comparing with sam (actually ssam), i found an expression which worked in 
acme (rare!) but did not work in sam. i didn't record what this expression was.

grep and sed appear to handle + correctly, although i only tested with trivial 
expressions.

curiously, back in 1998 i also found + to be broken in vim and gnu grep; it 
worked in emacs but nothing else i used at the time. perhaps the gnu project 
followed a broken example implementation from bell labs.

= misc notes ==========

according to chatter in #cat-v, sam doesn't use libregexp. acme includes 
regexp.h but compiles its own regexps anyway.

i'm not entirely sure which awk i used, pap's or ape awk. burnzez reports pap's 
awk uses libregexp, but he didn't look into ape awk.

Original issue reported on code.google.com by tereniao...@gmail.com on 31 Mar 2014 at 3:17

quoting in [] fixed in r6b4e19cd75b1

Original comment by cinap_le...@felloff.net on 1 Apr 2014 at 4:08

you do not give any examples of the + operator failing, and i cannot reproduce it.

Original comment by cinap_le...@felloff.net on 1 Apr 2014 at 4:10

awk character class literal quoting works just fine:

term% {echo +; echo -} | awk '/[+\-]/{print $0}'
+
-

Original comment by cinap_le...@felloff.net on 1 Apr 2014 at 4:16

thanks for the fixes.

i can't reproduce my awk issue. i tried both your example and one with the 
regexp in an argument to gsub, and with [\-+]; it worked every way.

my complaints about + can wait until i have examples. closing this now.

Original comment by tereniao...@gmail.com on 19 Apr 2014 at 11:35

  • Changed state: Fixed