regexps: basically sound
GoogleCodeExporter opened this issue · 4 comments
GoogleCodeExporter commented
sam, acme, and awk have bugs in their regexp implementations. i did not find
any bugs in grep or sed.
= escaping chars within [] ==========
in plan 9 regexps, entering a literal '-', '[' or ']' into a [] requires escaping it
with '\'. this doesn't work in acme, sam, or awk. for instance, the following
matches '+' but not '-': [+\-]
[\[\]] was also tested; it works fine in grep and sed but not in acme, sam, or awk.
note: in grep, '-' as the first character of a [] is treated as a literal '-',
just like in gnu grep and perl. in acme it is not. regexp(6) does not document
this.
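the bracket-escaping behaviour described above can be pinned down as a small table of pattern/input/expected cases, which makes a report like this easy to rerun against each tool. a minimal sketch, assuming python's re module as a stand-in reference engine (it is not plan 9 libregexp, but it agrees on these particular cases); the names BRACKET_CASES and check_cases are hypothetical. the leading-'-' case is left out because regexp(6) does not say what it should do.

```python
import re

# hypothetical test table: (pattern, input, should_match).
# expectations follow the escaping rules described in this report;
# python's re is only a stand-in engine, not plan 9's libregexp.
BRACKET_CASES = [
    (r"[+\-]", "+", True),   # '+' inside brackets is literal
    (r"[+\-]", "-", True),   # escaped '-' is literal, not a range operator
    (r"[\[\]]", "[", True),  # escaped '[' is literal
    (r"[\[\]]", "]", True),  # escaped ']' is literal and does not close the class
    (r"[\[\]]", "x", False), # anything else must not match
]

def check_cases(cases):
    """return (pattern, input, expected, actual) for every case that misbehaves."""
    failures = []
    for pattern, text, expected in cases:
        actual = re.fullmatch(pattern, text) is not None
        if actual != expected:
            failures.append((pattern, text, expected, actual))
    return failures

if __name__ == "__main__":
    print(check_cases(BRACKET_CASES))  # prints [] : every case behaved as expected
```

an empty list means the engine under test handled every escaped bracket character as a literal; any tuple printed identifies the exact pattern and input that failed.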
= the + operator ==========
the + operator usually doesn't work. i nearly always have to replace 'x+' with
'xx*' in acme, which gets ridiculous for non-trivial expressions.
on comparing with sam (actually ssam), i found an expression which worked in
acme (rare!) but did not work in sam. i didn't record what this expression was.
grep and sed appear to handle + correctly, although i only tested with trivial
expressions.
curiously, back in 1998 i also found + to be broken in vim and gnu grep; it
worked in emacs but in nothing else i used at the time. perhaps the gnu project
followed a broken example implementation from bell labs.
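since the workaround above relies on 'x+' and 'xx*' accepting exactly the same strings, a suspected + bug can be checked by running both forms over the same inputs and listing where they disagree. a minimal sketch, again assuming python's re as a stand-in engine; the helper same_matches and the sample inputs are hypothetical, chosen only for illustration.

```python
import re

def same_matches(p, q, inputs):
    """return the inputs on which patterns p and q disagree (full-string match)."""
    return [s for s in inputs
            if (re.fullmatch(p, s) is None) != (re.fullmatch(q, s) is None)]

# 'x+' should accept exactly what 'xx*' accepts: one or more 'x'.
SAMPLES = ["", "x", "xx", "xxx", "y", "xy"]

if __name__ == "__main__":
    print(same_matches(r"x+", r"xx*", SAMPLES))  # prints []
    # a less trivial expression, where rewriting + by hand gets tedious:
    print(same_matches(r"(ab|c)+d", r"(ab|c)(ab|c)*d",
                       ["abd", "cd", "abcabd", "d", "abab"]))  # prints []
```

any string returned by same_matches is a concrete input on which the engine treats 'y+' differently from 'yy*', which is the kind of example needed to reproduce a + bug.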
= misc notes ==========
according to chatter in #cat-v, sam doesn't use libregexp. acme includes
regexp.h but compiles its own regexps anyway.
i'm not entirely sure which awk i used, pap's or ape awk. burnzez reports pap's
awk uses libregexp, but he didn't look into ape awk.
Original issue reported on code.google.com by tereniao...@gmail.com
on 31 Mar 2014 at 3:17
GoogleCodeExporter commented
quoting in [] fixed in r6b4e19cd75b1
Original comment by cinap_le...@felloff.net
on 1 Apr 2014 at 4:08
GoogleCodeExporter commented
you do not give any examples of the + operator failing, and i cannot reproduce it.
Original comment by cinap_le...@felloff.net
on 1 Apr 2014 at 4:10
GoogleCodeExporter commented
awk character class literal quoting works just fine:
term% {echo +; echo -} | awk '/[+\-]/{print $0}'
+
-
Original comment by cinap_le...@felloff.net
on 1 Apr 2014 at 4:16
GoogleCodeExporter commented
thanks for the fixes.
i can't reproduce my awk issue. i tried both your example and one with the
regexp in an argument to gsub, and with [\-+]; it worked every way.
my complaints about + can wait until i have examples. closing this now.
Original comment by tereniao...@gmail.com
on 19 Apr 2014 at 11:35
- Changed state: Fixed