# regex-opt

Primary language: C++. License: GNU General Public License v2.0 (GPL-2.0).

Perl-compatible regular expression optimizer

## 0. Contents

This is the documentation of regex-opt-1.2.2.

1. [Purpose](#h0)
2. [Usage](#h1)
3. [Supported syntax](#h2)
4. [Unsupported syntax](#h3)
5. [Optimizations performed](#h4)
6. [Optimizations not performed](#h5)
7. [Copying](#h6)
8. [Requirements](#h7)
9. [Downloading](#download)

## 1. Purpose

regex-opt optimizes Perl-compatible regular expressions: it rewrites a given expression into a more compact form.

## 2. Usage

The general syntax for running the program is `regex-opt` followed by the regular expression to optimize, passed as a single argument.

Example: running `regex-opt 'xaz|xbz|xcz'` prints `x[a-c]z`.

Try running regex-opt on [Abigail](http://www.foad.org/%7Eabigail/)'s 7 kilobyte [URL regexp](http://www.foad.org/~abigail/Perl/url3.regex). The result should be about 5 kilobytes long.
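The example above can be spot-checked by comparing the input and output expressions on a handful of sample strings. The following is a minimal sketch, not part of regex-opt: the helper name `same_matches` is hypothetical, and it assumes that `std::regex` (which uses the ECMAScript grammar by default, not PCRE) interprets both expressions the same way PCRE would, which holds for simple patterns like this one.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not part of regex-opt): returns true when both
// patterns accept or reject every sample string identically.
// Whole-string matching is used because regex-opt does not support
// the ^ and $ anchors.
bool same_matches(const std::string& original,
                  const std::string& optimized,
                  const std::vector<std::string>& samples)
{
    const std::regex a(original);
    const std::regex b(optimized);
    for (const std::string& s : samples) {
        if (std::regex_match(s, a) != std::regex_match(s, b)) {
            return false;  // the two patterns disagree on this sample
        }
    }
    return true;
}
```

Calling `same_matches("xaz|xbz|xcz", "x[a-c]z", samples)` with samples such as `"xaz"`, `"xdz"` and `"xz"` compares the two forms. Agreement on a few samples is only a sanity check, not a proof that the expressions accept the same set of strings.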

## 3. Supported syntax

* `*` (repeat 0-inf)
* `+` (repeat 1-inf)
* `?` (repeat 0-1)
* `{n}` (repeat n)
* `{n,}` (repeat n-inf)
* `{n,m}` (repeat n-m)
* `.` (accept any char except `\n`)
* `[a-z]` (character sets)
* `[^a-z]` (inverse character sets)
* `[[:alpha:]]` (character classes)
* `\s` (and other character classes and escapes)
* `x|y` (alternatives)
* `(?:x|y)` (non-capturing grouping)
* `*?` (non-greedy repeat)

## 4. Unsupported syntax

* `^` (match string-begin)
* `$` (match string-end)
* `()` (capturing is converted to non-capturing)
* Any `(?` command that is not mentioned in the supported syntax
* Unicode-specific markup

## 5. Optimizations performed

* Character set optimization: `[A-Zabcdefgh-yz0-9%]` becomes `[[:alnum:]%]`
* Alternate characters: `y|[yp]|[zx]` becomes `[px-z]`
* Counting: `aaa*` and `aa+` become `a{2,}`, and `(a?){3}` becomes `a{0,3}`
* Combining: `abcde|xycde` becomes `(?:ab|xy)cde`
* Parenthesis reduction: `((abc))` becomes `abc`, and `(xx|yy)|zz` becomes `xx|yy|zz`
* Compression: `xyzyzxyzyz` becomes `(?:x(?:yz){2}){2}`
  * This might not always be a good thing.
* Choice counting: `a+|aa+` becomes `a+`, `(b|)` becomes `b?`, and `dxxxxb|dxxxb|dxxb|dxb` becomes `dx{1,4}b`
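Several of the rewrites listed above can be spot-checked by matching both forms against sample strings. The sketch below is an illustration only and does not show how regex-opt works internally. The names `Rewrite`, `documented_rewrites` and `first_mismatch` are hypothetical, the samples are hand-picked, and `std::regex` with its default ECMAScript grammar is assumed to be a close enough stand-in for PCRE on these patterns.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// One documented rewrite plus sample strings to compare both forms on.
struct Rewrite {
    std::string before;
    std::string after;
    std::vector<std::string> samples;
};

// A subset of the rewrites from the list above, with hand-picked samples.
std::vector<Rewrite> documented_rewrites()
{
    return {
        {"[A-Zabcdefgh-yz0-9%]", "[[:alnum:]%]", {"a", "Z", "5", "%", "_", "ab", ""}},
        {"y|[yp]|[zx]", "[px-z]", {"p", "x", "y", "z", "q", "w", ""}},
        {"aaa*", "a{2,}", {"", "a", "aa", "aaa", "aaaa", "ab"}},
        {"abcde|xycde", "(?:ab|xy)cde", {"abcde", "xycde", "abxycde", "cde"}},
        {"xyzyzxyzyz", "(?:x(?:yz){2}){2}", {"xyzyzxyzyz", "xyzyz", "xyzxyz"}},
        {"dxxxxb|dxxxb|dxxb|dxb", "dx{1,4}b",
         {"db", "dxb", "dxxb", "dxxxb", "dxxxxb", "dxxxxxb"}},
    };
}

// Returns the index of the first rewrite whose two forms disagree on one of
// its samples (whole-string match), or -1 when all of them agree.
int first_mismatch(const std::vector<Rewrite>& rewrites)
{
    for (std::size_t i = 0; i < rewrites.size(); ++i) {
        const std::regex before(rewrites[i].before);
        const std::regex after(rewrites[i].after);
        for (const std::string& s : rewrites[i].samples) {
            if (std::regex_match(s, before) != std::regex_match(s, after)) {
                return static_cast<int>(i);
            }
        }
    }
    return -1;
}
```

A table of before/after pairs keeps the check easy to extend with further rewrites. As with any sample-based test, agreement on the samples does not prove that the two forms accept exactly the same strings.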

## 6. Optimizations not performed

* Combining counts:
  * `a?|b?` should become `(?:a|b)?`, but currently stays `a?|b?`
* Redundancy removal (removal of alternatives that are subsets of other alternatives):
  * `xfooy|x[a-q]+y` should become `x[a-q]+y`, but currently becomes `x(?:foo|[a-q]+)y`

Help in solving these shortcomings would be welcome.
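For the redundancy-removal case, the simplest situation is an alternative that is a plain literal such as `foo`. A literal describes exactly one string, so it is a subset of another alternative precisely when that alternative matches the literal in full. The following is a minimal sketch of that single case and is not taken from regex-opt. The function name `literal_is_redundant` is hypothetical, the first argument is assumed to contain no regex metacharacters, and `std::regex` (ECMAScript grammar) stands in for a PCRE matcher. Alternatives that are themselves patterns would need a real language-inclusion test, which this sketch does not attempt.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: decides whether a literal alternative can be dropped
// because another alternative already accepts it.
// `literal` must be a plain string without regex metacharacters.
bool literal_is_redundant(const std::string& literal,
                          const std::string& other_alternative)
{
    // Whole-string match: the other alternative must accept the literal
    // exactly, not merely contain a matching substring.
    return std::regex_match(literal, std::regex(other_alternative));
}
```

With the example from the list, `literal_is_redundant("foo", "[a-q]+")` holds because `f` and `o` both fall within `a-q`, so `foo|[a-q]+` could be reduced to `[a-q]+`. A literal such as `fox` would not qualify, since `x` lies outside the range.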

## 7. Copying

regex-opt was written by Joel Yliluoma, a.k.a. [Bisqwit](http://iki.fi/bisqwit/), and is distributed under the terms of the [General Public License](http://www.gnu.org/licenses/licenses.html#GPL) (GPL).

## 8. Requirements

Compiling requires the following GNU tools: g++ and make.

## 9. Downloading

The official home page of regex-opt is at [http://iki.fi/bisqwit/source/regex-opt.html](http://iki.fi/bisqwit/source/regex-opt.html). Check there for new versions.

Generated from progdesc.php (last updated: Wed, 21 Feb 2007 17:27:15 +0200) with docmaker.php (last updated: Sun, 12 Jun 2005 06:08:02 +0300) at Tue, 27 Feb 2007 16:46:38 +0200