/gplex

GPLEX is a scanner generator which produces lexical scanners written in C# V2 or higher. The input language is similar to the original LEX specification language, but allows full 21-bit Unicode scanners to be specified.


Project Description

This repository now includes the full documentation for the scanner-generator.

Features

GPLEX generates scanners based on finite state automata. By default the number of states in the generated automaton is minimized. The default table-compression scheme is chosen according to the cardinality of the input alphabet and almost always gives a reasonable result, but a large number of options are available for tuning the compression if necessary.
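To make the idea of a table-driven automaton concrete, here is a minimal hand-written sketch in C#. It is not output from GPLEX and does not reproduce its table layout; the names TinyScanner, ClassOf, Next and Accept are hypothetical. It illustrates one common compression idea, mapping characters to a small set of classes so that the transition table needs one column per class rather than one per character.

```csharp
// Hypothetical illustration; none of these names come from GPLEX.
public static class TinyScanner
{
    public const int Error = 0, Number = 1, Ident = 2;   // integer token ids

    // Character-class map: collapses the alphabet into 3 classes so the
    // transition table needs 3 columns instead of one per character.
    static int ClassOf(char c)
    {
        if (c >= '0' && c <= '9') return 0;                          // digit
        if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_')
            return 1;                                                // letter
        return 2;                                                    // anything else
    }

    // Next[state, class]; -1 means "no transition".
    // State 0 = start, 1 = inside a number, 2 = inside an identifier.
    static readonly int[,] Next =
    {
        {  1,  2, -1 },
        {  1, -1, -1 },
        {  2,  2, -1 },
    };

    // Token id recognized in each state (Error means "not accepting").
    static readonly int[] Accept = { Error, Number, Ident };

    // Scans one token starting at pos, keeping the longest match seen.
    public static (int Token, int Length) Scan(string text, int pos)
    {
        int state = 0, lastToken = Error, lastLen = 0;
        for (int i = pos; i < text.Length; i++)
        {
            state = Next[state, ClassOf(text[i])];
            if (state < 0) break;                 // automaton is stuck
            if (Accept[state] != Error)
            {
                lastToken = Accept[state];
                lastLen = i - pos + 1;
            }
        }
        return (lastToken, lastLen);
    }
}
```

With this sketch, TinyScanner.Scan("abc123 ", 0) yields the Ident token with length 6, and TinyScanner.Scan("42+x", 0) yields Number with length 2. A real generated scanner has far more states and classes, which is why the choice of compression scheme matters.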

The tool implements many of the FLEX extensions, including such things as start-state stacks.
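As a rough illustration of what a start-state stack does, the following C# sketch keeps the current start state and a stack of saved ones. In FLEX these operations are exposed as yy_push_state and yy_pop_state; the class and member names below are hypothetical and are not taken from GPLEX's generated code, so consult the GPLEX documentation for the actual interface.

```csharp
using System.Collections.Generic;

// Hypothetical illustration of a start-state stack; not GPLEX's generated code.
public sealed class StartStateStack
{
    public const int Initial = 0;
    private readonly Stack<int> saved = new Stack<int>();

    // The start state that currently selects which rules are active.
    public int Current { get; private set; } = Initial;

    // Enter a new start state, remembering the one being left.
    public void Push(int state)
    {
        saved.Push(Current);
        Current = state;
    }

    // Return to the most recently saved start state
    // (falls back to Initial if nothing was saved).
    public void Pop()
    {
        Current = saved.Count > 0 ? saved.Pop() : Initial;
    }
}
```

A typical use is nested constructs such as comments inside embedded code: the scanner pushes a comment state when it sees the opening delimiter and pops back to whatever state it was in when it sees the closing one, without having to hard-code the state to return to.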

The generated scanners are designed to interface cleanly with bottom-up parsers generated by the Gardens Point Parser Generator (GPPG). However, GPLEX-generated scanners have also been used successfully with handwritten parsers and with parsers generated by COCO/R.

Examples Of Use

A small number of usage examples are included in the download package and are fully discussed in the documentation. For more complex examples, GPLEX and its companion tool GPPG each use scanners and parsers that were themselves generated by GPLEX and GPPG. The examples described in the GPPG and GPLEX documentation have now been added to the distribution as the file GP-Examples.zip.

There is a separate documentation file that deals with the special issues that arise with scanners that use the Unicode character set.

Is GPLEX What You Need?

GPLEX is a scanner generator. It is intended to be used to generate scanners for compilers or other tools that process text. It picks out non-overlapping substrings from a continuous input stream and returns an integer token identifier for each. It may also be used for other simple regular expression recognition tasks, but it is not a replacement for the System.Text.RegularExpressions classes. It does not have built-in mechanisms for multiple substring capture or anything similar.
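For contrast, the sketch below shows the kind of capture-oriented task that System.Text.RegularExpressions handles directly and that a generated scanner is not designed for. The class CaptureDemo and the "key=value" input format are hypothetical, chosen only for illustration.

```csharp
using System.Text.RegularExpressions;

// Hypothetical example: pulling several substrings out of one match.
public static class CaptureDemo
{
    // Splits a "key = value" line into its two parts using named groups.
    // Returns null when the line does not have that shape.
    public static (string Key, string Value)? Parse(string line)
    {
        Match m = Regex.Match(line, @"^(?<key>\w+)\s*=\s*(?<value>.*)$");
        if (!m.Success) return null;
        return (m.Groups["key"].Value, m.Groups["value"].Value);
    }
}
```

Here CaptureDemo.Parse("width = 80") returns the pair ("width", "80"). A scanner would instead report a sequence of tokens (an identifier, an equals sign, a number), leaving any grouping of them to a parser.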

GPLEX has historically had a release cycle of approximately two releases per year. If you feel that a feature that fits within the broad intention of the tool is missing, raise an issue. If what you really want is a C# version of AWK, then GPLEX isn't it; the copyright notice explains the conditions under which you may use GPLEX code to help you implement AWK.NET yourself.