/tyrex

A typical expressions parser for Haskell

Primary LanguageHaskellMIT LicenseMIT

Tyrex

A non-deterministic typical expressions parser for Haskell language. A typical expression is a type-safe regular expression.

Other proposed names include: typex (typical expressions) and texen (typical expressions engine).

Usage

import Typical

gives you many functions that correspond to standard regular expressions operators. To parse a string use a match :: [Pattern] -> [String]. For example:

  • match [_digit, _char '+', _digit] "4+5"
  • match [_digit `_or` _alpha] "a1"
  • match [_digit, _optional . _seq $ [ _char '(', _word "one", _char ')' ] ] "one"

The result will be an array of strings that represents a list of possible matches. Match takes an array of patterns for convenience. Internally, it converts the array into the Sequence structure.

Data Types

The most important data type is called Pattern. It is recursive and has a number of constructors. By applying it to the match function you get a list of possible matches.

Positioning Operators

  • _seq :: [Pattern] -> Pattern - converts an array of patterns into a sequence pattern.
  • _or :: Pattern -> Pattern -> Pattern - converts two patterns into a disjunction.

Qualifier Operators

  • _char :: Char -> Pattern - converts a character into a pattern that matches it.
  • _digit :: Pattern - matches any digit.
  • _alpha :: Pattern - matches one upper or lower case letter.
  • _lower :: Pattern - matches a lower case letter.
  • _upper :: Pattern - matches upper case letter.
  • _anything :: Pattern - matches any character.
  • _oneOf :: [Char] -> Pattern - matches any one of the characters in the passed array.
  • _whitespace = _oneOf " \t\n\r"
  • _real :: Pattern - matches a positive or negative real number.

Quantifier Operators

  • _exactly :: Num -> Pattern -> Pattern - matches a passed pattern exactly n times or fails.
  • _between :: Num -> Num -> Pattern - greedily matches a pattern between min and max times.
  • _one = _exactly 1
  • _some :: Pattern -> Pattern - greedily matches a pattern at least once.
  • _any :: Pattern -> Pattern - greedily matches a pattern zero or more times.
  • _optional :: Pattern -> Pattern - greedily matches zero or one times.
  • _not :: Pattern -> Pattern - matches 0 times or fails.

Work in Progress

Plans to convert this to a monadic parser are on the way. This parser still lacks important features, like ^ and $. A non-deterministic lexer is also included in the Lexical package.