xp-framework/rfc

New text.regex package


Scope of Change

A new package text.regex will be added.

Rationale

Provide an object-oriented API for regular expressions.

Functionality

The entry point class is text.regex.Pattern which is a wrapper around the
preg_*() functions in PHP.

Testing whether a pattern matches

The most common use-case is to test whether a given pattern matches.

<?php
  // Current
  if (preg_match('/([w]{3}\.)?example\.(com|net|org)/', $string)) {
    ...
  }

  // New
  if (Pattern::compile('([w]{3}\.)?example\.(com|net|org)')->matches($string)) {
    ...
  }
?>

The problem with the preg_match() approach is that it returns FALSE (and
raises a warning) if the pattern is malformed. In an if statement, FALSE is
indistinguishable from 0, the value for "no match", which can lead to long
debugging / wtf?! sessions. The Pattern class throws an exception instead.
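
The difference can be sketched in plain PHP. The SketchPattern class below
is a hypothetical stand-in, not the proposed text.regex.Pattern
implementation: it assumes "/" as the delimiter and checks the pattern once
at compile time by running it against an empty string.

```php
<?php
// Hypothetical stand-in for text.regex.Pattern, for illustration only.
final class SketchPattern {
  private $regex;

  private function __construct($regex) {
    $this->regex= $regex;
  }

  // Adds delimiters and fails loudly if PCRE cannot compile the pattern.
  // Simplification: patterns that already contain "\/" are not handled.
  public static function compile($regex) {
    $delimited= '/'.str_replace('/', '\/', $regex).'/';

    // preg_match() returns FALSE (not 0) when the pattern is malformed
    if (false === @preg_match($delimited, '')) {
      $e= error_get_last();
      throw new InvalidArgumentException(
        'Malformed pattern "'.$regex.'": '.($e ? $e['message'] : 'unknown error')
      );
    }
    return new self($delimited);
  }

  // TRUE only if the pattern matched at least once
  public function matches($input) {
    return 1 === preg_match($this->regex, $input);
  }
}
```

With this sketch, SketchPattern::compile('(unclosed') raises an exception at
the point of the mistake, whereas the procedural call would only evaluate to
a falsy value inside the if statement.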

Retrieving matched text

To extract the matched parts of a string:

<?php
  // Current
  preg_match('/(([w]{3})\.)?example\.(com|net|org)/', $string, $matches);
  Console::writeLine($matches);

  // New
  $match= Pattern::compile('(([w]{3})\.)?example\.(com|net|org)')->match($string);
  Console::writeLine($match->group(0));
?>

The result in both cases is a string array with the contents
[ "www.example.com", "www.", "www", "com" ].
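
As a rough illustration of how a match result object could expose these
groups, the sketch below wraps preg_match_all(). The names
SketchMatchResult and sketchMatch() are hypothetical, and the choice to
index group() by match number (so that group(0) is the group array of the
first match) is an assumption based on the example above.

```php
<?php
// Hypothetical result holder, for illustration only.
final class SketchMatchResult {
  private $matches;

  public function __construct(array $matches) {
    $this->matches= $matches;
  }

  // Number of times the pattern matched
  public function length() {
    return count($this->matches);
  }

  // Groups of the n-th match: index 0 is the full match text
  public function group($offset) {
    if (!isset($this->matches[$offset])) {
      throw new OutOfBoundsException('No match at offset '.$offset);
    }
    return $this->matches[$offset];
  }
}

// Hypothetical helper; expects a pattern that already has delimiters.
function sketchMatch($regex, $input) {
  // PREG_SET_ORDER yields one array of groups per match
  $n= @preg_match_all($regex, $input, $matches, PREG_SET_ORDER);
  if (false === $n) {
    throw new InvalidArgumentException('Malformed pattern '.$regex);
  }
  return new SketchMatchResult($matches);
}

$match= sketchMatch('/(([w]{3})\.)?example\.(com|net|org)/', 'www.example.com');
print_r($match->group(0));
```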

Working with string objects

The Pattern class has built-in support for lang.types.String objects:

<?php
  $string= new String('RFC #0001');
  $num= Pattern::compile('RFC #([0-9]+)')->match($string)->group(1);  // "0001"
?>

Modifiers

Instead of being embedded in the pattern string, modifiers are passed to
the Pattern class' compile() method as a bitfield:

<?php
  // Current
  $ok= preg_match('/[a-z0-9_]+/i', $username);

  // New
  $ok= Pattern::compile('[a-z0-9_]+', Pattern::CASE_INSENSITIVE)->matches($username);
?>

The available modifiers are:

  Constant name    Modifier
  ================ ========
  CASE_INSENSITIVE i
  MULTILINE        m
  DOTALL           s
  EXTENDED         x
  ANCHORED         A
  DOLLAR_ENDONLY   D
  ANALYSIS         S
  UNGREEDY         U
  UTF8             u

This is more verbose but easier to read.
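
One way such a bitfield could be translated back into PCRE modifier letters
is sketched below. The SketchModifiers class and the numeric values of its
constants are assumptions made for this example; only the constant names
and their letters come from the table above.

```php
<?php
// Hypothetical flag holder; the numeric values are illustrative assumptions.
final class SketchModifiers {
  const CASE_INSENSITIVE = 0x0001;
  const MULTILINE        = 0x0002;
  const DOTALL           = 0x0004;
  const EXTENDED         = 0x0008;
  const ANCHORED         = 0x0010;
  const DOLLAR_ENDONLY   = 0x0020;
  const ANALYSIS         = 0x0040;
  const UNGREEDY         = 0x0080;
  const UTF8             = 0x0100;

  // Flag bit => PCRE modifier letter, as listed in the table
  private static $letters= [
    self::CASE_INSENSITIVE => 'i',
    self::MULTILINE        => 'm',
    self::DOTALL           => 's',
    self::EXTENDED         => 'x',
    self::ANCHORED         => 'A',
    self::DOLLAR_ENDONLY   => 'D',
    self::ANALYSIS         => 'S',
    self::UNGREEDY         => 'U',
    self::UTF8             => 'u'
  ];

  // Builds the modifier suffix for a combination of flags
  public static function toString($flags) {
    $modifiers= '';
    foreach (self::$letters as $bit => $letter) {
      if ($flags & $bit) $modifiers.= $letter;
    }
    return $modifiers;
  }
}

// Flags are combined with the bitwise OR operator
$flags= SketchModifiers::CASE_INSENSITIVE | SketchModifiers::UTF8;
$ok= preg_match('/[a-z0-9_]+/'.SketchModifiers::toString($flags), 'ABC');
```

Because every flag occupies its own bit, any combination can be passed as a
single integer argument.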

Security considerations

None.

Speed impact

Slightly slower than the procedural approach.

Dependencies

PCRE extension (enabled by default)

Related documents