slevithan/xregexp

Add syntax for subpatterns as subroutines

Closed this issue · 2 comments

The pseudo-AST structure described in #179 could also be the foundation for a useful advanced feature from PCRE, Perl, etc.: the ability to reference the entire contents of a named or numbered group (including nested parens) from later in the pattern, enabling subpattern reuse via (?&name) and (?n).

This would only require generic syntax tokens for ) and for any ( that isn't part of a self-contained token like (?#...), so that every token following an opening ( is marked as a child until its closing ) arrives. The generated pattern contents of each named group could then be derived when needed.

Perhaps this would look like:

[
  {
    type: 'named-capture-start',
    name: 'name',
    output: '(',
    children: [
      {type: 'x-ignored', output: ''},
      {type: 'native-token', output: '.'},
    ],
  },
  {type: 'native-token', output: ')'},
]
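
As a rough sketch of how a flat token list might be nested into the shape above, and how a group's contents could then be derived from it. The helper names nestTokens and contentsOf are hypothetical, not part of XRegExp, and the sketch assumes every group-opening token has a type ending in '-start' and that a closing paren is always a native-token whose output is exactly ')':

```javascript
// Hypothetical helper: nest a flat token list so that each group-start token
// collects the tokens up to its matching ')' as children.
function nestTokens(flatTokens) {
  const root = [];
  const stack = [root]; // stack of child lists; the last entry is the current target
  for (const token of flatTokens) {
    const isClose = token.type === 'native-token' && token.output === ')';
    if (isClose) {
      if (stack.length === 1) {
        throw new SyntaxError('Unmatched )');
      }
      // As in the structure above, ')' is a sibling of the group-start token
      stack.pop();
      stack[stack.length - 1].push(token);
    } else if (token.type.endsWith('-start')) {
      const group = {...token, children: []};
      stack[stack.length - 1].push(group);
      stack.push(group.children);
    } else {
      stack[stack.length - 1].push(token);
    }
  }
  if (stack.length !== 1) {
    throw new SyntaxError('Unclosed group');
  }
  return root;
}

// Hypothetical helper: rebuild the pattern source of a group's contents,
// including nested groups, from the output of its children.
function contentsOf(group) {
  return group.children
    .map((child) => (child.children ? child.output + contentsOf(child) : child.output))
    .join('');
}
```

Because the closing ) of a nested group ends up as a child of the enclosing group, contentsOf returns balanced source for the enclosing group without any extra bookkeeping.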

Notes:

  • An error would need to be thrown if the group referenced by name with (?&name) or by number with (?n) is not yet closed.
  • Make sure to handle things like (?<$1>.)(?<$2>(?&$1))(?&$2).
  • Some of the use cases are already handled by XRegExp.build and XRegExp.tag, but this would still be cleaner and/or more robust in some cases, and the foundation created for it would make potential future XRegExp syntax addons more powerful.
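
To make the first two notes concrete, here is a simplified sketch that works on the pattern string directly rather than on a token tree. The function name expandSubroutines is hypothetical and this is not how XRegExp works. It assumes JavaScript-style (?<name>...) groups, only handles (?&name) (not (?n) or (?#...) comments), and does not strip or rename capturing groups inside the copied contents:

```javascript
// Hypothetical sketch: replace each (?&name) with a non-capturing copy of the
// already-closed named group's contents.
function expandSubroutines(pattern) {
  const closed = new Map(); // group name -> expanded contents
  const open = []; // stack of {name, start}; start is an index into `out`
  let out = '';
  let inClass = false;
  let i = 0;
  while (i < pattern.length) {
    const rest = pattern.slice(i);
    let m;
    if (pattern[i] === '\\') {
      // Copy escaped characters through untouched
      out += pattern.slice(i, i + 2);
      i += 2;
    } else if (inClass) {
      if (pattern[i] === ']') inClass = false;
      out += pattern[i++];
    } else if (pattern[i] === '[') {
      inClass = true;
      out += pattern[i++];
    } else if ((m = /^\(\?&([$\w]+)\)/.exec(rest))) {
      if (!closed.has(m[1])) {
        throw new SyntaxError(`Group ${m[1]} is not defined or not yet closed`);
      }
      out += `(?:${closed.get(m[1])})`;
      i += m[0].length;
    } else if ((m = /^\(\?<([$\w]+)>/.exec(rest))) {
      open.push({name: m[1], start: out.length + m[0].length});
      out += m[0];
      i += m[0].length;
    } else if (pattern[i] === '(') {
      // Any other group opener: non-capturing, lookaround, or unnamed capture
      open.push({name: null, start: out.length + 1});
      out += '(';
      i++;
    } else if (pattern[i] === ')') {
      const group = open.pop();
      if (!group) throw new SyntaxError('Unmatched )');
      if (group.name !== null) closed.set(group.name, out.slice(group.start));
      out += ')';
      i++;
    } else {
      out += pattern[i++];
    }
  }
  return out;
}

// The chained case from the notes above
console.log(expandSubroutines('(?<$1>.)(?<$2>(?&$1))(?&$2)'));
// → (?<$1>.)(?<$2>(?:.))(?:(?:.))
```

Storing the already-expanded contents of each closed group is what lets the chained case resolve in a single pass, and a reference to a group that is still open, such as (?<a>(?&a)), throws rather than recursing.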

This would also enable (?<DEFINE>(?<name1>...)(?<name2>...)) blocks that make subpattern reuse via (?&name) and (?n) more robust.

Subroutines and subroutine definition groups have been fully/robustly supported for some time in Regex+, the spiritual successor to XRegExp. 😊