Regular Expression Conditionals for ECMAScript

Status

Stage: 0
Champion: Ron Buckton (@rbuckton)

For detailed status of this proposal see TODO, below.

Authors

Motivations

From https://github.com/rbuckton/proposal-regexp-features:

ECMAScript regular expressions have slowly improved over the years to adopt new functionality commonly present in other languages, including:

  • Unicode Support
  • Named Capture Groups
  • Match Indices

However, a large majority of other languages and libraries have a common set of features that ECMAScript regular expressions currently lack. Some of these features improve performance in degenerate cases such as backtracking in complex patterns. Others give developers new tools for writing more powerful regular expressions.

As a result, ECMAScript developers wishing to leverage these capabilities are left with few options: relying on native bindings to third-party libraries in environments such as Node.js, or falling back to server-side evaluation.

There are numerous applications for extending the ECMAScript regular expression feature set, including:

  • In-browser support for TextMate grammars for web based editors/IDEs.
  • Improved performance for expressions through possessive quantifiers and backtracking control.
  • RegExp-based parsers that can support balanced brackets/parens.
  • Documenting complex patterns in the pattern itself.
  • Improved readability through the use of multi-line patterns and insignificant whitespace.

NOTE: See https://github.com/rbuckton/proposal-regexp-features for an overview of how this proposal fits into other possible future features for Regular Expressions.

Regular Expression Conditionals, as implemented in multiple other engines, provide increased flexibility with regard to complex matching in alternatives.

Prior Art

See https://rbuckton.github.io/regexp-features/features/conditional-expressions.html for additional information.

Syntax

A Conditional Expression checks a condition and evaluates its first alternative if the condition is true; otherwise, it evaluates its second alternative.

  • (?(condition)yes-pattern|no-pattern) — Matches yes-pattern if condition is true; otherwise, matches no-pattern.
  • (?(condition)yes-pattern) — Matches yes-pattern if condition is true; otherwise, matches the empty string.

NOTE: This has no conflicts with existing syntax, as ECMAScript currently produces an error for this syntax in both u and non-u modes.
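The claim in the note above can be checked in any engine that does not yet implement this proposal. The following is a minimal sketch; `rejectsPattern` is a hypothetical helper name used only for illustration, and the pattern is built with the `RegExp` constructor so that the early error surfaces at run time rather than when the script is parsed.

```javascript
// Returns true when constructing the pattern throws a SyntaxError.
// `rejectsPattern` is a hypothetical helper, not part of the proposal.
function rejectsPattern(source, flags) {
  try {
    new RegExp(source, flags);
    return false;
  } catch (err) {
    if (err instanceof SyntaxError) return true;
    throw err; // anything else is unexpected, so rethrow it
  }
}

const conditionalSource = "(?(?=a)b|c)";
const rejectedWithoutU = rejectsPattern(conditionalSource, "");
const rejectedWithU = rejectsPattern(conditionalSource, "u");
console.log(rejectedWithoutU, rejectedWithU);
```

In an engine without support for conditionals, both values are expected to be true, which is what leaves the syntax free for this proposal to claim.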

Conditions

The following conditions are proposed:

  • (?=test-pattern) — Evaluates to true if a positive lookahead for test-pattern matches; otherwise, evaluates to false.
  • (?<=test-pattern) — Evaluates to true if a positive lookbehind for test-pattern matches; otherwise, evaluates to false.
  • (?!test-pattern) — Evaluates to true if a negative lookahead for test-pattern succeeds, that is, if test-pattern does not match ahead of the current position; otherwise, evaluates to false.
  • (?<!test-pattern) — Evaluates to true if a negative lookbehind for test-pattern succeeds, that is, if test-pattern does not match behind the current position; otherwise, evaluates to false.
  • (n) — Evaluates to true if the capture group at offset n was successfully matched; otherwise, evaluates to false.
  • (<name>) — Evaluates to true if the named capture group with the provided name was successfully matched; otherwise, evaluates to false.
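For the lookahead condition, a rough equivalent can be written with syntax that already exists today, by guarding each alternative with a lookahead and its negation. The sketch below is an approximation and not the proposal's specified semantics; `emulateLookaheadConditional` is a hypothetical helper name, and the approach assumes the test pattern contains no capture groups, since the test is duplicated in the generated source.

```javascript
// Hypothetical helper: builds a pattern source that approximates
// (?(?=test)yes|no) using only syntax available today.
// Assumption: `test` has no capture groups, because it appears twice.
function emulateLookaheadConditional(test, yes, no = "") {
  return `(?:(?=${test})${yes}|(?!${test})${no})`;
}

const source = emulateLookaheadConditional(
  "\\{",              // condition: next character is "{"
  "\\{[0-9a-f]+\\}",  // yes-pattern: braced hex digits
  "[0-9a-f]{4}"       // no-pattern: exactly four hex digits
);
const emulated = new RegExp(`^${source}$`);

console.log(emulated.test("0000"));       // true
console.log(emulated.test("{00000000}")); // true
console.log(emulated.test("{0000"));      // false
```

The duplication of the test pattern is part of what a native conditional would avoid: the condition is written, and evaluated, only once.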

The following conditions are out of scope but may be considered in a future proposal:

  • (DEFINE) — Always evaluates to false. This allows you to define Subroutines.
  • (R) — Evaluates to true if inside a recursive expression; otherwise, evaluates to false.
  • (Rn) — Evaluates to true if inside a recursive expression for the capture group at offset n; otherwise, evaluates to false.
  • (R&name) — Evaluates to true if inside a recursive expression for the named capture group with the provided name; otherwise, evaluates to false.

Examples

// conditional using lookahead:
const re1 = /^(?(?=\{)\{[0-9a-f]+\}|[0-9a-f]{4})$/;
re1.test("0000"); // true
re1.test("{0}"); // true
re1.test("{00000000}"); // true

// match optional brackets; the pattern is anchored and content excludes "[",
// so an unclosed "[abc" is rejected
const re2 = /^(?<open_bracket>\[)?(?<content>[^\[\]]+)(?(<open_bracket>)\])$/;
re2.test("abc"); // true
re2.test("[abc]"); // true
re2.test("[abc"); // false
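For comparison, here is one way the optional-brackets check might be written without conditionals in current ECMAScript. This is a sketch under stated assumptions, not part of the proposal: the names `bracketed`, `matchOptionalBrackets`, `open`, and `close` are hypothetical. Both brackets are made optional in the pattern, and the pairing rule that a group condition would express is enforced afterwards in code, relying on the fact that a capture group that did not participate in the match is reported as undefined.

```javascript
// Both brackets are optional here; the pairing rule is checked in code.
const bracketed = /^(?<open>\[)?(?<content>[^\[\]]+)(?<close>\])?$/;

// Hypothetical helper: returns the content when the brackets are
// either both present or both absent, and null otherwise.
function matchOptionalBrackets(input) {
  const m = bracketed.exec(input);
  if (m === null) return null;
  const { open, content, close } = m.groups;
  // A group that did not participate in the match is undefined.
  const hasOpen = open !== undefined;
  const hasClose = close !== undefined;
  return hasOpen === hasClose ? content : null;
}

console.log(matchOptionalBrackets("abc"));   // "abc"
console.log(matchOptionalBrackets("[abc]")); // "abc"
console.log(matchOptionalBrackets("[abc"));  // null
```

A conditional on the opening-bracket group would let the pattern itself carry this rule, so callers could use test or exec directly without a follow-up check.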

History

  • October 27, 2021 — Proposed for Stage 1 (slides)
    • Outcome: Remained at Stage 0

TODO

The following is a high-level list of tasks to progress through each stage of the TC39 proposal process:

Stage 1 Entrance Criteria

  • Identified a "champion" who will advance the addition.
  • Prose outlining the problem or need and the general shape of a solution.
  • Illustrative examples of usage.
  • High-level API.

Stage 2 Entrance Criteria

Stage 3 Entrance Criteria

Stage 4 Entrance Criteria

  • Test262 acceptance tests have been written for mainline usage scenarios and merged.
  • Two compatible implementations which pass the acceptance tests: [1], [2].
  • A pull request has been sent to tc39/ecma262 with the integrated spec text.
  • The ECMAScript editor has signed off on the pull request.