ECMAScript RegExp match array offsets provide additional information about the start and end position of a captured substring.
An example implementation can be found in regexp-measure.
NOTE: regexp-measure was built around the Stage 0 proposal and is no longer up to date with respect to the current proposed API design.
Stage: 2
Champion: Ron Buckton (@rbuckton)
For detailed status of this proposal see TODO, below.
- Ron Buckton (@rbuckton)
Today, ECMAScript RegExp objects can provide information about a match when calling the exec method. This result is an Array containing information about the substrings that were matched, along with additional properties to indicate the input string, the index in the input at which the match was found, as well as a groups object containing the substrings for any named capture groups.
However, there are several more advanced scenarios where this information is not sufficient. For example, an ECMAScript implementation of TextMate Language syntax highlighting needs not just the index of the match but also the offsets of the individual capture groups.
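To illustrate the gap, here is a minimal sketch of what a caller can do with only the information exec exposes today. The helper name guessGroupStart is hypothetical and not part of any API; it searches for the captured text inside the overall match, which is only a guess and goes wrong as soon as the same text occurs earlier in the match.

```javascript
// Hypothetical helper (not part of any API): guess a capture group's start
// offset using only the match index and the captured substrings.
function guessGroupStart(match, groupNumber) {
  const captured = match[groupNumber];
  if (captured === undefined) return undefined; // group did not participate
  // indexOf finds the FIRST occurrence of the captured text inside the
  // overall match, which is not necessarily where the group matched.
  return match.index + match[0].indexOf(captured);
}

const match = /(a)(a)/.exec("xaa");
const guessedStart = guessGroupStart(match, 2);

// Group 2 matched the second "a", which starts at offset 2 in "xaa",
// but the search lands on the first "a" at offset 1.
console.log(guessedStart); // 1
```

Because the captured substring alone does not say where it came from, no amount of searching can recover the offset reliably in the general case.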
As such, we propose the adoption of an additional indices property on the array result (the substrings array) of RegExp.prototype.exec(). This property would itself be an indices array containing a pair of start and end indices for each captured substring. Any unmatched capture groups would be undefined, similar to their corresponding element in the substrings array. In addition, the indices array would itself have a groups property containing the start and end indices for each named capture group.
- Oniguruma NodeJS bindings: captureIndices property
- .NET: Capture.Index Property
- Java: Matcher.start(int) Method
const re1 = /a*(?<Z>z)?/;
// indices are relative to start of the input string:
const s1 = "xaaaz";
const m1 = re1.exec(s1);
m1.indices[0][0] === 1;
m1.indices[0][1] === 5;
s1.slice(...m1.indices[0]) === "aaaz";
m1.indices[1][0] === 4;
m1.indices[1][1] === 5;
s1.slice(...m1.indices[1]) === "z";
m1.indices.groups["Z"][0] === 4;
m1.indices.groups["Z"][1] === 5;
s1.slice(...m1.indices.groups["Z"]) === "z";
// capture groups that are not matched return `undefined`:
const m2 = re1.exec("xaaay");
m2.indices[1] === undefined;
m2.indices.groups["Z"] === undefined;
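Building on the example above, the following sketch shows how a consumer such as a syntax highlighter might turn the indices array into a list of ranges. The helper name captureRanges and the sample pattern are hypothetical. One assumption is worth stating loudly: engines that implement the finalized form of this feature only populate indices when the regular expression carries the d flag, so the sketch uses that flag in order to run there, whereas the design described in this document attaches indices without any flag.

```javascript
// Hypothetical helper: list the [start, end) range of every capture group
// that participated in the match.
// Assumption: the engine exposes `indices` on the match array. Engines that
// implement the finalized feature only do so when the `d` flag is set.
function captureRanges(re, input) {
  const match = re.exec(input);
  if (match === null || match.indices === undefined) return [];
  const ranges = [];
  for (let i = 1; i < match.indices.length; i++) {
    const pair = match.indices[i];
    if (pair === undefined) continue; // unmatched groups are `undefined`
    const [start, end] = pair;
    ranges.push({ group: i, start, end, text: input.slice(start, end) });
  }
  return ranges;
}

// Sample input chosen for illustration: a key, "=", and an optional number.
const ranges = captureRanges(/(?<key>\w+)=(?<value>\d+)?/d, "  width=42");
console.log(ranges);
```

Each entry pairs a group number with offsets relative to the start of the input string, so the caller can slice or annotate the input directly without searching for the captured text.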
The following is a high-level list of tasks to progress through each stage of the TC39 proposal process:
- Identified a "champion" who will advance the addition.
- Prose outlining the problem or need and the general shape of a solution.
- Illustrative examples of usage.
- High-level API.
- Initial specification text.
- Transpiler support (Optional).
- Complete specification text.
- Designated reviewers have signed off on the current spec text.
- The ECMAScript editor has signed off on the current spec text.
- Test262 acceptance tests have been written for mainline usage scenarios and merged.
- Two compatible implementations which pass the acceptance tests: [1], [2].
- A pull request has been sent to tc39/ecma262 with the integrated spec text.
- The ECMAScript editor has signed off on the pull request.