regex sticky?
getify opened this issue · 9 comments
I'm currently trying to figure out what the heck the y regex sticky mode actually does (there are lots of varied, conflicting examples and plenty of misinformation out there), but I was surprised to see it's not listed here at all. Plans to cover it?
If I get to some sort of complete understanding of it soon, I'll submit a PR. But perhaps you might already have a good way to explain it that would clear up the confusion?
The sticky flag is more of a regex feature than an ES6 one, as it's supported elsewhere. However, MDN has a really concise example that explains the functionality well:
var text = 'First line\nsecond line';
var regex = /(\S+) line\n?/y;
var match = regex.exec(text);
console.log(match[1]); // prints 'First'
console.log(regex.lastIndex); // prints '11'
var match2 = regex.exec(text);
console.log(match2[1]); // prints 'second'
console.log(regex.lastIndex); // prints '22'
var match3 = regex.exec(text);
console.log(match3 === null); // prints 'true'
From the same page, some more clarification (the last sentence is the part that should clear up an important use case for this flag):
The "y" flag indicates that it matches only from the index indicated by the lastIndex property of this regular expression in the target string (and does not attempt to match from any later indexes). This allows the match-only-at-start capabilities of the character "^" to effectively be used at any location in a string by changing the value of the lastIndex property.
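The first sentence of that quote is the heart of sticky mode: a match is attempted only at lastIndex, never at a later position. A common use for that is a hand-rolled tokenizer that walks a string piece by piece without silently skipping characters. A minimal sketch, assuming a hypothetical tokenize helper and a made-up toy grammar (numbers, identifiers, single-character operators); none of this comes from MDN:

```javascript
// Hypothetical toy token patterns. Every one uses the y flag, so each can
// only match starting exactly at its lastIndex. Note: no ^ anchors needed.
const TOKEN_PATTERNS = [
  ["space", /\s+/y],
  ["number", /\d+/y],
  ["ident", /[A-Za-z_]\w*/y],
  ["op", /[+\-*\/=()]/y],
];

// Hypothetical helper: splits `input` into tokens, throwing on any
// character that no pattern accepts at the current position.
function tokenize(input) {
  const tokens = [];
  let pos = 0;
  while (pos < input.length) {
    let matched = false;
    for (const [type, re] of TOKEN_PATTERNS) {
      re.lastIndex = pos; // sticky: try to match only at this position
      const m = re.exec(input);
      if (m !== null) {
        if (type !== "space") tokens.push({ type, text: m[0] });
        pos = re.lastIndex; // on success lastIndex is just past the match
        matched = true;
        break;
      }
    }
    if (!matched) {
      throw new SyntaxError(`Unexpected character at ${pos}: ${input[pos]}`);
    }
  }
  return tokens;
}
```

For example, tokenize("x = 42 + y1") yields ident x, op =, number 42, op +, ident y1, while tokenize("x # y") throws at the "#", because no pattern is allowed to skip ahead looking for a match.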
@getify FWIW, last year I documented how y works here: https://docs.webplatform.org/wiki/javascript/RegExp/sticky
You’re right that there is a lot of misinformation about this feature. Firefox/SpiderMonkey shipping a broken implementation and not fixing it for a long time (and the description on MDN matching that brokenness) didn’t really help.
Also check the V8 tests for this feature.
> the sticky flag is more of a regex feature than es6, as it's supported elsewhere
It turns out there's a fair bit of very ES6-specific machinery supporting sticky in a very particular way, so it definitely seems like an ES6 feature worth calling out.
> MDN has a really concise example here, that explains the functionality well
FWIW, I found that example (and others like it) really terrible at explaining what I needed to know about y (sticky). In fact, it misled me quite a bit at first.
It shows only the happy path (where subsequent matches are already adjacent by virtue of cooperation between known string contents and the way the pattern is structured), shows no failure or index-reset cases, and uses only exec(..) instead of pointing out that this behavior extends to other regex-aware utilities, such as String#match(..).
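To make that point concrete, here is a minimal sketch of how y carries over to String methods that accept a regex, assuming an ES6-compliant engine; the sample strings are arbitrary:

```javascript
// String#match with a non-global sticky regex reads and updates lastIndex,
// just like exec() does.
const word = /foo/y;
const input = "foofoo bar";

const first = input.match(word);     // matches "foo" at index 0
const afterFirst = word.lastIndex;   // 3
const second = input.match(word);    // matches "foo" at index 3
const afterSecond = word.lastIndex;  // 6
const third = input.match(word);     // index 6 is " ", so null
const afterThird = word.lastIndex;   // 0: the failure reset it

// String#replace with g + y replaces consecutive matches starting at
// index 0 and stops at the first position where the pattern fails.
const replaced = "aaba".replace(/a/gy, "x"); // "xxba"

// String#search always starts at index 0; with y it can only find a
// match that begins exactly there.
const foundAtStart = "abc".search(/a/y); // 0
const notAtStart = "abc".search(/b/y);   // -1

console.log(first[0], afterFirst, second[0], afterSecond, third, afterThird);
console.log(replaced, foundAtStart, notAtStart);
```

Note the replace case: a plain /a/g would give "xxbx", but the sticky version stops at "b" because it is not allowed to search ahead for the last "a".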
Absolutely fair! My (incorrect) assumption was that the y behavior was consistent across regex engines; this issue seems much more warranted now.
> last year I documented how y works here:
Your examples are definitely clearer. Appreciate that.
They don't explicitly illustrate how a match failure resets lastIndex back to 0, which I think is an important detail. Also, they don't explain what happens if a ^ anchor is in the pattern (in both absolute and conditional positions). That particular thing seems to be a point of disagreement between older Firefox's initial invention of y and the codified ES6 y.
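A minimal sketch of the first of those details, assuming ES6-compliant behavior: a failed sticky match resets lastIndex to 0, and unlike a g regex, a y regex never scans forward past lastIndex. The sample string is arbitrary:

```javascript
const text = "foo bar";

// Sticky: the match must begin exactly at lastIndex.
const stickyRe = /bar/y;
stickyRe.lastIndex = 2;
const stickyAt2 = stickyRe.test(text);       // false: "bar" doesn't start at 2
const stickyResetTo = stickyRe.lastIndex;    // 0: the failure reset lastIndex

stickyRe.lastIndex = 4;
const stickyAt4 = stickyRe.test(text);       // true: "bar" starts at 4
const stickyAfterHit = stickyRe.lastIndex;   // 7: just past the match

const stickyAgain = stickyRe.test(text);     // false: nothing matches at 7
const stickyFinal = stickyRe.lastIndex;      // 0 again

// Global: the search starts at lastIndex but may skip ahead.
const globalRe = /bar/g;
globalRe.lastIndex = 2;
const globalFrom2 = globalRe.test(text);     // true: finds "bar" at 4
const globalAfterHit = globalRe.lastIndex;   // 7
```

The reset matters in loops: after a miss, the next exec(..) or test(..) on the same regex starts over from index 0 rather than from where the miss happened.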
[Edit]: I'd also mention that your example strings ('..foo*bar', for example) look like regexes themselves, which I found confusing.
> This allows the match-only-at-start capabilities of the character "^" to effectively be used at any location in a string by changing the value of the lastIndex property.
Actually, I think that's the old Firefox behavior; it appears ES6's y does not allow that. A case in point of the misinformation and confusion.
I finally found spec text that perfectly clarifies how ^ interacts with y. From 21.2.2.6:
Note: Even when the y flag is used with a pattern, ^ always matches only at the beginning of Input, or (if Multiline is true) at the beginning of a line.
So, /^b../y will only match if there's a "b.." at the beginning of the string, regardless of what you set lastIndex to (i.e., y does not make ^ relative to lastIndex). Moreover, without the m flag, a regex like /^b../y can never match once lastIndex is greater than 0.
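A short sketch of that rule, assuming ES6-compliant behavior (the older Firefox implementation mentioned earlier in the thread may differ). matchesAt is a hypothetical helper for the demo; the m variant exercises the "beginning of a line" clause of the spec note, and the last pair shows that dropping ^ lets y alone pin the match to lastIndex:

```javascript
// Hypothetical helper: does `re` match `str` when matching starts at `start`?
function matchesAt(re, str, start) {
  re.lastIndex = start;
  return re.test(str);
}

const anchored = /^b../y;
console.log(matchesAt(anchored, "bar baz", 0)); // true: "bar" at the start of input
console.log(matchesAt(anchored, "bar baz", 4)); // false: ^ ignores lastIndex
console.log(matchesAt(anchored, "abar", 1));    // false: lastIndex > 0, no m flag

// With m, ^ also matches right after a line terminator, so a lastIndex
// that lands at the start of a line can succeed.
const anchoredMultiline = /^b../my;
console.log(matchesAt(anchoredMultiline, "foo\nbar", 4)); // true

// Without ^, the y flag by itself anchors the match at lastIndex.
console.log(matchesAt(/b../y, "bar baz", 4)); // true
console.log(matchesAt(/b../y, "abar", 1));    // true
```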
Wow, really good to know.
FWIW, I've completely rewritten (and expanded) my book section on "sticky mode" after all these explorations. If it helps at all, here's how I went about it:
https://github.com/getify/You-Dont-Know-JS/blob/master/es6%20&%20beyond/ch2.md#sticky-flag
If there's interest, I'll happily author a significantly reduced summary version of that discussion for this README, and submit a PR. Just let me know.