[Feature] Use library `regex` to offer extended features
Closed this issue · 4 comments
Thoughts on using the `regex` library under the hood?
This would allow offering some unique/powerful features, including atomic groups and subroutines.
No worries if you think it's not the right fit. But otherwise, this could show up, for example, as:
- `subroutine(name)`
  - Or you could use your own name for it like `refSubpattern(name)`.
  - You could go all out and offer a way to create subroutine definition groups, and then use their patterns by reference.
- `atomic(…)`
  - `atomic` could also be a boolean option for `choiceOf(…)`.
Additionally, since `regex` supports possessive quantifiers (from PCRE, Perl, Java, Ruby, Python, etc.), you could easily offer them. E.g., all the quantifier functions could have their options changed to replace the `greedy` property with `type` (with values: `'greedy'`, `'lazy'` (or `'nongreedy'` if you prefer), and `'possessive'`). You could alternatively include a new `possessive` boolean option in addition to `greedy`, but I wouldn't recommend that since there is no precedent in existing regex flavors for lazy+possessive quantifiers (for good reason, since this would effectively mean always using the lower bound of any quantifier).
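A rough sketch of what a single `type` option could look like. All names here (`QuantifierType`, `QuantifierOptions`, `zeroOrMoreSource`) are hypothetical and are not TS Regex Builder's current API. The helper returns extended-syntax pattern source as a string; the possessive form (`*+`) is not valid in a native JS `RegExp` and would still need something like `regex` to run.

```typescript
// Hypothetical option shape: `type` would replace the existing `greedy` flag.
type QuantifierType = 'greedy' | 'lazy' | 'possessive';

interface QuantifierOptions {
  type?: QuantifierType;
}

// Suffix appended to a base quantifier such as `*`, `+`, or `{2,5}`.
const suffixByType: Record<QuantifierType, string> = {
  greedy: '',
  lazy: '?',
  possessive: '+',
};

// Hypothetical helper: builds pattern source text, not a native JS RegExp.
function zeroOrMoreSource(inner: string, options: QuantifierOptions = {}): string {
  const type = options.type ?? 'greedy';
  return `(?:${inner})*${suffixByType[type]}`;
}

const greedy = zeroOrMoreSource('ab');                             // (?:ab)*
const lazy = zeroOrMoreSource('ab', { type: 'lazy' });             // (?:ab)*?
const possessive = zeroOrMoreSource('ab', { type: 'possessive' }); // (?:ab)*+
```

Because `type` is a single value, a lazy+possessive combination cannot be expressed at all, which is the main advantage over two separate booleans.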
With the addition of atomic groups and/or possessive quantifiers, you could rightly describe TS Regex Builder as a great way to avoid ReDoS / catastrophic backtracking.
Introducing `regex` would also mean being able to improve any regexes within TS Regex Builder's source for readability, etc., and would be particularly beneficial if you start offering a library of common patterns (#73). In source, you'd get the full benefits of the `regex` library including free spacing and comments, context-aware interpolation, etc.
Hi @slevithan, sorry for the late reply. Could you explain in simple terms the concepts you are proposing? What are the corresponding JS regex patterns for `subroutine` and `atomic`?
> What are the corresponding JS regex patterns
Here's the syntax for these features in the regex flavors that support them (PCRE, Perl, etc., as well as the `regex` library which adds support for them to native JS):
- Atomic groups: `(?>...)`.
- Possessive quantifiers: `?+`, `*+`, `++`, `{n}+`, `{n,}+`, `{n,m}+`.
- Subroutines: `\g<name>`, where `name` refers to a named group.
These are powerful features not supported by native JS regexes, except when using the `regex` library. I can't directly/fully show what they're transpiled to for JS regexes because these are nontrivial features whose translation depends on context. But you can see results by playing with patterns and seeing transpiled output in `regex`'s Babel plugin demo REPL.
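For a rough feel of the behavior, one well-known way to approximate an atomic group in native JS is a capturing lookahead followed by a backreference. This is a sketch of that general technique under simple assumptions (a single group, no other captures), not necessarily the exact output `regex` produces.

```typescript
// Plain greedy quantifier: `a+` can give back one `a` so the rest can match.
const backtracking = /^a+ab$/;

// Emulated atomic group, roughly `^(?>a+)ab$` in PCRE/Perl syntax.
// The lookahead captures the longest run of `a`s without consuming it, and
// `\1` then consumes exactly that text. Once the lookahead has matched, the
// engine does not backtrack into it, so nothing is given back.
const atomicLike = /^(?=(a+))\1ab$/;

// Same emulation where no give-back is needed, roughly `^(?>a+)b$`.
const atomicLikeOk = /^(?=(a+))\1b$/;

const plainResult = backtracking.test('aaab');    // true: `a+` backtracks to "aa"
const atomicResult = atomicLike.test('aaab');     // false: "aaa" is locked in
const atomicOkResult = atomicLikeOk.test('aaab'); // true
```

The same locking-in is what stops runaway backtracking: a failed match cannot retry every shorter split of the group.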
> Could you explain in simple terms the concepts you are proposing?
I'd recommend reading the corresponding sections in the `regex` documentation where I explain them with examples. See: atomic groups, possessive quantifiers, subroutines. Atomic groups and possessive quantifiers are primarily used for performance and to avoid runaway backtracking. Subroutines are primarily about reusing subpatterns and building up complex patterns through composition.
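To make the reuse idea concrete, here is a minimal sketch that approximates a subroutine in native JS by interpolating a shared fragment's source text. The `byte` and `ipv4` names and the IPv4 pattern are illustrative assumptions, not taken from either library.

```typescript
// Shared subpattern, defined once as source text (illustrative example).
const byte = String.raw`(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)`;

// With subroutines this might read roughly as:
//   ^(?<byte>...)(?:\.\g<byte>){3}$
// Native JS has no `\g<name>`, so the fragment is interpolated instead.
const ipv4 = new RegExp(String.raw`^${byte}(?:\.${byte}){3}$`);

const valid = ipv4.test('192.168.0.1'); // true
const invalid = ipv4.test('256.1.1.1'); // false: 256 is out of range
```

With a subroutine, the subpattern is written once inside the regex itself and referenced by name, rather than being repeated in the generated source.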
I'd be happy to answer any further questions to help clarify!
OK, so to clarify, your idea is to implement these more advanced regex features and integrate them as regex construct functions (e.g. `atomic(...)`) or options to existing functions (like `zeroOrMore(..., { mode: 'possessive' })`)?
Regarding particular features:
- `atomic` sounds interesting; the Swift Regex Builder we are modelled after has the same (?) feature under the `local` function for perf optimization
- possessive quantifiers also sound interesting, and might be useful when hitting perf issues
- `subroutines`: when it comes to composability, I think we have covered that with being able to re-use pattern fragments in a more readable way (not sure about perf differences here).
Regarding dependencies on other packages, I am against them, as they would significantly increase bundle size. Regex Builder is designed to have a minimal (reasonable) bundle size, so it's feasible to use in web apps with minimal perf impact. That's why it's fully tree-shakable, etc. There is a trade-off between a small bundle size and more advanced features. In that dilemma I would rather focus on the 80% of users using the most common 80% of features than on having the most comprehensive regex library out there.
That being said, it would be possible to add these more advanced features in the following ways:
- importing them (some or all) directly into TS Regex Builder, so they become tree-shakable and do not impact bundle size when not used
- providing a separate package like `regex/ts-regex-builder` or `ts-regex-builder-advanced` so that the 20% of more advanced users could opt in to these features.
@slevithan wdyt?
`regex` is also concerned about bundle size, so it’s reasonably small. But no worries at all if that nevertheless makes it not the right fit, or means that it would have to be relegated to a `ts-regex-builder-advanced`.
`regex` doesn’t currently offer exports of its internals that can do rewrites for only specific features (although it does offer an options API for controlling which features are applied), because that would impose different tradeoffs. `regex`’s extended syntax and implicit flags, due to the complexity of emulating them, work best when they can depend on being composed in the right sequence, share certain data, and rely on not being transpiled in isolation (forward and backward context is needed).
My subjective opinion though is that it’s possible to overly focus on size in a library like this. Many people significantly concerned about bundle size would likely skip `ts-regex-builder` entirely or pre-run their regexes through it and copy/paste the output into their code. So it might be more common for this library to be used in Node.js, build steps, and other situations where bundle size is less critical.
> the Swift Regex Builder we are modelled after has the same (?) feature under `local` function for perf optimization
Yes, based on the linked docs page, `Local` creates an atomic group under the hood.
In any case, feel free to close this if you don't think this is something you'll pursue.