Add a "using URLPattern in other APIs" section
domenic opened this issue · 14 comments
We're seeing URLPattern being used in several proposals now:
- Speculation rules
- Service worker static routing
- Hopefully, compression dictionaries
- Potentially in the future, its original motivation, service worker scopes or web app manifest scopes.
I think we need clearer guidance on how to design APIs that accept URL patterns, as we've started to see some slight divergence.
Scenarios to consider:
- JS APIs:
- Should accept
URLPattern
objects - What "shorthands" / "raw inputs" can they accept? Keep in mind that
URLPattern
has multiple constructor forms:new URLPattern({ ...componentsDictionary, baseURL })
,new URLPattern(patternString, baseURL)
, and then variants of each of those with another options argument. - My suggestion:
- Such APIs should have a default base URL, and then accept either:
{ ...componentsDictionary }
(which automatically gets the base URL),{ ...componentsDictionary, baseURL }
(overriding the base URL), orpatternString
(which automatically gets the base URL).
- Downsides of my suggestion:
- The shorthand form can't set the options (i.e.
ignoreCase
) - The shorthand form can't combine a string with an overridden base URL (i.e., there is no counterpart to the
new URLPattern(patternString, baseURL)
constructor). - The shorthand form depends on fixing #179 to give intuitive results for this kind of implicit base URL inheritance
- The shorthand form can't set the options (i.e.
- Should accept
- HTTP headers or other string-accepting APIs: should accept input strings, with some natural base URL automatically applied. (This is what WICG/compression-dictionary-transport#51 proposes.)
- JSON APIs:
- My suggestion (and what we implemented for speculation rules): the same as the "shorthands" sub-question for JS APIs.
- Some of the downsides above become more serious here, because there's no
new URLPattern()
escape hatch; the only possible form is the "shorthand" form.
- Some of the downsides above become more serious here, because there's no
- My suggestion (and what we implemented for speculation rules): the same as the "shorthands" sub-question for JS APIs.
Other considerations: is there any place for pathname-only patterns? IMO probably not because anytime you match on pathname you should also allow matching on search, i.e. discriminating between URL patterns like /products/*
and /product?id=*
is not good. But it's come up a few times.
My proposal is to use the above rules for such situations, and enforce "pathname + search only" by checking that the resulting URL pattern's protocol/username/password/hostname/port are set to match the base URL's origin. Providing a spec helper for such a check would be a good idea, I think.
We're seeing URLPattern being used in several proposals now:
- Speculation rules
- Service worker static routing
- Hopefully, compression dictionaries
- Potentially in the future, its original motivation, service worker scopes or web app manifest scopes.
I think we need clearer guidance on how to design APIs that accept URL patterns, as we've started to see some slight divergence.
Scenarios to consider:
JS APIs:
Should accept
URLPattern
objectsWhat "shorthands" / "raw inputs" can they accept? Keep in mind that
URLPattern
has multiple constructor forms:new URLPattern({ ...componentsDictionary, baseURL })
,new URLPattern(patternString, baseURL)
, and then variants of each of those with another options argument.My suggestion:
- Such APIs should have a default base URL, and then accept either:
{ ...componentsDictionary }
(which automatically gets the base URL),
I think supporting this might give you URLPattern objects "for free" because the components are exposed as properties with the same names, much like how many Foo interfaces satisfy the corresponding FooInit dictionary.
* `{ ...componentsDictionary, baseURL }` (overriding the base URL), or * `patternString` (which automatically gets the base URL).
Downsides of my suggestion:
- The shorthand form can't set the options (i.e.
ignoreCase
)- The shorthand form can't combine a string with an overridden base URL (i.e., there is no counterpart to the
new URLPattern(patternString, baseURL)
constructor).- The shorthand form depends on fixing Base URL inheritance gives unintuitive results #179 to give intuitive results for this kind of implicit base URL inheritance
FWIW, shorthands not having all capabilities seems fine to me.
- HTTP headers or other string-accepting APIs: should accept input strings, with some natural base URL automatically applied. (This is what Switched the match param to use a URLPattern compression-dictionary-transport#51 proposes.)
The natural base URL is almost always the response URL (or request URL, for request headers), right?
JSON APIs:
My suggestion (and what we implemented for speculation rules): the same as the "shorthands" sub-question for JS APIs.
- Some of the downsides above become more serious here, because there's no
new URLPattern()
escape hatch; the only possible form is the "shorthand" form.Other considerations: is there any place for pathname-only patterns? IMO probably not because anytime you match on pathname you should also allow matching on search, i.e. discriminating between URL patterns like
/products/*
and/product?id=*
is not good. But it's come up a few times.My proposal is to use the above rules for such situations, and enforce "pathname + search only" by checking that the resulting URL pattern's protocol/username/password/hostname/port are set to match the base URL's origin. Providing a spec helper for such a check would be a good idea, I think.
+1, though I'm not even sure how often we'll need spec text, as I'd expect in practice the pattern never matching will cause no behavior. So it might just amount to non-normative text advising a warning if the pattern cannot possibly match (because it doesn't match the origin's scheme/host/port, because it doesn't see fragments but the pattern requires a non-empty fragment, etc).
Thank you for raising this.
The shorthand form can't set the options (i.e. ignoreCase)
In WICG/compression-dictionary-transport#51 (comment), the new option to URLPatternOptions
is proposed. So this limitation may be more serious in the future, but some limitations are reasonable as far as the API accepts URLPattern
itself.
My proposal is to use the above rules for such situations, and enforce "pathname + search only" by checking that the resulting URL pattern's protocol/username/password/hostname/port are set to match the base URL's origin.
I understand we'd like to have pathname+search match in the most case, but is this enforcement applied only when the base URL was added automatically? Service worker static routing may accept matching other URLPatternInit items such as hostname, since it handles cross origin resources etc.
One of the other things we'd need when integrating with other specs is an agreed-upon way to specify which tokens are supported. Specifically if regexp
is supported or not since both compression dictionaries and the service worker routing specs have disabled regex support.
Do you have a plan to make URLPattern gets an implicit baseURL?
I expect the urlPattern
field in the static routing API and new URLPattern
behaves the same. I mean that if the short-hand form silently gets the baseURL, the original URLPattern should also silently get the baseURL. To prospect which URL can be matched by the URLPattern, it is natural for the web developers to create URLPattern by themselves, and execute its test
method. It should be confusing if it behaves different from that is specified to the static routing API. At this point, baseURL should also be exposed in the URLPattern interface so that URLPattern object can be used as the static routing API's urlPattern
's input.
Moreover, I expect URLPatternInit and the URLPattern interface has the IgnoreCase field. It is important information for the matching, and should be respected in the evaluator.
A flag to ask URLPattern to prevent regexp might need to be used at the compile time, and it might not need to be passed to the evaluator. I think it nice to make URLPattern API options to support the flag. It is also helpful for the people to understand the behavior with new URLPattern({...}, {prohibit_regexp: true})
.
Do you have a plan to make URLPattern gets an implicit baseURL?
No. And I don't think this is the right question to be asking.
Consider how other web platform APIs behave. fetch("./foo.html")
works, and has an implicit base URL. But new URL("./foo.html")
does not work; there is no "URL
with implicit base URL". You need the explicit new URL("./foo.html", baseURL)
.
This pattern is widespread. Almost every web API (maybe every, now?) comes with an implicit base URL. We don't build the concept of implicit base URL into the URL
primitive itself; that still represents a parsed, absolute, URL. Instead we say that consumers of the URL concept have an implicit base URL, which they apply when you use the "shorthand" form fetch("./foo.html")
instead of the explicit fetch(new URL("./foo.html", document.baseURI))
.
When designing URL patterns and related APIs, how URLs and related APIs work is generally a good precedent.
I expect the
urlPattern
field in the static routing API andnew URLPattern
behaves the same.
I don't think this is the right expectation. new URLPattern
is a constructor, which can take multiple arguments. The urlPattern
field in the static routing API can only accept a single value. We need different APIs for these two places.
Additionally, if you want the full power of the new URLPattern
constructor, you can use it directly: your single value can just be a URLPattern
object, as in urlPattern: new URLPattern(..., ..., ...)
. If you are instead using a shorthand value, e.g. urlPattern: "a string"
, that can behave differently, because strings are a different type than URLPattern
objects.
I don't think this will be confusing for web developers, in the same way that web developers don't find fetch("./foo.html")
being shorthand for fetch(new URL("./foo.html", document.baseURI))
confusing.
Moreover, I expect URLPatternInit and the URLPattern interface has the IgnoreCase field. It is important information for the matching, and should be respected in the evaluator.
If you want ignoreCase
functionality, then you can avoid using the shorthand, and use urlPattern: new URLPattern(..., { ignoreCase })
.
A flag to ask URLPattern to prevent regexp might need to be used at the compile time, and it might not need to be passed to the evaluator.
I think adding this at the URLPattern
API level, especially as a constructor option (URLPatternOptions
), is not the right design. Certain APIs have restrictions on the URLPattern
s they accept: some reject those with regexp parts; some reject (or always fail to match) those that are cross-origin; etc. This processing should be performed by each API.
We should make it easy to specify this prohibition. For example, there should be a specification hook like |urlPattern| [=contains regexp tokens=]
, so that other specifications can write If |urlPattern| [=contains regexp tokens=], then throw a {{TypeError}}
.
If we want to expose this as a web developer-facing API, we could do so: then web developers could emulate the behavior of speculations using if (urlPattern.containsRegExpTokens) { throw new TypeError(); }
. But I am unsure how valuable this is.
I thought that the urlPattern condition in the static routing API would take two kinds of arguments and made it accept URLpatternInput, but it sounds three; USVString, a dictionary for short-hands, and URLPattern object. Upon the past discussion, I misunderstood that you meant passing a dictionary should behave equivalent to passing URLPattern object, but it sounds not. i.e. urlPattern: new URLPattern({...})
can be a different behavior from urlPattern: {...}
even if {...}
are the same.
By the way, I hope #179 is resolved soon. Until that is resolved, I believe it is bad idea to introduce implicit baseURL because it brings unexpected surprises like not matching URLs with query strings, which would commonly happens.
If you want ignoreCase functionality, then you can avoid using the shorthand, and use urlPattern: new URLPattern(..., { ignoreCase }).
How the WebIDL for the urlPattern condition's argument would be?
A flag to ask URLPattern to prevent regexp
My goal is not adding the flag but helping developers to recognize un-intended regular expression usage. Therefore, urlPattern.containsRegExpTokens
still fulfills my goal.
By the way, I hope #179 is resolved soon.
Agreed. I think it is very reasonable to avoid introducing implicit base URL into service worker static routes until we resolve that. I think @jeremyroman will be working on it soon.
How the WebIDL for the urlPattern condition's argument would be?
I think it would be: (URLPattern or URLPatternInit or USVString)
My goal is not adding the flag but helping developers to recognize un-intended regular expression usage. Therefore,
urlPattern.containsRegExpTokens
still fulfills my goal.
This is reasonable. Perhaps we should open a new issue, which you or @sisidovski could send a pull request on, for such a proposal?
How the WebIDL for the urlPattern condition's argument would be?
I think it would be:
(URLPattern or URLPatternInit or USVString)
Thank you.
My goal is not adding the flag but helping developers to recognize un-intended regular expression usage. Therefore,
urlPattern.containsRegExpTokens
still fulfills my goal.This is reasonable. Perhaps we should open a new issue, which you or @sisidovski could send a pull request on, for such a proposal?
I have filed #191
Such a section has been added. Since the original issue comment mentioned HTTP headers which are still in discussion, leaving this issue thread open for now.
I'll note that https://datatracker.ietf.org/doc/draft-ietf-httpbis-compression-dictionary/17/ ends up explicitly instantiating URLPattern classes during HTTP header processing, which is somewhat confusing since, e.g., it doesn't talk about which JavaScript realm to use. It might be good to provide a better interface for HTTP header specs to use soon, if we can...
The existing dfns should be enough to avoid using the WebIDL integration (as the JSON section demonstrates), but still maybe explicitly calling out HTTP headers makes it more straightforward.
Sorry about that. I just published draft-18 of the compression dictionaries ID which switches away from using the WebIDL and better mirrors what the JSON section demonstrates.
FWIW, it might be easier to reference if each of the High-level operations sections had a discrete sub-section. Right now the text version of the ID relies on "create", "match" and "has regexp groups" for each of the sections but it's not necessarily obvious when coming to the spec that that's the part of the document that is referred to (the HTML version has a deep link which makes it easier).
Let me know if you would like any IDs in the section I propose in #230 to be made permanent, too. At the moment it's basically advice and a single algorithm which doesn't do much over the direct create algorithm right now but might gain additional features if required in the future.
Thanks, but I think the direct links to create work better for my use case since it already takes care of asserting the field as a structured field as part of the processing.