whatwg/urlpattern

Optional wildcard groups inconsistent with path-to-regexp behavior

HansBrende opened this issue ยท 12 comments

const pattern = '/products/:remaining_path(.*)?'

function test(path) {
   console.log(`Matching ${path} with pattern ${pattern}
   path-to-regexp says remaining_path=${pathToRegexp(pattern).exec(path)[1]}
   URLPattern says remaining_path=${new URLPattern({pathname: pattern}).exec({pathname: path}).pathname.groups.remaining_path}`);
}

test('/products/') prints:

Matching /products/ with pattern /products/:remaining_path(.*)?
   path-to-regexp says remaining_path=
   URLPattern says remaining_path=

So far so good: both agree the result is empty string...

But now... test('/products') prints:

Matching /products with pattern /products/:remaining_path(.*)?
   path-to-regexp says remaining_path=undefined
   URLPattern says remaining_path=

Is this expected behavior? Path-to-regexp's behavior here is preferable IMO... if the path does not end with a slash, one cannot rightly say that remaining_path had an empty string match, unless you assume that a url both without and with a trailing slash are exactly synonymous, which is not necessarily the case.

(The same suboptimal results occur for URLPattern when using the more concise pattern /products/*?)

I agree this behavior does not seem correct. Let me look into it.

I think this is coming from default regexp behavior. This:

/(.*)/.exec('')[1]

Evaluates to the empty string. I'll have to see what path-to-regexp is doing to get it to be undefined.

@wanderview but that's not the pattern it should evaluate to, right?

It should be:

/(?:\/(.*))?/.exec('')[1] // prints undefined

Right, I see the regexp is different and urlpattern is using the same regexp. I'm still trying to figure out where the empty string is coming in.

At least part of the problem here is urlpattern uses record<USVString, USVString> for the groups webidl which does not permit an undefined value. This needs to change to record<USVString, (USVString or undefined)>.

@wanderview your comment just made me realize that the scope of this bug is actually way larger than I thought! It doesn't just apply to optional wildcards but also optional variables:

new URLPattern({pathname: '/a/:id?/b'}).exec({pathname: '/a/b'}).pathname.groups

prints:
{id: ''}

Yea, this is a good catch. Thanks for filing the issue! This will change a significant number of test results I think, but I think its worth it.

The other half of this is that chromium converts the null WTF::String internal type to empty string in its webidl bindings. This means the spec is also not right because it does not do this auto-conversion, but also has webidl that says undefined should not be returned right now.

So I need to fix the implementation and the spec.

FYI @lucacasonato. When this lands it will likely be a large number of tests changed.

@wanderview Instead of making it record<USVString, (USVString or undefined)> just keep it as record<USVString, USVString> and not add the keys you want to be undefined. That way "key" in res.pathname.result also works correctly

But generally in favor ๐Ÿ‘

@domenic mentioned that possibility, but we want to add the record with an undefined value to match what RegExp currently does.

The spec and polyfill have been fixed. The implementation fix and WPT tests are still in review but should land shortly.