Decoding URL Components

There is how decoding URL components is working right now. If URL contains "%" symbol - URL will be decoded by standard js decodeURI function and if URL contains one of this symbols "# $ & + , / : ; = ? @", URL params will be parsed from the original (non-decoded) url and will be decoded either by fastDecode or customDecode function. There is a problem here. If one part of a URL has one of this symbols "# $ & + , / : ; = ? @", all url params will be parsed by fastDecode function and not parsed with decodeURI function.

test('Decode url components', t => {
  t.plan(3)

  const findMyWay = FindMyWay()

  findMyWay.on('GET', '/:param1/:param2', () => {})

  t.same(findMyWay.find('GET', '/foo%23bar/foo%23bar').params, { param1: 'foo#bar', param2: 'foo#bar' })
  t.same(findMyWay.find('GET', '/%F0%9F%8D%8C/%F0%9F%8D%8C').params, { param1: '🍌', param2: '🍌' })
  t.same(findMyWay.find('GET', '/%F0%9F%8D%8C/foo%23bar'), null) // shouldn't be true
  // t.same(findMyWay.find('GET', '/%F0%9F%8D%8C/foo%23bar'), { param1: '🍌', param2: 'foo#bar' }) // should be true
})

I'm pretty sure that the previous test should work how I described it. But here I have a problem. Which of the two last commented test cases should be correct.

test('Decode url components', t => {
  t.plan(4)

  const findMyWay = FindMyWay()

  findMyWay.on('GET', '/:param', () => {})

  t.same(findMyWay.find('GET', '/foo%23bar').params, { param: 'foo#bar' })
  t.same(findMyWay.find('GET', '/%F0%9F%8D%8C').params, { param: '🍌' })
  t.same(findMyWay.find('GET', '/%F0%9F%8D%8C-foo').params, { param: '🍌-foo' })
  t.same(findMyWay.find('GET', '/%F0%9F%8D%8C-foo%23bar').params, { param: '🍌' }) // IMHO, it's wrong
  // t.same(findMyWay.find('GET', '/%F0%9F%8D%8C-foo%23bar').params, { param: '🍌-foo#bar' }) // First option
  // t.same(findMyWay.find('GET', '/%F0%9F%8D%8C-foo%23bar').params, { param: '%F0%9F%8D%8C-foo#bar' }) // Second option
})

Another way of this problem: when people create a custom decoder what url they expect? If we pass to the decoder completely raw URL as we do know, then in one case if it's just '/%F0%9F%8D%8C' they will receive '🍌', but if it's '/%F0%9F%8D%8C' + something that contains "# $ & + , / : ; = ? @", they will receive "/%F0%9F%8D%8C" + something. And they will have to parse not only this symbols "# $ & + , / : ; = ? @", but also banana.

There is a test that I don't understand.

There defined a custom decoder.
There is a dynamic route

find-my-way/test/custom-decode-uri.test.js

Line 56 in 5beaacf

findMyWay.on('GET', '/:param', paramHandler)
We pass "/%23%F0%9F%8D%8C" (encoded /#🍌 as I understand)

find-my-way/test/custom-decode-uri.test.js

Line 66 in 5beaacf

const crazyRoute = findMyWay.find('GET', '/%23%F0%9F%8D%8C')
We expected "/%23%F" as a result

find-my-way/test/custom-decode-uri.test.js

Line 41 in 5beaacf

'%23%F'

The last part is simply ignored.

If URL contains "%" symbol - URL will be decoded by standard js decodeURI function

True

if URL contains one of this symbols "# $ & + , / : ; = ? @"

Almost: the URL must contain those encoded symbols

If one part of a URL has one of this symbols "# $ & + , / : ; = ? @", all url params will be parsed by fastDecode function and not parsed with decodeURI function

Do you think fastDecode has some issues?

t.same(findMyWay.find('GET', '/%F0%9F%8D%8C-foo%23bar').params, { param: '🍌' }) // IMHO, it's wrong

I agree

Which of the two last commented test cases should be correct.

I think the first one: '🍌-foo#bar'

We expected "/%23%F" as a result

I agree that it is an error. There is some issue with the parameter counter when the slice function is applied.

{
  originPath: '/%23%F0%9F%8D%8C',
  decoded: '/%23🍌',
  containsEncodedComponents: true
}

I have a problem here and I need your help with it. I will describe how we decode a URL with params. At first, we decode the whole URL with decodeURI function, in order to decode symbols we need to match the URL to a registered path. Then if we find a param, we need to parse and decode it again from the original path (from the path that we didn't decode with decodeURI). We found the param in the decoded path and should parse it from the non-decoded path, and these paths could mismatch.

Example:

We set up /🍌/:param URL
Trying to match this URL /%F0%9F%8D%8C/param%23 (encoded 🍌 symbol == %F0%9F%8D%8C). We expect that '%F0%9F%8D%8C' would be decoded to the 🍌, then we will find a param and will decode it. Param should be equal 'param#'
We parse the URL with decodeURI function and get /🍌/param%23 URL
We find a param in the decoded URL /🍌/param%23, it starts at index 3
We need to parse and decode param from the original URL /%F0%9F%8D%8C/param%23, but all we know is that param starts at index 3 in the decoded URI.

Any good ideas of how should we do that?

@mcollina @Eomm I need your help here. Maybe you know a good example of another lib or standard which should describe how it should work?

I would recommend looking at path-to-regexp or URLPatter https://wicg.github.io/urlpattern/

@mcollina I tested the path-to-regexp library.

const { match } = require("path-to-regexp")

const matchStatic = match("/🍌", { decode: decodeURIComponent });
console.log(matchStatic("/%F0%9F%8D%8C")) // returns false

console.log(decodeURI('/%F0%9F%8D%8C')) // returns "/🍌"

Is this correct behavior?

URLPattern matches the url, but it doesn't decode the param.

const pattern = new URLPattern('https://host.com/🍌/:id')
console.log(pattern.exec('https://host.com/%F0%9F%8D%8C/123%23')) // returns param id as 123%23, not as 123#

path-to-regexp, URLPattern, and find-my-way have three different ways to deal with encoded URLs.

I don't think there is a preferred way to handle this.

Ok, but the problem is that find-my-way has an issue with that.
#234 (comment)

Unfortunately I do not have enough time right now to design a solution for this. I'd be happy to review a PR.

IMHO we should align to either path-to-regexp or URLPattern. Ideally in a way that minimizes breaking changes.