Decoding URL Components
ivan-tymoshenko opened this issue ยท 13 comments
There is how decoding URL components is working right now. If URL contains "%" symbol - URL will be decoded by standard js decodeURI function and if URL contains one of this symbols "# $ & + , / : ; = ? @", URL params will be parsed from the original (non-decoded) url and will be decoded either by fastDecode or customDecode function. There is a problem here. If one part of a URL has one of this symbols "# $ & + , / : ; = ? @", all url params will be parsed by fastDecode function and not parsed with decodeURI function.
test('Decode url components', t => {
t.plan(3)
const findMyWay = FindMyWay()
findMyWay.on('GET', '/:param1/:param2', () => {})
t.same(findMyWay.find('GET', '/foo%23bar/foo%23bar').params, { param1: 'foo#bar', param2: 'foo#bar' })
t.same(findMyWay.find('GET', '/%F0%9F%8D%8C/%F0%9F%8D%8C').params, { param1: '๐', param2: '๐' })
t.same(findMyWay.find('GET', '/%F0%9F%8D%8C/foo%23bar'), null) // shouldn't be true
// t.same(findMyWay.find('GET', '/%F0%9F%8D%8C/foo%23bar'), { param1: '๐', param2: 'foo#bar' }) // should be true
})
I'm pretty sure that the previous test should work how I described it. But here I have a problem. Which of the two last commented test cases should be correct.
test('Decode url components', t => {
t.plan(4)
const findMyWay = FindMyWay()
findMyWay.on('GET', '/:param', () => {})
t.same(findMyWay.find('GET', '/foo%23bar').params, { param: 'foo#bar' })
t.same(findMyWay.find('GET', '/%F0%9F%8D%8C').params, { param: '๐' })
t.same(findMyWay.find('GET', '/%F0%9F%8D%8C-foo').params, { param: '๐-foo' })
t.same(findMyWay.find('GET', '/%F0%9F%8D%8C-foo%23bar').params, { param: '๐' }) // IMHO, it's wrong
// t.same(findMyWay.find('GET', '/%F0%9F%8D%8C-foo%23bar').params, { param: '๐-foo#bar' }) // First option
// t.same(findMyWay.find('GET', '/%F0%9F%8D%8C-foo%23bar').params, { param: '%F0%9F%8D%8C-foo#bar' }) // Second option
})
Another way of this problem: when people create a custom decoder what url they expect? If we pass to the decoder completely raw URL as we do know, then in one case if it's just '/%F0%9F%8D%8C' they will receive '๐', but if it's '/%F0%9F%8D%8C' + something that contains "# $ & + , / : ; = ? @", they will receive "/%F0%9F%8D%8C" + something. And they will have to parse not only this symbols "# $ & + , / : ; = ? @", but also banana.
There is a test that I don't understand.
- There defined a custom decoder.
- There is a dynamic route
- We pass "/%23%F0%9F%8D%8C" (encoded /#๐ as I understand)
- We expected "/%23%F" as a result
The last part is simply ignored.
If URL contains "%" symbol - URL will be decoded by standard js decodeURI function
True
if URL contains one of this symbols "# $ & + , / : ; = ? @"
Almost: the URL must contain those encoded symbols
If one part of a URL has one of this symbols "# $ & + , / : ; = ? @", all url params will be parsed by fastDecode function and not parsed with decodeURI function
Do you think fastDecode has some issues?
t.same(findMyWay.find('GET', '/%F0%9F%8D%8C-foo%23bar').params, { param: '๐' }) // IMHO, it's wrong
I agree
Which of the two last commented test cases should be correct.
I think the first one: '๐-foo#bar'
We expected "/%23%F" as a result
I agree that it is an error. There is some issue with the parameter counter when the slice function is applied.
{
originPath: '/%23%F0%9F%8D%8C',
decoded: '/%23๐',
containsEncodedComponents: true
}
I have a problem here and I need your help with it. I will describe how we decode a URL with params. At first, we decode the whole URL with decodeURI function, in order to decode symbols we need to match the URL to a registered path. Then if we find a param, we need to parse and decode it again from the original path (from the path that we didn't decode with decodeURI). We found the param in the decoded path and should parse it from the non-decoded path, and these paths could mismatch.
Example:
- We set up
/๐/:param
URL - Trying to match this URL
/%F0%9F%8D%8C/param%23
(encoded ๐ symbol == %F0%9F%8D%8C). We expect that '%F0%9F%8D%8C' would be decoded to the ๐, then we will find a param and will decode it. Param should be equal 'param#' - We parse the URL with decodeURI function and get
/๐/param%23
URL - We find a param in the decoded URL
/๐/param%23
, it starts at index 3 - We need to parse and decode param from the original URL
/%F0%9F%8D%8C/param%23
, but all we know is that param starts at index 3 in the decoded URI.
Any good ideas of how should we do that?
I would recommend looking at path-to-regexp or URLPatter https://wicg.github.io/urlpattern/
@mcollina I tested the path-to-regexp library.
const { match } = require("path-to-regexp")
const matchStatic = match("/๐", { decode: decodeURIComponent });
console.log(matchStatic("/%F0%9F%8D%8C")) // returns false
console.log(decodeURI('/%F0%9F%8D%8C')) // returns "/๐"
Is this correct behavior?
URLPattern matches the url, but it doesn't decode the param.
const pattern = new URLPattern('https://host.com/๐/:id')
console.log(pattern.exec('https://host.com/%F0%9F%8D%8C/123%23')) // returns param id as 123%23, not as 123#
path-to-regexp, URLPattern, and find-my-way have three different ways to deal with encoded URLs.
I don't think there is a preferred way to handle this.
Ok, but the problem is that find-my-way has an issue with that.
#234 (comment)
Unfortunately I do not have enough time right now to design a solution for this. I'd be happy to review a PR.
IMHO we should align to either path-to-regexp or URLPattern. Ideally in a way that minimizes breaking changes.