Semicolons are erroneously encoded in query params
Hey,
I've had a user report the following normalization:
normalize('https://my.otrs.dom/index.pl?Action=AgentTicketZoom;TicketID=707128') == 'https://my.otrs.dom/index.pl?Action=AgentTicketZoom%3BTicketID%3D707128'
...which according to the user didn't preserve the semantics of the URL.
Checking RFC 3986, it appears that ";" and "=" are part of the sub-delims non-terminal, which defines a set of reserved characters that should not be percent-encoded.
Am I missing something?
It's just URL encoded. It doesn't change any semantics of the URL:
const a = 'https://my.otrs.dom/index.pl?Action=AgentTicketZoom;TicketID=707128';
const b = 'https://my.otrs.dom/index.pl?Action=AgentTicketZoom%3BTicketID%3D707128';
new URL(a).searchParams.get('Action') === new URL(b).searchParams.get('Action')
//=> true
Mh. I assume this is because the URL implementation simply treats ";" as data, which is fine, but it's not canonical.
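A quick way to see this is to list what the WHATWG URLSearchParams parser actually extracts from the original URL. This is only a small sketch; the variable names `url` and `entries` are mine.

```javascript
const url = new URL('https://my.otrs.dom/index.pl?Action=AgentTicketZoom;TicketID=707128');

// URLSearchParams splits pairs on "&" only, so ";" stays inside the value.
const entries = [...url.searchParams];
console.log(entries);
//=> [ [ 'Action', 'AgentTicketZoom;TicketID=707128' ] ]
console.log(url.searchParams.has('TicketID'));
//=> false
```

So the comparison above returns true only because both URLs yield the same single "Action" value, not because "TicketID" survived as its own parameter.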
The above-mentioned RFC says:
reserved    = gen-delims / sub-delims
gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"
sub-delims  = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
The purpose of reserved characters is to provide a set of delimiting
characters that are distinguishable from other data within a URI.
URIs that differ in the replacement of a reserved character with its
corresponding percent-encoded octet are not equivalent. Percent-
encoding a reserved character, or decoding a percent-encoded octet
that corresponds to a reserved character, will change how the URI is
interpreted by most applications.
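To make that concrete, here is a minimal sketch of a parser that accepts ";" as well as "&" as a pair separator, as some CGI-style backends do. `parseQuery` is a hypothetical helper written for this comment, not part of any library mentioned here, and whether the user's server parses queries this way is an assumption on my part.

```javascript
// Hypothetical helper: split a query string on "&" or ";" BEFORE decoding,
// so only literal (unencoded) delimiters separate pairs.
function parseQuery(search) {
  const params = {};
  const query = search.startsWith('?') ? search.slice(1) : search;
  for (const pair of query.split(/[&;]/)) {
    if (pair === '') continue;
    const eq = pair.indexOf('=');
    const name = eq === -1 ? pair : pair.slice(0, eq);
    const value = eq === -1 ? '' : pair.slice(eq + 1);
    params[decodeURIComponent(name)] = decodeURIComponent(value);
  }
  return params;
}

const original = 'https://my.otrs.dom/index.pl?Action=AgentTicketZoom;TicketID=707128';
const normalized = 'https://my.otrs.dom/index.pl?Action=AgentTicketZoom%3BTicketID%3D707128';

const parsedA = parseQuery(new URL(original).search);
const parsedB = parseQuery(new URL(normalized).search);
console.log(parsedA);
//=> { Action: 'AgentTicketZoom', TicketID: '707128' }
console.log(parsedB);
//=> { Action: 'AgentTicketZoom;TicketID=707128' }
```

Under that assumption, the normalized URL loses the "TicketID" parameter entirely, which matches the user's report that the semantics changed.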
Incidentally,
const a = 'https://my.otrs.dom/index.pl?Action=AgentTicketZoom;TicketID=707128';
const b = 'https://my.otrs.dom/index.pl?Action=AgentTicketZoom%3BTicketID%3D707128';
new URL(a).search === new URL(b).search
//=> false
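For comparison, here is a rough sketch of the percent-encoding normalization that RFC 3986 does describe (section 6.2.2): decode escapes that correspond to unreserved characters and uppercase the hex digits of the rest, without ever encoding or decoding a reserved character. `normalizePercentEncoding` is a hypothetical name, not this library's API, and the sketch ignores everything else a full normalizer would do.

```javascript
// Hypothetical helper: decode only percent-encoded unreserved characters
// (ALPHA / DIGIT / "-" / "." / "_" / "~") and uppercase the hex digits of
// every other escape. Reserved characters are left exactly as they are.
function normalizePercentEncoding(str) {
  return str.replace(/%([0-9a-fA-F]{2})/g, (match, hex) => {
    const char = String.fromCharCode(parseInt(hex, 16));
    return /[A-Za-z0-9\-._~]/.test(char) ? char : `%${hex.toUpperCase()}`;
  });
}

normalizePercentEncoding('?Action=AgentTicketZoom;TicketID=707128');
//=> '?Action=AgentTicketZoom;TicketID=707128'
normalizePercentEncoding('?Action=AgentTicketZoom%3bTicketID%3d707128');
//=> '?Action=AgentTicketZoom%3BTicketID%3D707128'
normalizePercentEncoding('?Action=%41gent%7EZoom');
//=> '?Action=Agent~Zoom'
```

With this approach a literal ";" stays literal and an encoded "%3B" stays encoded, so the two forms are never collapsed into one another.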