Handling of special characters in hostname
rubycon opened this issue
What is the issue with the URL Pattern Standard?
According to the Web Platform Tests, these hostnames should throw a TypeError:
- `bad/hostname`
- `bad#hostname`
- `bad%hostname`
- `bad\:hostname`
- `bad\nhostname`
- `bad\rhostname`
- `bad\thostname`
However, hostname validation relies almost entirely on the URL spec's internal basic URL parser, and according to the spec these cases do not throw a TypeError.
After the hostnames are passed to the constructor, they go through the initialize steps and are passed to process a URLPatternInit, where they are not validated because they are patterns. They are then passed to compile a component with the canonicalize a hostname callback, and finally to the basic URL parser with an empty URL record and the state override set to hostname state.
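A rough model of that last step shows why only an outright parser *failure* can surface as a TypeError. This is a hypothetical sketch: the names `canonicalizeHostname` and `hostStateParser`, and the `{ failure, host }` result shape, are assumptions made for illustration; in the spec the parser writes into a URL record rather than returning an object.

```javascript
// Hypothetical model of URLPattern's "canonicalize a hostname" step.
// `hostStateParser` stands in for the URL Standard's basic URL parser run
// on an empty URL record with hostname state as state override. It is
// assumed to return { failure: true } or { failure: false, host }.
function canonicalizeHostname(value, hostStateParser) {
  // The empty string is returned unchanged.
  if (value === "") return value;

  const result = hostStateParser(value);

  // Only a parser failure becomes a TypeError. Non-failing validation
  // errors (e.g. invalid-URL-unit) are never surfaced at this point.
  if (result.failure) {
    throw new TypeError(`Invalid hostname: ${JSON.stringify(value)}`);
  }

  // The host may be null if the parser returned early without setting it.
  return result.host;
}

// Toy parsers (illustrative only) covering the three possible outcomes.
const failing = () => ({ failure: true });
const earlyReturn = () => ({ failure: false, host: null });
const accepting = (v) => ({ failure: false, host: v });

console.log(canonicalizeHostname("bad:hostname", earlyReturn)); // null
console.log(canonicalizeHostname("example.com", accepting)); // example.com
try {
  canonicalizeHostname("x", failing);
} catch (e) {
  console.log(e instanceof TypeError); // true
}
```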
`bad\nhostname`, `bad\rhostname`, `bad\thostname`: the basic URL parser strips all ASCII tabs and newlines before processing the input:

> 2. If input contains any ASCII tab or newline, invalid-URL-unit validation error.
> 3. Remove all ASCII tab or newline from input.

So these three strings are treated as `badhostname`: a non-failing invalid-URL-unit validation error occurs, but nothing is thrown. This behaviour is consistent with the external URL API (e.g. `new URL("http://bad\nhostname")` is OK).
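Steps 2 and 3 can be modelled in a few lines. This is a minimal sketch; the function name `stripTabAndNewline` and the array of error names are hypothetical, chosen only to show that the validation error is recorded rather than thrown.

```javascript
// Sketch of steps 2-3 of the URL Standard's basic URL parser.
// "ASCII tab or newline" = U+0009 TAB, U+000A LF, U+000D CR.
function stripTabAndNewline(input) {
  const validationErrors = [];
  if (/[\t\n\r]/.test(input)) {
    // Step 2: record a non-fatal invalid-URL-unit validation error.
    validationErrors.push("invalid-URL-unit");
  }
  // Step 3: remove every ASCII tab or newline and keep parsing.
  const output = input.replace(/[\t\n\r]/g, "");
  return { output, validationErrors };
}

for (const hostname of ["bad\nhostname", "bad\rhostname", "bad\thostname"]) {
  const { output, validationErrors } = stripTabAndNewline(hostname);
  // Each input yields "badhostname" plus one recorded validation error.
  console.log(JSON.stringify(hostname), "->", output, validationErrors);
}
```

In a conforming `URL` implementation, `new URL("http://bad\nhostname").hostname` should likewise evaluate to `"badhostname"`.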
`bad/hostname` and `bad#hostname`: the URL parser stops processing the input at the special character and returns only `bad`, which is validated without problems:

> 3. Otherwise, if one of the following is true:
>    - c is the EOF code point, U+002F (/), U+003F (?), or U+0023 (#)
>    - url is special and c is U+005C (`\`)

`bad?hostname` fails in the pattern parser, which expects the `?` modifier to be the last character.
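The delimiter check quoted above can be sketched as a loop that accumulates the hostname buffer until it meets a terminating code point. This is a simplified, hypothetical model (`hostBuffer` and its `isSpecial` flag are illustrative names) that ignores `:`, brackets, and the host parser itself. In the URLPattern flow the empty URL record has no scheme, so `isSpecial` would be false there.

```javascript
// Hypothetical model of the host-state delimiter check in the URL
// Standard's basic URL parser. Only "where does the hostname end?" is
// modelled; ":" handling, brackets and host parsing are omitted.
function hostBuffer(input, isSpecial) {
  let buffer = "";
  for (const c of input) { // iterates by code point
    const isDelimiter =
      c === "/" || c === "?" || c === "#" || (isSpecial && c === "\\");
    // Everything from the delimiter onwards is left to later states
    // (path, query, fragment) and never reaches the host parser.
    if (isDelimiter) break;
    buffer += c;
  }
  // Running out of input is the EOF case: the whole input is the buffer.
  return buffer;
}

console.log(hostBuffer("bad/hostname", false)); // bad
console.log(hostBuffer("bad#hostname", false)); // bad
console.log(hostBuffer("bad\\hostname", true)); // bad
console.log(hostBuffer("bad\\hostname", false)); // bad\hostname
```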
`bad\:hostname`: the `:` is escaped in the pattern parser, so `bad:hostname` is passed to the URL parser. When the parser encounters the `:` with hostname state as state override, it returns without processing any hostname:

> 2. Otherwise, if c is U+003A (:) and insideBrackets is false, then:
>    2. If state override is given and state override is hostname state, then return.

After this early return the hostname is `null`, and the code later fails on an assertion when running generate a regular expression and name list. This case looks more like a URL spec issue: it is not consistent with the handling of the `/`, `?` and `#` delimiters.
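The asymmetry between `:` and the other delimiters can be made concrete with a small model of the `:` branch quoted above. This is a simplified, hypothetical sketch (`runHostState` and its result objects are illustrative; host parsing and the special-scheme `\` case are omitted) that only shows which outcomes the branch can produce.

```javascript
// Hypothetical model of the host state's ":" branch in the URL Standard's
// basic URL parser. Returns { failure: true } or { failure: false, host },
// where host is null when the parser returned without setting one.
function runHostState(input, hostnameStateOverride) {
  let buffer = "";
  let insideBrackets = false;
  for (const c of input) {
    if (c === ":" && !insideBrackets) {
      // Sub-step 1: an empty buffer is a host-missing failure.
      if (buffer === "") return { failure: true };
      // Sub-step 2: with hostname state as state override, return early.
      // The buffer is discarded and no host is written to the URL record.
      if (hostnameStateOverride) return { failure: false, host: null };
      // Without the override the real parser would set the host and move
      // on to the port state; this model simply stops here.
      break;
    }
    // Other delimiters end the hostname (special-URL "\" omitted).
    if (c === "/" || c === "?" || c === "#") break;
    if (c === "[") insideBrackets = true;
    if (c === "]") insideBrackets = false;
    buffer += c;
  }
  // Host parsing of the buffer is omitted; the raw buffer stands in for it.
  return { failure: false, host: buffer };
}

console.log(runHostState("bad/hostname", true).host); // bad
console.log(runHostState("bad:hostname", true).host); // null
```

Per the URL Standard, the `URL` API's `hostname` setter also runs the basic URL parser with hostname state as state override, so the same branch is reachable there: assigning a value such as `"example.com:8080"` should leave the URL's host unchanged.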
`bad%hostname`: the hostname is fully parsed by the URL parser and passed to the host parser as an opaque host, since the empty URL record is not special. `%` is allowed in an opaque host, but only as part of a percent-encoded value, so a non-failing invalid-URL-unit validation error occurs and nothing is thrown.
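The percent-sign rule that lets `bad%hostname` through can be sketched as follows. This is a hypothetical helper (`opaqueHostPercentErrors` is an illustrative name) that models just one check of the URL Standard's opaque-host parser: a `%` not followed by two ASCII hex digits is reported as a non-failing invalid-URL-unit validation error. The other opaque-host checks and the final percent-encoding step are left out.

```javascript
// Hypothetical model of the "%" check in the URL Standard's opaque-host
// parser. It only collects validation errors; it never rejects the input.
const isAsciiHexDigit = (c) => c !== undefined && /^[0-9A-Fa-f]$/.test(c);

function opaqueHostPercentErrors(input) {
  const codePoints = [...input]; // split by code point, not UTF-16 unit
  const errors = [];
  codePoints.forEach((c, i) => {
    const wellFormed =
      isAsciiHexDigit(codePoints[i + 1]) && isAsciiHexDigit(codePoints[i + 2]);
    if (c === "%" && !wellFormed) {
      // Non-failing invalid-URL-unit validation error: parsing continues
      // and the opaque host is still returned.
      errors.push({ type: "invalid-URL-unit", index: i });
    }
  });
  return errors;
}

console.log(opaqueHostPercentErrors("bad%hostname").length); // 1
console.log(opaqueHostPercentErrors("bad%20hostname").length); // 0
```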