Is it worth adding a host-relative `resolve`?
karwa opened this issue · 2 comments
The `resolve` function is incredibly convenient.

```swift
let endpoint = WebURL("https://api.example.com/")!
let user_root = endpoint.resolve("/users/karl/")! // https://api.example.com/users/karl/
getJSON(url: user_root.resolve("profile_data")) // https://api.example.com/users/karl/profile_data
getJSON(url: user_root.resolve("subscription_info")) // https://api.example.com/users/karl/subscription_info
```
However, it accepts quite a wide variety of input formats - far too many to really expect users to remember, meaning that it is quite easy to construct surprising URLs if the joined strings are built from unsanitized input. Moreover, sanitizing inputs to the parser is very, very hard. For one thing, the parser will trim ASCII control characters and spaces, and filter newline and tab characters. So the input `\n \n\t/some/path` is parsed exactly the same as `/some/path`, even though the first character technically isn't "/". How many users have `if string.first == "/"` or `string.hasPrefix("/")` checks that don't take filtering into account?
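To make the filtering problem concrete, here is a minimal sketch of a pre-processing step that approximates the trimming and filtering described above, so that a prefix check sees roughly what the parser sees. `approximateParserView(of:)` is a hypothetical helper written for illustration; it is not part of WebURL and only mirrors the two behaviours mentioned in this issue.

```swift
/// Hypothetical helper (not a WebURL API): approximates the parser's
/// pre-processing by trimming leading/trailing C0 control characters and
/// spaces, then removing ASCII tab and newline characters everywhere.
func approximateParserView(of input: String) -> String {
    // Work on unicode scalars: "\r\n" is a single Character in Swift,
    // which would make Character-based checks miss it.
    let scalars = input.unicodeScalars
    let isTrimmed: (Unicode.Scalar) -> Bool = { $0.value <= 0x20 }

    guard let start = scalars.firstIndex(where: { !isTrimmed($0) }),
          let end = scalars.lastIndex(where: { !isTrimmed($0) }) else {
        return "" // Nothing but control characters and spaces.
    }

    var result = String.UnicodeScalarView()
    for scalar in scalars[start...end] where scalar != "\t" && scalar != "\n" && scalar != "\r" {
        result.append(scalar)
    }
    return String(result)
}

let raw = "\n \n\t/some/path"
print(raw.hasPrefix("/"))                             // false
print(approximateParserView(of: raw).hasPrefix("/"))  // true
```

The naive `hasPrefix` check says the string is not an absolute path, while the filtered view agrees with how the parser would treat it.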
In particular, I think that joined strings which have the potential to change the scheme and/or host are important to guard against. A change of scheme/host means a change of origin, and while web browsers have the notion of a 'current origin' (of the 'current page'), applications don't have any notion of what their 'current origin' is, and thus no protections against cross-origin communication. Even if an incorrectly-sanitized string ends up with a slightly odd-looking URL, I think that's a much more reasonable consequence than sending the request to an entirely different server (possibly under the user's control, with the possibility to feed the application fake data and exploit holes in the app's logic).
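As a rough illustration of the protection applications currently have to write themselves, the sketch below compares the scheme, host and port of a resolved URL against a known endpoint before a request is made. It uses Foundation's `URL` only so the snippet is self-contained; `hasSameOrigin(_:as:)` is a hypothetical helper, and the comparison is a simplification (it does not treat an explicit default port such as `:443` as equal to an omitted one).

```swift
import Foundation

/// Hypothetical helper: returns true if `candidate` has the same scheme,
/// host and port as `base`. Simplified: an explicit default port is not
/// considered equal to an omitted port.
func hasSameOrigin(_ candidate: URL, as base: URL) -> Bool {
    return candidate.scheme?.lowercased() == base.scheme?.lowercased()
        && candidate.host?.lowercased() == base.host?.lowercased()
        && candidate.port == base.port
}

let apiEndpoint = URL(string: "https://api.example.com/")!
let expected = URL(string: "https://api.example.com/frank/files/importantData")!
let hijacked = URL(string: "http://fake.com/files/importantData")!

print(hasSameOrigin(expected, as: apiEndpoint)) // true
print(hasSameOrigin(hijacked, as: apiEndpoint)) // false
```

A check like this has to be remembered at every call site, which is part of the argument for an API that cannot change the origin in the first place.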
There are 2 formats that can change the host:
- Absolute URLs

  The `resolve` function actually accepts an entirely different URL string - so if a user can control the start of the input string, they effectively have complete control to override where the request will go. This can be an unexpected result when resolving a relative path whose initial component contains a ":", such as a Windows drive letter:

  ```swift
  let base = WebURL("file:///C:/Windows/")!
  base.resolve("D:/Media") // { "d:/Media", scheme = "d", path = "/Media" }
  base.resolve(hostRelative: "D:/Media") // { "file:///D:/Media", scheme = "file", path = "/D:/Media" }
  ```

  It may also occur if the joined string begins with unsanitized input:

  ```swift
  let endpoint = WebURL("https://api.example.com/")!

  // Using 'resolve':
  func getImportantDataURL(user: String) -> WebURL {
    endpoint.resolve("\(user)/files/importantData")!
  }
  getImportantDataURL(user: "frank") // "https://api.example.com/frank/files/importantData"
  getImportantDataURL(user: "http://fake.com") // "http://fake.com/files/importantData"

  // Using 'resolve(hostRelative:)':
  func getImportantDataURL(user: String) -> WebURL {
    endpoint.resolve(hostRelative: "\(user)/files/importantData")!
  }
  getImportantDataURL(user: "frank") // "https://api.example.com/frank/files/importantData"
  getImportantDataURL(user: "http://fake.com") // "https://api.example.com/http://fake.com/files/importantData"
  ```
- Protocol-relative URLs

  Strings with a leading "//" are known as protocol-relative URLs. They were useful for web pages that needed to support both HTTP and HTTPS, but tend to be discouraged these days. Nonetheless, support for these URLs means that certain code which attempts to make a path absolute by prepending a "/" can inadvertently change the host:

  ```swift
  let container = WebURL("foo://somehost/")!

  // Using 'resolve':
  func getURLForPath(path: String) -> WebURL {
    // Did the caller add a leading '/'? Probably not - let's add one!
    container.resolve("/\(path)")!
  }
  // Will the function add a leading '/'? Probably not - let's add one!
  getURLForPath(path: "/users/john/profile") // foo://users/john/profile

  // Using 'resolve(hostRelative:)':
  func getURLForPath(path: String) -> WebURL {
    // Did the caller add a leading '/'? Probably not - let's add one!
    container.resolve(hostRelative: "/\(path)")!
  }
  // Will the function add a leading '/'? Probably not - let's add one!
  getURLForPath(path: "/users/john/profile") // foo://somehost//users/john/profile
  ```
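To show how much a caller would have to get right to detect both formats by hand, here is a conservative sketch of such a check. `couldChangeHost(_:)` is a hypothetical helper, not the prototype's implementation: it flags any input that, after the parser's trimming and filtering, starts with something shaped like a scheme or with two slashes. Backslashes are counted as slashes because the URL standard treats them that way for special schemes such as `http`; for other schemes this over-reports, which is the safer direction.

```swift
/// Hypothetical helper: true for ASCII letters.
func isASCIILetter(_ s: Unicode.Scalar) -> Bool {
    switch s {
    case "a"..."z", "A"..."Z": return true
    default: return false
    }
}

/// Hypothetical helper: true for characters allowed after the first
/// character of a scheme (letters, digits, "+", "-", ".").
func isSchemeBody(_ s: Unicode.Scalar) -> Bool {
    switch s {
    case "a"..."z", "A"..."Z", "0"..."9", "+", "-", ".": return true
    default: return false
    }
}

/// Hypothetical, conservative check: could resolving `input` against a
/// base URL replace the scheme or host?
func couldChangeHost(_ input: String) -> Bool {
    // Approximate the parser's pre-processing: drop leading controls and
    // spaces, then remove tabs and newlines anywhere in the string.
    let scalars = input.unicodeScalars
        .drop(while: { $0.value <= 0x20 })
        .filter { $0 != "\t" && $0 != "\n" && $0 != "\r" }

    // Protocol-relative: two leading slashes (or backslashes).
    let isSlash: (Unicode.Scalar) -> Bool = { $0 == "/" || $0 == "\\" }
    if scalars.count >= 2, isSlash(scalars[0]), isSlash(scalars[1]) {
        return true
    }

    // Absolute URL: a letter, then scheme characters, then ":".
    guard let first = scalars.first, isASCIILetter(first) else { return false }
    for scalar in scalars.dropFirst() {
        if scalar == ":" { return true }
        guard isSchemeBody(scalar) else { return false }
    }
    return false
}

print(couldChangeHost("frank/files/importantData"))           // false
print(couldChangeHost("http://fake.com/files/importantData")) // true
print(couldChangeHost("D:/Media"))                            // true
print(couldChangeHost("//users/john/profile"))                // true
```

Even this sketch needs three separate rules plus the filtering step, which is why I'd rather the library offer an entry point that simply cannot change the host.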
How much of a problem is this in the real world? I don't know, but we've all written code to check if a string starts with a "/" to try and predict how the parser will interpret it. Admittedly, the examples above are a bit contrived. It's possible that I just suck at thinking of realistic examples.
Prototype is now on branch `relatively-joined`.

This function used to be called `join`. It was renamed to `resolve` based on feedback on other URL libraries, such as servo/rust-url#333.