URL path shortening for ../ creates problem with other URL parsers that do not follow the whatwg standard
stefanbeigel opened this issue · 6 comments
What is the issue with the URL Standard?
Hi,
I would like to share with you a common scenario:
- A request is recevied via NodeJs Express or Fastify server
- Request is forwarded to another service using an http client that uses the URL class to build the target URL using the service hostname + the incoming request.pathname
This scenario can lead to path traversal vulnerabilities as Express and Fastify do not evaluate ../
and ./
but the whatwg URL does. So the route checks of express / fastify match another path. This situation is not good at all, because the developer need to know about the different parsing / evaluation logic.
Example
I have prepared a sample application with fastify.
https://github.com/stefanbeigel/whatwg-fastify-path-traversal/blob/main/index.mjs
Call the app with curl --path-as-is localhost:3000/abc/../foobar
Possible solutions
- Http server libraries parses the URL with the whatwg URL standard
- Whatwg URL drops the path shortening or gives an option to disable it
As this behavior was introduced by the URL class I created this issue, even you can argue that this is a problem of fastify / express / nodejs.
Using different parsers is a known security vulnerability and it's one of the reasons we have standardized how URLs are parsed, so more tooling can interoperate with it.
I hope you're not expecting that we change the standard over this as that would undoubtedly break the web.
Using different parsers is a known security vulnerability and it's one of the reasons we have standardized how URLs are parsed, so more tooling can interoperate with it.
Are there any plans to also adapt server side parsing of the url in nodejs / the http server module to use the whatwg URL class?
I know it might be better to ask this the nodejs team but I guess you are also in contact with them.
I hope you're not expecting that we change the standard over this as that would undoubtedly break the web.
Well actually I would question why it's defined in the standard like this, why not leave the path as it is?
With this the whatwg url standard automatically creates issues with other url parsers that do not evaluate ../
BR
Best to ask Node.js.
It's defined in the standard like this because this is how browsers have behaved for a long time. They parse and mostly-canonicalize at the same time.
Hi
It's defined in the standard like this because this is how browsers have behaved for a long time. They parse and mostly-canonicalize at the same time.
Well then why even follow the whatwg standard if it's mainly focused on browsers?
Really wonder why this URL class has been made standard in NodeJs, will open a issue at NodeJs and link it here.
Btw I also opened an issue at fastify.
fastify/fastify#5204
Because having different parsers causes issues, I thought we already established that...
By the way, the treatment of .
and ..
components in this standard is consistent with previous RFCs, such as RFC-3986.
When describing URL normalization, RFC-3986 is explicit about what .
and ..
components are for:
6.2.2.3. Path Segment Normalization
The complete path segments "." and ".." are intended only for use
within relative references (Section 4.1) and are removed as part of
the reference resolution process (Section 5.2). However, some
deployed implementations incorrectly assume that reference resolution
is not necessary when the reference is already a URI and thus fail to
remove dot-segments when they occur in non-relative paths. URI
normalizers should remove dot-segments by applying the
remove_dot_segments algorithm to the path, as described in
Section 5.2.4.
It's been a while since I read the HTTP spec, but if I recall correctly, I believe the idea is that the server takes the path and query (together known as the "request target"), and the host (supplied via the Host:
header), and reconstructs the URL using the processing in RFC-3986. That means it would be expected to remove dot segments, as this standard also does.
Furthermore, I think RFC-3986 is clear that .
and ..
components are intended only for hierarchical traversal within the path, and any servers which give them a different interpretation could reasonably be described as "incorrect". It would be very difficult for clients to robustly interact with such a server because the RFC gives tools and libraries explicit license and even encouragement to remove them ("URI normalizers should remove dot-segments").
So IMO, that part of the standard, and its embodiment in JavaScript's URL
class, is consistent with RFC-3986. Which is nearly 20 years old and widely used even outside of browser contexts.