lolgab/snunit

Forward and back slashes aren't decoded in path

mrdziuban opened this issue · 6 comments

First off, thank you for a great piece of software! While testing it out, I've found that forward (/) and back slashes (\) in a request's URL path aren't decoded from their percent-encoded versions.

To test, I started a simple server that prints the request's path:

SyncServerBuilder.build(new Handler {
  def handleRequest(request: Request): Unit = println(request.path)
}).listen()

Then I ran curl commands with URLs containing all of the reserved characters and the characters listed under "Arbitrary data" in Wikipedia's percent-encoding article. Below is a table with each character, its percent-encoded version, my curl command, and what was printed by the server. Note that / and \ are each still encoded as %2F and %5C respectively, while all other characters are properly decoded.

Character Percent-encoded curl Printed path
space %20 curl http://localhost:8080/test%20 /test 
! %21 curl http://localhost:8080/test%21 /test!
# %23 curl http://localhost:8080/test%23 /test#
$ %24 curl http://localhost:8080/test%24 /test$
% %25 curl http://localhost:8080/test%25 /test%
& %26 curl http://localhost:8080/test%26 /test&
' %27 curl http://localhost:8080/test%27 /test'
( %28 curl http://localhost:8080/test%28 /test(
) %29 curl http://localhost:8080/test%29 /test)
* %2A curl http://localhost:8080/test%2A /test*
+ %2B curl http://localhost:8080/test%2B /test+
, %2C curl http://localhost:8080/test%2C /test,
/ %2F curl http://localhost:8080/test%2F /test%2F
: %3A curl http://localhost:8080/test%3A /test:
; %3B curl http://localhost:8080/test%3B /test;
= %3D curl http://localhost:8080/test%3D /test=
? %3F curl http://localhost:8080/test%3F /test?
@ %40 curl http://localhost:8080/test%40 /test@
[ %5B curl http://localhost:8080/test%5B /test[
] %5D curl http://localhost:8080/test%5D /test]
" %22 curl http://localhost:8080/test%22 /test"
% %25 curl http://localhost:8080/test%25 /test%
- %2D curl http://localhost:8080/test%2D /test-
. %2E curl http://localhost:8080/test%2E /test.
< %3C curl http://localhost:8080/test%3C /test<
> %3E curl http://localhost:8080/test%3E /test>
\ %5C curl http://localhost:8080/test%5C /test%5C
^ %5E curl http://localhost:8080/test%5E /test^
_ %5F curl http://localhost:8080/test%5F /test_
` %60 curl http://localhost:8080/test%60 /test`
{ %7B curl http://localhost:8080/test%7B /test{
| %7C curl http://localhost:8080/test%7C /test|
} %7D curl http://localhost:8080/test%7D /test}
~ %7E curl http://localhost:8080/test%7E /test~
£ C2%A3 curl http://localhost:8080/test%C2%A3 /test£
%E5%86%86 curl http://localhost:8080/test%E5%86%86 /test円

Hi @mrdziuban,
Thank you for trying out SNUnit! You opened the first issue opened by me and I thank you for that!
About the issue, since the decoding is made by NGINX Unit, and for other cases, it works as intended, I think this issue is in NGINX Unit itself. I will check if I can submit a patch or an issue there.

What version of NGINX Unit are you running? I can't reproduce on version 1.27.0 and snunit 0.0.20.
I'm now thinking that using an already parsed path is not that sound, for example how can a library Tapir parse a path if both /s and %2Fs are converted to /.
Fortunately NGINX Unit has another field called target which gives the raw path value. Should this be used instead of path and should the parsing be delegated to something like Tapir (assuming it is able to do it)?

Hey @lolgab, thanks for the quick response! Apologies I didn't fully understand what I was doing (this is my first time trying out scala native in addition to SNUnit) but this happens when calling run in SBT. I realize now that this is running on the JVM and not using NGINX Unit. Like you said, I'm unable to reproduce the issue when I actually use nativeLink and run the binary with NGINX Unit, so maybe this is just an issue with the server used in the JVM version (undertow?).

Regarding providing the already parsed path, I agree it could be difficult to handle in a routing library if characters are already decoded. For what it's worth, http4s retains the encoded path segments on its Uri.Path type and decodes the segments when matching a path using its / matcher

For example, an http4s HttpRoutes value like this

val routes: HttpRoutes[IO] = HttpRoutes.of {
  case req @ (_ -> (Root / pathPart)) =>
    println(s"path segments: ${req.uri.path.segments}")
    println(s"path part: $pathPart")
    Ok("")
}

prints the following when I curl http://localhost:8080/test%2F

path segments: Vector(test%2F)
path part: test/

I think the best solution is to provide both target (unparsed) and path (parsed) for people to choose.
The recommended way to use SNUnit is via its higher level APIs (like the newly added Tapir) which will use the more appropriate depending on what the library expects.
I just found that Tapir handles perfectly encoded slashes in paths if I change req.path to req.target.

About the JVM undertow implementation, it is to allow fast local development but it is not meant to be used in practice, nevertheless I could try to make it outputs something consistent with what NGINX Unit outputs, but I'm not targeting 100% compliance, because Undertow and NGINX Unit are different and filling all the gaps may be hard.

SNUnit 0.0.21 was released with this fix. Thank you for opening the issue with such a thorough explanation.

Thank you for the quick fix!