Subrequest returned 404 for internal locations
kamikat opened this issue · 13 comments
Got the following error trying to stream zip content from an internal location:
mod_zip: a subrequest returned 404, aborting..., client: <Client-IP>, server: , request: "GET <API-Endpoint-Path> HTTP/1.1", subrequest: "/cache/<Path>", host: "<Hostname>", referrer: "..."
/cache is configured as an internal location:
location /cache {
    internal;
    alias ${CACHE_DIR};
}
Removing the internal directive or downgrading mod_zip to 5b2604b fixes the problem.
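For reference, a minimal sketch of the first workaround, assuming the same /cache location as above (dropping internal means the location also becomes reachable by external clients):

location /cache {
    # workaround: "internal" removed so mod_zip's subrequest is no longer rejected,
    # at the cost of exposing /cache to external requests
    alias ${CACHE_DIR};
}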
@dimaryaz Any ideas on this one?
Maybe it's because of this:
Line 523 in 23da149
The internal subrequest becomes external and nginx returns 404 on it.
Sorry, just saw this. Will look into this now.
Ok, I think you're right; I can see how sr->internal = 0 would break things. It was a way around decoding/encoding URLs to fix #95, but I didn't fully understand what "internal" means in that context.
Some context for why I made that change in the first place: I'm using mod_zip with Amazon S3, but S3 does not handle URL encoding correctly. According to the RFC, "+" is a valid character in paths, so these two URLs are equivalent (for example, a key like a+b.txt could be requested as either /a%2Bb.txt or /a+b.txt). But, only the first one actually works.
Nginx will decode %2B to +, but not encode it back to %2B - which it has every right to do since they're equivalent. But, that means there is no way to fix both this bug and #95 without either 1) fixing S3 (not going to happen) or 2) changing Nginx to encode "+".
I'll see if Nginx developers would be ok with that; it's a stupid change, but harmless (short of breaking requests to other non-compliant web servers, I guess).
I think the only "correct" solution for mod_zip is to revert most of my change - though the old decoding of "+" was still incorrect.
I'll send out a PR.
But why do you need to change the subrequest type? Maybe there is a workaround without the type change. We are also using S3.
If you explain your situation in detail and/or provide a test configuration, I can debug it in my testing environment.
Sure, here's my situation. We have a backend that returns lines like this to mod_zip:
- size /region/bucket/path?signature=... filename
The path comes from boto3's generate_presigned_url, and can contain +s encoded as %2B.
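For concreteness, a hypothetical file list of that shape (region, bucket, keys, and sizes are made up here, and the signatures are elided):

- 1048576 /us-east-1/my-bucket/data/a%2Bb.csv?signature=... a+b.csv
- 524288 /us-east-1/my-bucket/data/simple.csv?signature=... simple.csv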
Nginx is set up to forward /region/bucket/path to S3 (here's the actual config: https://github.com/quiltdata/quilt/blob/master/s3-proxy/nginx.conf#L26)
The original mod_zip code marked the proxied request to /region/bucket/path as "internal", causing Nginx to encode it using ngx_escape_uri: https://github.com/nginx/nginx/blob/master/src/http/modules/ngx_http_proxy_module.c#L1191-L1196

So to avoid double-encoding, mod_zip would first decode the URL before giving it to Nginx - but, the decoding/encoding was causing the %2B -> + -> + conversion. There's no way to avoid that if statement besides setting r->internal to 0, so that's why I did it.
Hopefully this makes sense. I'll try to come up with a simple test case.
@dimaryaz do you get an encoded path in this (https://github.com/quiltdata/quilt/blob/master/s3-proxy/nginx.conf#L26) location, or is it being encoded inside proxy_pass?
It gets decoded to the point that it's impossible for us to use $s3_path later, and I had to do another if ($request_uri ~ ...) just to use $1: https://github.com/quiltdata/quilt/blob/master/s3-proxy/nginx.conf#L69-L73
It looks like a bug or at least a bad design in Nginx.
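A minimal sketch of that pattern (made-up location and names, not the actual quiltdata config): by the time a location matches, nginx has already percent-decoded $uri, so the raw path has to be re-captured from $request_uri:

location /s3/ {
    set $raw_path '';
    # $request_uri still carries the original, percent-encoded request line
    # (path and query string), unlike the already-decoded $uri
    if ($request_uri ~ ^/s3/(.*)) {
        set $raw_path $1;
    }
    # a "resolver" directive is needed elsewhere in the config once
    # proxy_pass contains variables
    proxy_pass https://example-bucket.s3.amazonaws.com/$raw_path;
}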
Using if is evil in Nginx because in most cases it doesn't inherit some of the surrounding environment inside it - https://www.nginx.com/resources/wiki/start/topics/depth/ifisevil/
But you may use Lua to encode the URL for S3 in a proper way.
As for the plus sign in S3 - this is an architectural "bug"; it is not fully compatible with the RFC and, in my opinion, the Nginx developers won't change that behaviour.
Oh, yeah, I know all about the evils of "if"; I'll see if I can figure out how to use lua.
But, I finally have a simple test case, without "if"s, regexes, and so on.
Here's an nginx config (GitHub doesn't allow attaching .conf extensions - are you kidding me?):
events {
}

http {
    server {
        listen 3000 default_server;

        location / {
            proxy_pass 'http://127.0.0.1:3001';
        }
    }

    server {
        listen 3001 default_server;

        location / {
            proxy_pass 'http://dimaryaz.s3.amazonaws.com';
        }

        location /modzip/zip/ {
            add_header X-Archive-Files zip;
            proxy_pass 'http://dimaryaz.s3.amazonaws.com';
        }
    }
}
I'm using two servers in order to add the X-Archive-Files header - not sure if there's a simpler way.
I have the following S3 files:
- https://dimaryaz.s3.amazonaws.com/modzip/simple.txt - a normal file without weird characters
- https://dimaryaz.s3.amazonaws.com/modzip/a%2Bb.txt - a file with a + that triggers the decoding/encoding bug
- https://dimaryaz.s3.amazonaws.com/modzip/zip/simple.txt - a mod_zip spec for a Zip containing the simple file
- https://dimaryaz.s3.amazonaws.com/modzip/zip/bug.txt - a mod_zip spec for a Zip containing the file with the +
Now, the simple zip file will always work:
$ curl http://localhost:3000/modzip/zip/simple.txt
Warning: Binary output can mess up your terminal. Use "--output -" to tell
Warning: curl to output it to your terminal anyway, or consider "--output
Warning: <FILE>" to save to a file.
The bug zip will not work without #96:
$ curl http://localhost:3000/modzip/zip/bug.txt
curl: (52) Empty reply from server
And the following shows up in the Nginx log:
127.0.0.1 - - [10/Feb/2023:01:02:20 -0800] "GET /modzip/zip/bug.txt HTTP/1.0" 200 31 "-" "curl/7.85.0"
127.0.0.1 - - [10/Feb/2023:01:02:20 -0800] "GET /modzip/a+b.txt HTTP/1.0" 403 243 "-" "-"
2023/02/10 01:02:20 [error] 78750#0: *1 mod_zip: a subrequest returned 403, aborting... while reading response header from upstream, client: 127.0.0.1, server: , request: "GET /modzip/zip/bug.txt HTTP/1.1", subrequest: "/modzip/a+b.txt", upstream: "http://127.0.0.1:3001/modzip/a+b.txt", host: "localhost:3000"
127.0.0.1 - - [10/Feb/2023:01:02:20 -0800] "GET /modzip/zip/bug.txt HTTP/1.1" 200 0 "-" "curl/7.85.0"
Hopefully you can reproduce this now. If there's any workaround that works without #96, I'm happy to use it. (Will most likely still require removing this, though.)
I have investigated and found two solutions:
- If we keep #96, so that the URL is not unescaped by mod_zip, you can use $request_uri in the proxy_pass directive:
server {
    listen 3000 default_server;

    location / {
        proxy_pass 'http://127.0.0.1:3001';
    }

    location /modzip {
        proxy_pass 'http://127.0.0.1:3001$request_uri';
    }
}

server {
    listen 3001 default_server;

    location / {
        proxy_pass 'http://dimaryaz.s3.amazonaws.com';
    }

    location /modzip {
        proxy_pass 'http://dimaryaz.s3.amazonaws.com$request_uri';
    }

    location /modzip/zip/ {
        add_header X-Archive-Files zip;
        proxy_pass 'http://dimaryaz.s3.amazonaws.com';
    }
}
$request_uri contains the unnormalized original URI from the mod_zip module, and it will not be escaped again by nginx.
- We discard #96 except for the "space part". This is more complicated: you need to deal with "+" yourself in Lua, for example like this:
location /modzip/ {
    set_by_lua_block $s3uri {
        local s3_uri, n, err = ngx.re.gsub(ngx.var.uri, "\\+", "%2B")
        if not s3_uri then
            ngx.log(ngx.ERR, "error: ", err)
            return
        end
        return s3_uri
    }
    proxy_pass 'http://dimaryaz.s3.amazonaws.com$s3uri';
}
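One note on the Lua approach: set_by_lua_block comes from the lua-nginx-module (bundled with OpenResty), so nginx has to be built with that module. Also, ngx.var.uri is the normalized, already percent-decoded URI, which is why it is the plus sign that has to be re-escaped before proxying.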