Possible incompatibility with nginx 1.23.0 and newer (core dumps)
heikojansen opened this issue · 9 comments
I recently tried to update our nginx to v1.23.1 (from 1.21.6). The latest version of the ngx_http_shibboleth module was successfully recompiled against the new nginx sources and the server can be started but crashes as soon as I try to access a protected path.
The nginx config is basically this:
location = /shibbolethauthorizer {
internal;
fastcgi_pass unix:/var/run/shibboleth/shibauthorizer.sock;
}
location /auth/shib {
shib_request /shibbolethauthorizer;
shib_request_use_headers on;
proxy_pass http://app;
}
Accessing /auth/shib
using curl http://host/auth/shib
results in a core dump.
I recompiled nginx with the debug option and debug symbols. Based on the log file I assume that the FCGI subrequest to the shibauthorizer has completed successfully and that the crash happens either when preparing the subrequest reponse to return from "/shibbolethauthorizer" or when analyzing this response in the context of the "/auth/shib" main request:
2022/08/02 12:11:52 [debug] 1667#1667: *2 http wake parent request: "/auth/shib?"
2022/08/02 12:11:52 [debug] 1667#1667: *2 http posted request: "/auth/shib?"
2022/08/02 12:11:52 [debug] 1667#1667: *2 access phase: 10
2022/08/02 12:11:52 [debug] 1667#1667: *2 shib request handler
2022/08/02 12:11:52 [debug] 1667#1667: *2 shib request set variables
2022/08/02 12:11:52 [debug] 1667#1667: *2 shib request authorizer handler
2022/08/02 12:11:52 [debug] 1667#1667: *2 shib request authorizer returning sub-response
2022/08/02 12:11:52 [debug] 1667#1667: *2 shib request authorizer returning header: "Location: https://idp-host/idp/profile/SAML2/Redirect/SSO?SAMLRequest=fZLN...oL&RelayState=ss%3Amem%3Aa6e2b2...5bf"
2022/08/02 12:11:52 [debug] 1667#1667: *2 shib request authorizer returning header: "Expires: Wed, 01 Jan 1997 12:00:00 GMT"
2022/08/02 12:11:52 [debug] 1667#1667: *2 shib request authorizer returning header: "Cache-Control: private,no-store,no-cache,max-age=0"
2022/08/02 12:11:52 [debug] 1667#1667: *2 shib request authorizer returning header: "Set-Cookie: _opensaml_req_ss%3Amem%3Aa6e2b2...5bf=_d6290894d06e282004ab17610a4cbd2c; path=/; secure; HttpOnly; SameSite=None"
2022/08/02 12:11:52 [debug] 1667#1667: *2 http finalize request: 302, "/auth/shib?" a:1, c:1
2022/08/02 12:11:52 [debug] 1667#1667: *2 http special response: 302, "/auth/shib?"
2022/08/02 12:11:52 [debug] 1667#1667: *2 http script copy: "base-uri 'none'; default-src 'self'; img-src 'self' data:; object-src 'none'; script-src 'self' 'nonce-"
2022/08/02 12:11:52 [debug] 1667#1667: *2 http script copy: "'; style-src 'self' 'unsafe-inline'; report-uri /util/csp-report?id="
2022/08/02 12:11:52 [debug] 1667#1667: *2 http script var: "000her2ehit8rmodtq30"
2022/08/02 12:11:52 [debug] 1667#1667: *2 http script copy: "&parent_request_id_hmac="
2022/08/02 12:11:52 [debug] 1667#1667: *2 http script copy: "frame-ancestors "
2022/08/02 12:11:52 [debug] 1667#1667: *2 http script var: "'self'"
2022/08/02 12:11:53 [notice] 1666#1666: signal 17 (SIGCHLD) received from 1667
2022/08/02 12:11:53 [alert] 1666#1666: worker process 1667 exited on signal 11 (core dumped)
The SIGSEGV core dump stack is structurally the same for every test:
#0 ngx_http_destination_charset (name=0x7fffbf02d9c0, r=0x55d4cd642940) at src/http/modules/ngx_http_charset_filter_module.c:330
330 && r->headers_out.override_charset->len)
(gdb) bt
#0 ngx_http_destination_charset (name=0x7fffbf02d9c0, r=0x55d4cd642940) at src/http/modules/ngx_http_charset_filter_module.c:330
#1 ngx_http_charset_header_filter (r=0x55d4cd642940) at src/http/modules/ngx_http_charset_filter_module.c:225
#2 0x000055d4ccad3891 in ngx_http_userid_filter (r=0x55d4cd642940) at src/http/modules/ngx_http_userid_filter_module.c:249
#3 0x000055d4ccad48b3 in ngx_http_headers_filter (r=0x55d4cd642940) at src/http/modules/ngx_http_headers_filter_module.c:243
#4 0x000055d4cca9d945 in ngx_http_send_special_response (r=r@entry=0x55d4cd642940, err=2, clcf=<optimized out>, clcf=<optimized out>) at src/http/ngx_http_special_response.c:727
#5 0x000055d4cca9dd32 in ngx_http_special_response_handler (r=r@entry=0x55d4cd642940, error=error@entry=302) at src/http/ngx_http_special_response.c:523
#6 0x000055d4ccaa1802 in ngx_http_finalize_request (r=0x55d4cd642940, rc=302) at src/http/ngx_http_request.c:2521
#7 0x000055d4cca9af4c in ngx_http_core_access_phase (r=0x55d4cd642940, ph=0x55d4cd64d4a0) at src/http/ngx_http_core_module.c:1150
#8 0x000055d4cca9604d in ngx_http_core_run_phases (r=0x55d4cd642940) at src/http/ngx_http_core_module.c:875
#9 0x000055d4cca9fcfe in ngx_http_run_posted_requests (c=0x55d4cd641930) at src/http/ngx_http_request.c:2424
#10 0x000055d4cca8544f in ngx_epoll_process_events (cycle=0x55d4cd5b13c0, timer=<optimized out>, flags=<optimized out>) at src/event/modules/ngx_epoll_module.c:903
#11 0x000055d4cca796f6 in ngx_process_events_and_timers (cycle=cycle@entry=0x55d4cd5b13c0) at src/event/ngx_event.c:248
#12 0x000055d4cca82d85 in ngx_worker_process_cycle (cycle=cycle@entry=0x55d4cd5b13c0, data=data@entry=0x0) at src/os/unix/ngx_process_cycle.c:721
#13 0x000055d4cca815a1 in ngx_spawn_process (cycle=cycle@entry=0x55d4cd5b13c0, proc=0x55d4cca82d30 <ngx_worker_process_cycle>, data=0x0, name=0x55d4ccb70b8f "worker process", respawn=respawn@entry=0) at src/os/unix/ngx_process.c:199
#14 0x000055d4cca846f7 in ngx_reap_children (cycle=0x55d4cd5b13c0) at src/os/unix/ngx_process_cycle.c:598
#15 ngx_master_process_cycle (cycle=cycle@entry=0x55d4cd5b13c0) at src/os/unix/ngx_process_cycle.c:174
#16 0x000055d4cca56336 in main (argc=<optimized out>, argv=<optimized out>) at src/core/nginx.c:383
(gdb) p r->headers_out
$1 = {headers = {last = 0x55d4cd642b20, part = {elts = 0x55d4cd642ea8, nelts = 5, next = 0x0}, size = 56, nalloc = 20, pool = 0x55d4cd6428f0}, trailers = {last = 0x55d4cd642b58, part = {elts = 0x55d4cd643308, nelts = 0, next = 0x0}, size = 56, nalloc = 4, pool = 0x55d4cd6428f0}, status = 302, status_line = {len = 0, data = 0x0}, server = 0x0, date = 0x0, content_length = 0x0, content_encoding = 0x0, location = 0x55d4cd642ea8, refresh = 0x0, last_modified = 0x0, content_range = 0x0, accept_ranges = 0x0, www_authenticate = 0x0, expires = 0x55d4cd642ee0, etag = 0x0, cache_control = 0x55d4cd6e4aa8, link = 0x1, override_charset = 0x8, content_type_len = 9, content_type = {len = 9, data = 0x55d4ccb7347b "text/html"}, charset = {len = 0, data = 0x0}, content_type_lowcase = 0x0, content_type_hash = 0, content_length_n = 145, content_offset = 0, date_time = 0, last_modified_time = -1}
(gdb) p *r->headers_out.location
$3 = {hash = 3072802660277, key = {len = 8, data = 0x55bcb074e465 "Location"}, value = {len = 641,
data = 0x55bcb074e46e "https://idp-host/idp/profile/SAML2/Redirect/SSO?SAMLRequest=fZLN..."...}, lowcase_key = 0x55bcb05e0378 "location",
next = 0x75725420736d6574}
(gdb) p r->headers_out.override_charset
$2 = (ngx_str_t *) 0x8
(gdb) p r->headers_out.override_charset->len
Cannot access memory at address 0x8
Cf. https://github.com/nginx/nginx/blob/master/src/http/modules/ngx_http_charset_filter_module.c#L330
So the "Location" header has apparently been parsed successfully but something's wrong with the charset info.
The nginx changelog mentions for v1.23.0:
*) Change in internal API: now header lines are represented as linked
lists.
And since some fiddling with header values takes place in ngx_http_shibboleth_module.c I came to the assumption that this might be the origin of the corrupted data.
Would be great if you could have a look at this.
For now I'll downgrade back to nginx 1.21.6 where things worked fine before.
Heiko
OK, I have forked this repo and made some changes to the code:
master...heikojansen:nginx-http-shibboleth:master
I did not come up with these changes myself but simply copied what I considered relevant from these two commits:
A very quick test after rebuilding nginx (1.23.1) and ngx_http_shibboleth seems to indicate that these changes indeed fix the crash I have reported here!
I currently don't understand why PCRE2 build flags were changed in that second commit and I also need to run a few more tests but at least it looks like the original assumption about the cause of the SIGSEGV was correct and we're headed in the right direction.
Any updates here?
I'm running nginx 1.23.1 with ngx_http_shibboleth built out of your fork and concur that it fixes the segfault issue. If there's no obvious issues or improvements to make, I think it makes sense to pull the fix into the main repo.
Thanks for the confirmation of the fix!
I did not have the time to create a proper pull request, so there's been no progress on my side.
If the maintainers of the project decide to integrate the changes by themselves that would also be fine by me. Otherwise, I hope I'll be able to create a pull request sometime next week.
Thank you heikojansen for sorting this out. To help the maintainers, I have submitted a PR pulling in your changes. Hopefully it is acceptable to all (this is a load-bearing component for some of our stuff, hence my interest in seeing this merged).
Well .... that explains why I just saw this error message when I tried to submit my pull request:
Pull request creation failed. Validation failed: A pull request already exists for heikojansen:nginx_compat_1-23-0.
I was just now preparing a new branch and adding proper commits with comments. Fortunately you at least submitted that pull request only after I added all intended changes.
OK, in the end it only matters that the issue is resolved, so ...
Oops! Sorry for stepping on your toes there. I could withdraw it and turn it over to you if you'd like (I had reviewed it and made sure it worked on our systems =) ).
Beautiful, thanks. I closed the one I opened, sorry for the noise.
Yeah, I actually decided to recreate the branch and submit a pull request of my own so there's now PR #50 to be reviewed and - hopefully - included...
Thanks @heikojansen for the PR - #50 is merged 👍