Unix domain socket not cleaned up if the router process is not terminated cleanly
Hello,
We're running into an issue with nginx-unit, which is mostly caused by the OOM killer. Unit is running in a Docker container, and we have fairly strict memory and CPU constraints configured for it, which we don't want to remove. If a process in the container tries to allocate more memory than the cgroup limits allow, the OOM killer steps in and sends a SIGKILL signal to a (possibly random, we haven't confirmed) process in the container/cgroup. If it kills the "router" process, Unit is unable to recover from that, returning the bind("unix:/tmp/app-listener.unit.sock") failed (98: Address already in use) error when it starts up again (previously discussed in #669 and a few other issues).
It'd be great if Unit were able to recover gracefully from failures like this. We're currently testing the following patch, which removes the socket if it already exists before binding to it. This does work, but we're not sure if it's a good idea:
diff --git a/src/nxt_main_process.c b/src/nxt_main_process.c
index 060ead41..a59d5703 100644
--- a/src/nxt_main_process.c
+++ b/src/nxt_main_process.c
@@ -1184,6 +1184,16 @@ nxt_main_listening_socket(nxt_sockaddr_t *sa, nxt_listening_socket_t *ls)
     const socklen_t  length = sizeof(int);
     static const int  enable = 1;
 
+    if (sa != NULL && sa->u.sockaddr.sa_family == AF_UNIX && sa->u.sockaddr_un.sun_path[0] != '\0') {
+        char *filename;
+        filename = sa->u.sockaddr_un.sun_path;
+
+        struct stat buffer;
+        if (stat(filename, &buffer) == 0) {
+            unlink(filename);
+        }
+    }
+
     s = socket(sa->u.sockaddr.sa_family, sa->type, 0);
 
     if (nxt_slow_path(s == -1)) {
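For completeness, a more defensive variant we've sketched (untested, and not adapted to Unit's internal types or style; remove_stale_unix_socket is just an illustrative helper, not part of Unit) would only unlink the file when nothing is listening on it, by probing it with connect(2) first:

#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>
#include <string.h>
#include <errno.h>

/*
 * Remove a UNIX socket file only if it is stale, i.e. the file exists
 * but no process is accepting connections on it.  Returns 0 if there
 * was nothing to remove or removal succeeded, -1 otherwise.
 */
static int
remove_stale_unix_socket(const char *path)
{
    int                 s, ret, err;
    struct sockaddr_un  sun;

    s = socket(AF_UNIX, SOCK_STREAM, 0);
    if (s == -1) {
        return -1;
    }

    memset(&sun, 0, sizeof(sun));
    sun.sun_family = AF_UNIX;
    strncpy(sun.sun_path, path, sizeof(sun.sun_path) - 1);

    ret = connect(s, (struct sockaddr *) &sun, sizeof(sun));
    err = errno;
    close(s);

    if (ret == 0) {
        /* A live listener still owns the socket; leave it alone. */
        return -1;
    }

    if (err == ECONNREFUSED) {
        /* The file exists but nobody is listening: it is stale. */
        return unlink(path);
    }

    if (err == ENOENT) {
        /* No socket file at all; nothing to clean up. */
        return 0;
    }

    return -1;
}

There is still a small race between the probe and the unlink, so this is a mitigation rather than a guarantee.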
Reproduction steps/example (it's also reproducible on 1.33.0):
# docker top app
UID        PID      PPID     C    STIME    TTY    TIME        CMD
root       90925    90904    0    13:25    ?      00:00:00    unit: main v1.32.1 [/usr/sbin/unitd --no-daemon --control unix:/nginx-unit/control.unit.sock]
systemd+   90981    90925    0    13:25    ?      00:00:00    unit: controller
systemd+   90982    90925    0    13:25    ?      00:00:00    unit: router
1000009+   91380    90925    0    13:26    ?      00:00:00    unit: "app-test-app" prototype
1000009+   91381    91380    31   13:26    ?      00:00:00    unit: "app-test-app" application
# kill -9 90982
# docker top app
UID        PID      PPID     C    STIME    TTY    TIME        CMD
root       90925    90904    0    13:25    ?      00:00:00    unit: main v1.32.1 [/usr/sbin/unitd --no-daemon --control unix:/nginx-unit/control.unit.sock]
systemd+   90981    90925    0    13:25    ?      00:00:00    unit: controller
systemd+   91397    90925    0    13:26    ?      00:00:00    unit: router
# docker logs app 2>&1 | grep alert
2024/10/01 13:26:33 [alert] 1#1 process 36 exited on signal 9
2024/10/01 13:26:33 [alert] 1#1 sendmsg(10, -1, -1, 1) failed (32: Broken pipe)
2024/10/01 13:26:33 [alert] 1#1 bind("unix:/tmp/app-listener.unit.sock") failed (98: Address already in use)
2024/10/01 13:26:33 [alert] 43#43 failed to apply new conf
2024/10/01 13:26:33 [alert] 35#35 failed to apply previous configuration
I'm wondering if there's a better workaround for this issue and/or if this is a bug that you're open to addressing in the future?
Hi, yes, this is something that has come up before.
IIRC we have generally discounted simply trying to remove the socket at startup, for $reasons.
However, here are some other things you may consider:
- OK, so this may seem pretty obvious and there may be a reason you don't/can't do this, but you could simply remove the socket before running Unit (see the sketch after this list). If you are restarting the Docker container when this happens, then if you create the UNIX domain socket under a tmpfs(5) file-system (like /tmp or /run on Linux), it will be removed automatically.
- Reducing the likelihood of, or preventing, the router process being OOM-killed. For example, to make the router process exempt from OOM kills:
# echo -1000 >/proc/$(pidof "unit: router")/oom_score_adj
Maybe also include the main and controller processes, though depending on your use case, the router process may indeed grow quite large... See the description of /proc/[pid]/oom_score_adj in proc(5) for more details.
- Unit supports abstract UNIX domain sockets (Linux only). These are like virtual socket files and are cleaned up automatically. Specify one as unix:@abstract_socket (note the @ symbol). This may be an option if your client supports connecting to such a thing, e.g. with the --abstract-unix-socket option in curl(1); see the example after this list. See the unix(7) man-page for more details.
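For reference, here is roughly what these suggestions could look like in practice. This is only a sketch; the socket path, the listener name app-listener, and the process-title patterns are assumptions based on the logs in this thread:

# (1) Remove a leftover socket in a wrapper entrypoint, then exec Unit.
rm -f /tmp/app-listener.unit.sock
exec /usr/sbin/unitd --no-daemon --control unix:/nginx-unit/control.unit.sock

# (2) Exempt all of Unit's core processes from OOM kills.  pgrep -f is
#     used here because the main process title also carries the version
#     string, which plain pidof would not match.
for pat in 'unit: main' 'unit: controller' 'unit: router'; do
    for pid in $(pgrep -f "$pat"); do
        echo -1000 >"/proc/$pid/oom_score_adj"
    done
done

# (3) With a listener configured as "unix:@app-listener", test it with:
curl --abstract-unix-socket app-listener http://localhost/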
Thanks for the suggestions! We're considering using abstract sockets, but they're unfortunately not supported in Nginx, or at least not well, so it looks like we would have to patch support for them into it ourselves as well, which is not ideal. I noticed there are a few patches floating around in mailing lists and OpenResty issues...
Regarding restarting the container/removing the socket on startup to fix the issue - the app's environment is actually rebuilt entirely if the container is restarted, so the socket does get cleaned up in that case. However, when the described issue occurs, Unit ends up in a bad state but continues running. We would actually prefer that it exit completely, as that would then let the app recover automatically.
I'm also wondering what contributes to the high memory usage of the router - does it increase with the number of open/pending connections, or are there other factors (our config is relatively simple)? And do you know if the backlog option, which was added in the previous release, would help with managing its memory usage?
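For reference, this is the setting we have in mind - a sketch only, with an illustrative value, assuming the per-listener option name from the 1.33.0 release notes and a pass target matching our app:

{
    "listeners": {
        "unix:/tmp/app-listener.unit.sock": {
            "pass": "applications/app-test-app",
            "backlog": 1024
        }
    }
}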
> Regarding restarting the container/removing the socket on startup to fix the issue - the app's environment is actually rebuilt entirely if the container is restarted, so the socket does get cleaned up in that case. However, when the described issue occurs, Unit ends up in a bad state but continues running. We would actually prefer that it exit completely, as that would then let the app recover automatically.
It would probably help to know what is actually causing the OOM situation.
If it's something unrelated to Unit, then you could simply exempt Unit (the whole thing, all the Unit processes) from being OOM-killed.
If it's an application running under Unit, then increase that application's likelihood of being OOM-killed. If it is killed, Unit will re-spawn it.
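For example, to make the application processes the preferred OOM victims (a sketch only; the title pattern is based on the docker top output above):

# Raise the OOM score of the application processes so the kernel
# prefers killing them over the router; Unit will re-spawn them.
for pid in $(pgrep -f 'unit: .* application'); do
    echo 500 >"/proc/$pid/oom_score_adj"
done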
OK, ignore the below; it was using a regular TCP socket.
Hmm, if you kill the router process then it's also re-spawned (along with the applications).
So now I'm wondering how exactly you're getting into the situation you are... maybe it's nothing to do with Unit and the OOM condition is still happening...
> I'm also wondering what contributes to the high memory usage of the router - does it increase with the number of open/pending connections, or are there other factors (our config is relatively simple)? And do you know if the backlog option, which was added in the previous release, would help with managing its memory usage?
Unit may buffer request data; you'll see this when running it as a proxy. If you try to load a large file into an application, it may get buffered. The exact behaviour, and where the buffering occurs, will be somewhat dependent on the language module being used (and also on the application itself).
I'll paste one example of the OOM killer's output from when it killed the router process below. The OOM killer was invoked because we have a memory limit (3 GB) set on the Docker container that Unit is running in. There's nothing else running in the container, and the OOM killer wasn't invoked due to high memory pressure on the host. It killed the process with PID 889451 (the unitd router), which was using approximately 2 GB of memory at the time. The worker processes (the PHP application) were using very little memory in comparison, so it's unlikely that they caused the issue. I'm assuming that the router process simply started using more memory because of a traffic spike that occurred around the same time.
[Mon Sep 30 17:57:45 2024] Tasks state (memory values in pages):
[Mon Sep 30 17:57:45 2024] [   pid  ]    uid    tgid  total_vm     rss  pgtables_bytes  swapents  oom_score_adj  name
[Mon Sep 30 17:57:45 2024] [  889388]      0  889388      4286    1152           61440         0              0  unitd
[Mon Sep 30 17:57:45 2024] [  889450]    999  889450      2676     447           57344        32              0  unitd
[Mon Sep 30 17:57:45 2024] [  889451]    999  889451   1331028  526146         8589312     94304              0  unitd
[Mon Sep 30 17:57:45 2024] [  889500]  10123  889500    105303    8392          258048      1920              0  unitd
[Mon Sep 30 17:57:45 2024] [  313392]  10123  313392    146177   67011          872448      1394              0  unitd
[Mon Sep 30 17:57:45 2024] [  313395]  10123  313395    152498   71714          917504      1394              0  unitd
[Mon Sep 30 17:57:45 2024] [  313998]  10123  313998    144977   63637          835584      1458              0  unitd
[Mon Sep 30 17:57:45 2024] [  313999]  10123  313999    144980   60023          819200      1394              0  unitd
[Mon Sep 30 17:57:45 2024] [  314000]  10123  314000    147417   64938          864256      1522              0  unitd
[Mon Sep 30 17:57:45 2024] [  314001]  10123  314001    158391   76988          958464      1394              0  unitd
[Mon Sep 30 17:57:45 2024] [  314002]  10123  314002    145239   56459          802816      1490              0  unitd
[Mon Sep 30 17:57:45 2024] [  314003]  10123  314003    126524   55219          786432      1458              0  unitd
[Mon Sep 30 17:57:45 2024] [  314004]  10123  314004    146958   60356          819200      1362              0  unitd
[Mon Sep 30 17:57:45 2024] [  314005]  10123  314005    144362   56037          806912      1458              0  unitd
[Mon Sep 30 17:57:45 2024] Memory cgroup out of memory: Killed process 889451 (unitd) total-vm:5324112kB, anon-rss:2089648kB, file-rss:4736kB, shmem-rss:10200kB, UID:999 pgtables:8388kB oom_score_adj:0
We're not as concerned about the OOM events; we can prevent them, or at least limit their frequency, either by raising the memory limits (not ideal) or, as you suggested, by adjusting the OOM scores of the router processes.
But it would still be good to improve Unit's recovery from events like this - we've not encountered any other issues so far, but it's presumably not impossible for the router process to crash or be killed in some other way. Alternatively, it'd also be useful if we could limit how much memory the router process is allowed to use, which would make it easier to predict what kinds of limits we can place on the containers.
Here's a minimal reproducible example for the Address already in use issue:
- Unit config
{
    "listeners": {
        "unix:/tmp/listener.sock": {
            "pass": "applications/php"
        }
    },
    "applications": {
        "php": {
            "type": "php",
            "processes": 4,
            "index": "index.php",
            "script": "index.php",
            "root": "/var/www/"
        }
    }
}
- PHP script
<?php
echo "Hello, World!";
?>
- Docker command
docker run --rm -ti --name unit-php-app -p 8080:8080 -v $(pwd)/test-tmp:/tmp -v $(pwd)/unit.json:/docker-entrypoint.d/unit-config.json -v $(pwd)/index.php:/var/www/index.php unit:php8.2
- Reproduction steps
$ docker top unit-php-app
UID      PID     PPID    C    STIME    TTY      TIME        CMD
root     8747    8726    0    10:42    pts/0    00:00:00    unit: main v1.33.0 [unitd --no-daemon --control unix:/var/run/control.unit.sock]
nomad    8824    8747    0    10:42    pts/0    00:00:00    unit: controller
nomad    8825    8747    0    10:42    pts/0    00:00:00    unit: router
nomad    8826    8747    0    10:42    pts/0    00:00:00    unit: "php" prototype
nomad    8827    8826    0    10:42    pts/0    00:00:00    unit: "php" application
nomad    8828    8826    0    10:42    pts/0    00:00:00    unit: "php" application
nomad    8829    8826    0    10:42    pts/0    00:00:00    unit: "php" application
nomad    8830    8826    0    10:42    pts/0    00:00:00    unit: "php" application
$ sudo kill -9 8825
$ docker logs unit-php-app 2>&1 | grep alert
2024/10/03 09:43:17 [alert] 1#1 bind("unix:/tmp/listener.sock") failed (98: Address already in use)
2024/10/03 09:43:17 [alert] 78#78 failed to apply new conf
2024/10/03 09:43:17 [alert] 55#55 failed to apply previous configuration