Unix domain socket not cleaned up if the router process is not terminated cleanly
Hello,
We're running into an issue with nginx-unit, which is mostly caused by the OOM killer. Unit is running in a Docker container, and we have fairly strict memory and CPU constraints configured for it, which we don't want to remove. If a process in the container tries to allocate more memory than the cgroup limits allow, the OOM killer steps in and sends a SIGKILL signal to a (possibly random, we haven't confirmed) process in the container/cgroup. If it kills the "router" process, Unit is unable to recover from that, returning the bind("unix:/tmp/app-listener.unit.sock") failed (98: Address already in use) error when it starts up again (previously discussed in #669 and a few other issues).
It'd be great if Unit were able to recover gracefully from failures like this. We're currently testing the following patch, which removes the socket if it already exists before binding to it. This does work, but we're not sure if it's a good idea:
diff --git a/src/nxt_main_process.c b/src/nxt_main_process.c
index 060ead41..a59d5703 100644
--- a/src/nxt_main_process.c
+++ b/src/nxt_main_process.c
@@ -1184,6 +1184,16 @@ nxt_main_listening_socket(nxt_sockaddr_t *sa, nxt_listening_socket_t *ls)
     const socklen_t  length = sizeof(int);
     static const int  enable = 1;
 
+    if (sa != NULL && sa->u.sockaddr.sa_family == AF_UNIX && sa->u.sockaddr_un.sun_path[0] != '\0') {
+        char *filename;
+        filename = sa->u.sockaddr_un.sun_path;
+
+        struct stat buffer;
+        if (stat(filename, &buffer) == 0) {
+            unlink(filename);
+        }
+    }
+
     s = socket(sa->u.sockaddr.sa_family, sa->type, 0);
 
     if (nxt_slow_path(s == -1)) {
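For completeness, a more defensive variant we've sketched (untested, and not adapted to Unit's internal types or style; remove_stale_unix_socket is just an illustrative helper, not part of Unit) would only unlink the file when nothing is listening on it, by probing it with connect(2) first:

#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>
#include <string.h>
#include <errno.h>

/*
 * Remove a UNIX socket file only if it is stale, i.e. the file exists
 * but no process is accepting connections on it.  Returns 0 if there
 * was nothing to remove or removal succeeded, -1 otherwise.
 */
static int
remove_stale_unix_socket(const char *path)
{
    int                 s, ret, err;
    struct sockaddr_un  sun;

    s = socket(AF_UNIX, SOCK_STREAM, 0);
    if (s == -1) {
        return -1;
    }

    memset(&sun, 0, sizeof(sun));
    sun.sun_family = AF_UNIX;
    strncpy(sun.sun_path, path, sizeof(sun.sun_path) - 1);

    ret = connect(s, (struct sockaddr *) &sun, sizeof(sun));
    err = errno;
    close(s);

    if (ret == 0) {
        /* A live listener still owns the socket; leave it alone. */
        return -1;
    }

    if (err == ECONNREFUSED) {
        /* The file exists but nobody is listening: it is stale. */
        return unlink(path);
    }

    if (err == ENOENT) {
        /* No socket file at all; nothing to clean up. */
        return 0;
    }

    return -1;
}

There is still a small race between the probe and the unlink, so this is a mitigation rather than a guarantee.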
Reproduction steps/example (it's also reproducible on 1.33.0):
# docker top app
UID        PID      PPID     C    STIME    TTY    TIME        CMD
root       90925    90904    0    13:25    ?      00:00:00    unit: main v1.32.1 [/usr/sbin/unitd --no-daemon --control unix:/nginx-unit/control.unit.sock]
systemd+   90981    90925    0    13:25    ?      00:00:00    unit: controller
systemd+   90982    90925    0    13:25    ?      00:00:00    unit: router
1000009+   91380    90925    0    13:26    ?      00:00:00    unit: "app-test-app" prototype
1000009+   91381    91380    31   13:26    ?      00:00:00    unit: "app-test-app" application
# kill -9 90982
# docker top app
UID        PID      PPID     C    STIME    TTY    TIME        CMD
root       90925    90904    0    13:25    ?      00:00:00    unit: main v1.32.1 [/usr/sbin/unitd --no-daemon --control unix:/nginx-unit/control.unit.sock]
systemd+   90981    90925    0    13:25    ?      00:00:00    unit: controller
systemd+   91397    90925    0    13:26    ?      00:00:00    unit: router
# docker logs app 2>&1 | grep alert
2024/10/01 13:26:33 [alert] 1#1 process 36 exited on signal 9
2024/10/01 13:26:33 [alert] 1#1 sendmsg(10, -1, -1, 1) failed (32: Broken pipe)
2024/10/01 13:26:33 [alert] 1#1 bind("unix:/tmp/app-listener.unit.sock") failed (98: Address already in use)
2024/10/01 13:26:33 [alert] 43#43 failed to apply new conf
2024/10/01 13:26:33 [alert] 35#35 failed to apply previous configuration
I'm wondering if there's a better workaround for this issue and/or if this is a bug that you're open to addressing in the future?
Hi, yes, this is something that has come up before.
IIRC we have generally discounted simply trying to remove the socket at startup, for $reasons.
However, here are some other things you may consider:
- OK, so this may seem pretty obvious and there may be a reason you don't/can't do this, but you could simply remove the socket before running Unit (see the sketch after this list). If you are restarting the Docker container when this happens, then if you create the UNIX domain socket under a tmpfs(5) file-system (like /tmp or /run on Linux), it will be removed automatically.
- Reducing the likelihood of, or preventing, the router process being OOM-killed. For example, to make the router process exempt from OOM kills:
# echo -1000 >/proc/$(pidof "unit: router")/oom_score_adj
Maybe also include the main and controller processes, though depending on your use case, the router process may indeed grow quite large... See the description of /proc/[pid]/oom_score_adj in proc(5) for more details.
- Unit supports abstract UNIX domain sockets (Linux only). These are like virtual socket files and are cleaned up automatically. Specify one as unix:@abstract_socket (note the @ symbol). This may be an option if your client supports connecting to such a thing, e.g. with the --abstract-unix-socket option in curl(1); see the example after this list. See the unix(7) man-page for more details.
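For reference, here is roughly what these suggestions could look like in practice. This is only a sketch; the socket path, the listener name app-listener, and the process-title patterns are assumptions based on the logs in this thread:

# (1) Remove a leftover socket in a wrapper entrypoint, then exec Unit.
rm -f /tmp/app-listener.unit.sock
exec /usr/sbin/unitd --no-daemon --control unix:/nginx-unit/control.unit.sock

# (2) Exempt all of Unit's core processes from OOM kills.  pgrep -f is
#     used here because the main process title also carries the version
#     string, which plain pidof would not match.
for pat in 'unit: main' 'unit: controller' 'unit: router'; do
    for pid in $(pgrep -f "$pat"); do
        echo -1000 >"/proc/$pid/oom_score_adj"
    done
done

# (3) With a listener configured as "unix:@app-listener", test it with:
curl --abstract-unix-socket app-listener http://localhost/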
Thanks for the suggestions! We're considering using abstract sockets, but they're unfortunately not supported in Nginx, or at least not well, so it looks like we would have to patch support for them into it ourselves as well, which is not ideal. I noticed there are a few patches floating around in mailing lists and OpenResty issues...
Regarding restarting the container/removing the socket on startup to fix the issue - the app's environment is actually rebuilt entirely if the container is restarted, so the socket does get cleaned up in that case. However, when the described issue occurs, Unit ends up in a bad state but continues running. We would actually prefer that it exit completely, as that would then let the app recover automatically.
I'm also wondering what contributes to the high memory usage of the router - does it increase with the number of open/pending connections, or are there other factors (our config is relatively simple)? And do you know if the backlog option, which was added in the previous release, would help with managing its memory usage?
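For reference, this is the setting we have in mind - a sketch only, with an illustrative value, assuming the per-listener option name from the 1.33.0 release notes and a pass target matching our app:

{
    "listeners": {
        "unix:/tmp/app-listener.unit.sock": {
            "pass": "applications/app-test-app",
            "backlog": 1024
        }
    }
}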
> Regarding restarting the container/removing the socket on startup to fix the issue - the app's environment is actually rebuilt entirely if the container is restarted, so the socket does get cleaned up in that case. However, when the described issue occurs, Unit ends up in a bad state but continues running. We would actually prefer that it exit completely, as that would then let the app recover automatically.
It would probably help to know what is actually causing the OOM situation.
If it's something unrelated to Unit, then you could simply exempt Unit (the whole thing, all the Unit processes) from being OOM-killed.
If it's an application running under Unit, then increase that application's likelihood of being OOM-killed. If it is killed, Unit will re-spawn it.
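For example, to make the application processes the preferred OOM victims (a sketch only; the title pattern is based on the docker top output above):

# Raise the OOM score of the application processes so the kernel
# prefers killing them over the router; Unit will re-spawn them.
for pid in $(pgrep -f 'unit: .* application'); do
    echo 500 >"/proc/$pid/oom_score_adj"
done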
OK, ignore the below; it was using a regular TCP socket.
Hmm, if you kill the router process then it's also re-spawned (along with the applications).
So now I'm wondering how exactly you're getting into the situation you are... maybe it's nothing to do with Unit and the OOM condition is still happening...
> I'm also wondering what contributes to the high memory usage of the router - does it increase with the number of open/pending connections, or are there other factors (our config is relatively simple)? And do you know if the backlog option, which was added in the previous release, would help with managing its memory usage?
Unit may buffer request data; you'll see this when running it as a proxy. If you try to load a large file into an application, it may get buffered. The exact behaviour, and where the buffering occurs, will be somewhat dependent on the language module being used (and also on the application itself).
I'll paste one example of the OOM killer's output from when it killed the router process below. The OOM killer was invoked because we have a memory limit (3 GB) set on the Docker container that Unit is running in. There's nothing else running in the container, and the OOM killer wasn't invoked due to high memory pressure on the host. It killed the process with PID 889451 (the unitd router), which was using approximately 2 GB of memory at the time. The worker processes (the PHP application) were using very little memory in comparison, so it's unlikely that they caused the issue. I'm assuming that the router process simply started using more memory because of a traffic spike that occurred around the same time.
[Mon Sep 30 17:57:45 2024] Tasks state (memory values in pages):
[Mon Sep 30 17:57:45 2024] [   pid  ]    uid    tgid  total_vm     rss  pgtables_bytes  swapents  oom_score_adj  name
[Mon Sep 30 17:57:45 2024] [  889388]      0  889388      4286    1152           61440         0              0  unitd
[Mon Sep 30 17:57:45 2024] [  889450]    999  889450      2676     447           57344        32              0  unitd
[Mon Sep 30 17:57:45 2024] [  889451]    999  889451   1331028  526146         8589312     94304              0  unitd
[Mon Sep 30 17:57:45 2024] [  889500]  10123  889500    105303    8392          258048      1920              0  unitd
[Mon Sep 30 17:57:45 2024] [  313392]  10123  313392    146177   67011          872448      1394              0  unitd
[Mon Sep 30 17:57:45 2024] [  313395]  10123  313395    152498   71714          917504      1394              0  unitd
[Mon Sep 30 17:57:45 2024] [  313998]  10123  313998    144977   63637          835584      1458              0  unitd
[Mon Sep 30 17:57:45 2024] [  313999]  10123  313999    144980   60023          819200      1394              0  unitd
[Mon Sep 30 17:57:45 2024] [  314000]  10123  314000    147417   64938          864256      1522              0  unitd
[Mon Sep 30 17:57:45 2024] [  314001]  10123  314001    158391   76988          958464      1394              0  unitd
[Mon Sep 30 17:57:45 2024] [  314002]  10123  314002    145239   56459          802816      1490              0  unitd
[Mon Sep 30 17:57:45 2024] [  314003]  10123  314003    126524   55219          786432      1458              0  unitd
[Mon Sep 30 17:57:45 2024] [  314004]  10123  314004    146958   60356          819200      1362              0  unitd
[Mon Sep 30 17:57:45 2024] [  314005]  10123  314005    144362   56037          806912      1458              0  unitd
[Mon Sep 30 17:57:45 2024] Memory cgroup out of memory: Killed process 889451 (unitd) total-vm:5324112kB, anon-rss:2089648kB, file-rss:4736kB, shmem-rss:10200kB, UID:999 pgtables:8388kB oom_score_adj:0
We're not as concerned about the OOM events; we can prevent them, or at least limit their frequency, either by raising the memory limits (not ideal) or, as you suggested, by adjusting the OOM scores of the router processes.
But it would still be good to improve Unit's recovery from events like this - we've not encountered any other issues so far, but it's presumably not impossible for the router process to crash or be killed in some other way. Alternatively, it'd also be useful if we could limit how much memory the router process is allowed to use, which would make it easier to predict what kinds of limits we can place on the containers.
Here's a minimal reproducible example for the Address already in use issue:
- Unit config
{
    "listeners": {
        "unix:/tmp/listener.sock": {
            "pass": "applications/php"
        }
    },
    "applications": {
        "php": {
            "type": "php",
            "processes": 4,
            "index": "index.php",
            "script": "index.php",
            "root": "/var/www/"
        }
    }
}
- PHP script
<?php
echo "Hello, World!";
?>
- Docker command
docker run --rm -ti --name unit-php-app -p 8080:8080 -v $(pwd)/test-tmp:/tmp -v $(pwd)/unit.json:/docker-entrypoint.d/unit-config.json -v $(pwd)/index.php:/var/www/index.php unit:php8.2
- Reproduction steps
$ docker top unit-php-app
UID      PID     PPID    C    STIME    TTY      TIME        CMD
root     8747    8726    0    10:42    pts/0    00:00:00    unit: main v1.33.0 [unitd --no-daemon --control unix:/var/run/control.unit.sock]
nomad    8824    8747    0    10:42    pts/0    00:00:00    unit: controller
nomad    8825    8747    0    10:42    pts/0    00:00:00    unit: router
nomad    8826    8747    0    10:42    pts/0    00:00:00    unit: "php" prototype
nomad    8827    8826    0    10:42    pts/0    00:00:00    unit: "php" application
nomad    8828    8826    0    10:42    pts/0    00:00:00    unit: "php" application
nomad    8829    8826    0    10:42    pts/0    00:00:00    unit: "php" application
nomad    8830    8826    0    10:42    pts/0    00:00:00    unit: "php" application
$ sudo kill -9 8825
$ docker logs unit-php-app 2>&1 | grep alert
2024/10/03 09:43:17 [alert] 1#1 bind("unix:/tmp/listener.sock") failed (98: Address already in use)
2024/10/03 09:43:17 [alert] 78#78 failed to apply new conf
2024/10/03 09:43:17 [alert] 55#55 failed to apply previous configuration