wsgidav process not stopping
laur89 opened this issue · 4 comments
Describe the bug
Note this is quite possibly not an issue with wsgidav itself, but seafdav - seafile project's webdav implementation that relies on wsgidav.
There are cases where upon shutting down the service wsgidav child processes still hang around, causing subsequent restart of seafile to fail.
It seems to happen only if webdav server has actually been used prior to stopping. If service is merely started and immediately stopped, all processes appear to shut down OK.
Looking at seafile codebase, it appears the wsgidav process is started like this:
char *argv[] = {
    (char *)get_python_executable(),
    "-m", "wsgidav.server.server_cli",
    "--server", "gunicorn",
    "--root", "/",
    "--log-file", seafdav_log_file,
    "--pid", ctl->pidfile[PID_SEAFDAV],
    "--port", port,
    "--host", conf.host,
    NULL
};
pid = spawn_process (argv, true);
...and stopped like this:
kill_by_force(PID_SEAFDAV);
/-/
static void
kill_by_force (int which)
{
    if (which < 0 || which >= N_PID)
        return;
    char *pidfile = ctl->pidfile[which];
    int pid = read_pid_from_pidfile(pidfile);
    if (pid > 0) {
        // if SIGKILL send success, then remove related pid file
        if (kill ((pid_t)pid, SIGKILL) == 0) {
            g_unlink (pidfile);
        }
    }
}
Note they're sending SIGKILL, so not quite sure why any process would remain hanging at all. Although unsure why SIGKILL is sent as the default signal in the first place.
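One likely reason processes can survive here: SIGKILL cannot be caught, so the process whose PID is in the pid file dies without any chance to signal its children, and the kernel simply re-parents them. The Python sketch below reproduces that behaviour in isolation. It is not Seafile or WsgiDAV code; `PARENT_SRC` and `child_survives_parent_sigkill` are made-up names for illustration, and it assumes a POSIX system with `os.fork`.

```python
import os
import signal
import subprocess
import sys

# Hypothetical stand-in for a master process that forks one worker.
PARENT_SRC = """
import os, time
pid = os.fork()
if pid == 0:
    time.sleep(30)      # the "worker"
    os._exit(0)
print(pid, flush=True)  # tell the caller the worker's PID
time.sleep(30)          # the "master"
"""


def child_survives_parent_sigkill() -> bool:
    """SIGKILL the parent only and report whether its child is still alive."""
    parent = subprocess.Popen(
        [sys.executable, "-c", PARENT_SRC],
        stdout=subprocess.PIPE,
        text=True,
    )
    child_pid = int(parent.stdout.readline())

    parent.kill()  # sends SIGKILL to the parent PID only
    parent.wait()
    parent.stdout.close()

    try:
        os.kill(child_pid, 0)  # signal 0 only checks that the PID exists
        alive = True
    except ProcessLookupError:
        alive = False

    if alive:
        os.kill(child_pid, signal.SIGKILL)  # clean up the orphan
    return alive
```

This only shows why children outlive a SIGKILLed parent; it does not explain why a later SIGTERM would be ignored by them.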
To Reproduce
- Start seafile (that also spawns the wsgidav process)
- Use the webdav server (e.g. sync some files via a client)
- Stop seafile services via the packaged shell script:
$ seafile.sh stop
- Note some wsgidav processes remain
Expected behavior
All processes spawned by seafile, including wsgidav ones, should be shut down.
Environment:
WsgiDAV/4.3.0 Python/3.10.12 Linux-6.1.106-Unraid-x86_64-with-glibc2.35
Additional context/longer repro example
After starting seafile, this can be seen in seafile-controller (that's spawning wsgidav process) log:
2024-10-07 01:07:01 seafile-controller.c(427): pid file /seafile/pids/seafdav.pid does not exist
2024-10-07 01:07:01 seafile-controller.c(506): seafdav need restart...
2024-10-07 01:07:01 seafile-controller.c(82): spawn_process: /usr/bin/python3 -m wsgidav.server.server_cli --server gunicorn --root / --log-file /seafile/logs/seafdav.log --pid /seafile/pids/seafdav.pid --port 8080 --host 0.0.0.0
2024-10-07 01:07:01 seafile-controller.c(116): spawned /usr/bin/python3, pid 159
These are the spawned wsgidav processes as seen from the running container (note PID 159 is tracked by seafdav as the service PID):
$ ps -ef | grep wsgidav.server.server_cli
root 159 64 0 01:07 ? 00:00:01 /usr/bin/python3 -m wsgidav.server.server_cli --server gunicorn --root / --log-file /seafile/logs/seafdav.log --pid /seafile/pids/seafdav.pid --port 8080 --host 0.0.0.0
root 161 159 0 01:07 ? 00:00:00 /usr/bin/python3 -m wsgidav.server.server_cli --server gunicorn --root / --log-file /seafile/logs/seafdav.log --pid /seafile/pids/seafdav.pid --port 8080 --host 0.0.0.0
root 162 159 0 01:07 ? 00:00:00 /usr/bin/python3 -m wsgidav.server.server_cli --server gunicorn --root / --log-file /seafile/logs/seafdav.log --pid /seafile/pids/seafdav.pid --port 8080 --host 0.0.0.0
root 163 159 0 01:07 ? 00:00:00 /usr/bin/python3 -m wsgidav.server.server_cli --server gunicorn --root / --log-file /seafile/logs/seafdav.log --pid /seafile/pids/seafdav.pid --port 8080 --host 0.0.0.0
root 164 159 0 01:07 ? 00:00:00 /usr/bin/python3 -m wsgidav.server.server_cli --server gunicorn --root / --log-file /seafile/logs/seafdav.log --pid /seafile/pids/seafdav.pid --port 8080 --host 0.0.0.0
root 165 159 0 01:07 ? 00:00:00 /usr/bin/python3 -m wsgidav.server.server_cli --server gunicorn --root / --log-file /seafile/logs/seafdav.log --pid /seafile/pids/seafdav.pid --port 8080 --host 0.0.0.0
Now webdav server was used by an Android client, some I/O was performed.
Stopping the seafile server is done via a shell-script. From what is relevant, it performs two steps:
- first sends SIGTERM to seafile-controller process...:
pkill -SIGTERM -f "seafile-controller -c ${default_ccnet_conf_dir}"
This signal is caught by the signal handler, which in turn sends SIGKILL to the wsgidav process (in this case, that'd be PID 159).
- ...then itself sends SIGTERM to wsgidav process:
pkill -f "wsgidav.server.server_cli"
Excerpt from the relevant part of said shell script (sorry, I cannot find the seafile repo that contains this script):
function stop_seafile_server () {
    echo "Stopping seafile server ..."
    pkill -SIGTERM -f "seafile-controller -c ${default_ccnet_conf_dir}" # !!! 1st step
    kill_all
    return 0
}
function kill_all () {
    pkill -f "seaf-server -c ${default_ccnet_conf_dir}"
    pkill -f "fileserver -c ${default_ccnet_conf_dir}"
    pkill -f "seafevents.main"
    pkill -f "wsgidav.server.server_cli" # !!! 2nd step
    pkill -f "notification-server -c ${central_config_dir}"
    pkill -f "seafile-monitor.sh"
}
After this, the following 4 processes still remain hanging about:
$ ps -ef | grep wsgidav.server.server_cli
root 161 1 2 01:07 ? 00:01:01 /usr/bin/python3 -m wsgidav.server.server_cli --server gunicorn --root / --log-file /seafile/logs/seafdav.log --pid /seafile/pids/seafdav.pid --port 8080 --host 0.0.0.0
root 162 1 0 01:07 ? 00:00:00 /usr/bin/python3 -m wsgidav.server.server_cli --server gunicorn --root / --log-file /seafile/logs/seafdav.log --pid /seafile/pids/seafdav.pid --port 8080 --host 0.0.0.0
root 164 1 0 01:07 ? 00:00:00 /usr/bin/python3 -m wsgidav.server.server_cli --server gunicorn --root / --log-file /seafile/logs/seafdav.log --pid /seafile/pids/seafdav.pid --port 8080 --host 0.0.0.0
root 165 1 0 01:07 ? 00:00:00 /usr/bin/python3 -m wsgidav.server.server_cli --server gunicorn --root / --log-file /seafile/logs/seafdav.log --pid /seafile/pids/seafdav.pid --port 8080 --host 0.0.0.0
I suppose my question is whether this is expected and is the wsgidav service shutdown performed correctly by seafile?
Trying to kill the processes via another SIGTERM (i.e. the default signal sent by pkill) does nothing, yet sending SIGKILL or SIGHUP appears to get rid of 'em:
$ pkill --signal SIGHUP -f 'wsgidav.server.server_cli'
No idea what's up with that or whether it's safe to do so. Grepped the wsgidav codebase and cannot find any signal handlers whatsoever, so no idea why SIGHUP works.
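Handlers don't have to live in the wsgidav codebase; gunicorn or the Python runtime may install them. On Linux, one way to check what a stuck process actually does with each signal is to decode the `SigBlk`, `SigIgn` and `SigCgt` bitmasks in `/proc/<pid>/status`. A small Python sketch of that check follows (the helper names `decode_sigmask` and `signal_dispositions` are hypothetical, and it assumes a Linux `/proc`):

```python
import signal


def decode_sigmask(hex_mask: str) -> set[str]:
    """Turn a hex mask from /proc/<pid>/status into signal names.

    Bit n-1 of the mask corresponds to signal number n.
    """
    mask = int(hex_mask, 16)
    names = set()
    for signum in range(1, 65):
        if mask & (1 << (signum - 1)):
            try:
                names.add(signal.Signals(signum).name)
            except ValueError:
                names.add(str(signum))  # no symbolic name known to Python
    return names


def signal_dispositions(pid: int) -> dict[str, set[str]]:
    """Return the blocked, ignored and caught signals of a process (Linux only)."""
    wanted = {"SigBlk": "blocked", "SigIgn": "ignored", "SigCgt": "caught"}
    result = {}
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            field, _, value = line.partition(":")
            if field in wanted:
                result[wanted[field]] = decode_sigmask(value.strip())
    return result
```

If SIGTERM shows up under `blocked` or `ignored` for the leftover workers while SIGHUP and SIGINT do not, that would match the behaviour described above.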
My first guess was that the SIGKILL sent by the controller is targeted only at the parent process, so it doesn't get a chance to gracefully shut down the child processes. That doesn't seem to be the whole story, though: sending SIGTERM to just the parent process only causes one of the child (!) processes to be nuked:
# prior to kill:
$ ps -ef | grep wsgidav.server.server_cli
root 5476 5400 0 11:58 ? 00:00:00 /usr/bin/python3 -m wsgidav.server.server_cli --server gunicorn --root / --log-file /seafile/logs/seafdav.log --pid /seafile/pids/seafdav.pid --port 8080 --host 0.0.0.0
root 5478 5476 0 11:58 ? 00:00:00 /usr/bin/python3 -m wsgidav.server.server_cli --server gunicorn --root / --log-file /seafile/logs/seafdav.log --pid /seafile/pids/seafdav.pid --port 8080 --host 0.0.0.0
root 5479 5476 0 11:58 ? 00:00:00 /usr/bin/python3 -m wsgidav.server.server_cli --server gunicorn --root / --log-file /seafile/logs/seafdav.log --pid /seafile/pids/seafdav.pid --port 8080 --host 0.0.0.0
root 5480 5476 0 11:58 ? 00:00:00 /usr/bin/python3 -m wsgidav.server.server_cli --server gunicorn --root / --log-file /seafile/logs/seafdav.log --pid /seafile/pids/seafdav.pid --port 8080 --host 0.0.0.0
root 5481 5476 0 11:58 ? 00:00:00 /usr/bin/python3 -m wsgidav.server.server_cli --server gunicorn --root / --log-file /seafile/logs/seafdav.log --pid /seafile/pids/seafdav.pid --port 8080 --host 0.0.0.0
root 5482 5476 0 11:58 ? 00:00:00 /usr/bin/python3 -m wsgidav.server.server_cli --server gunicorn --root / --log-file /seafile/logs/seafdav.log --pid /seafile/pids/seafdav.pid --port 8080 --host 0.0.0.0
$ kill 5476
$ ps -ef | grep wsgidav.server.server_cli
root 5476 5400 0 11:58 ? 00:00:00 /usr/bin/python3 -m wsgidav.server.server_cli --server gunicorn --root / --log-file /seafile/logs/seafdav.log --pid /seafile/pids/seafdav.pid --port 8080 --host 0.0.0.0
root 5478 5476 0 11:58 ? 00:00:00 /usr/bin/python3 -m wsgidav.server.server_cli --server gunicorn --root / --log-file /seafile/logs/seafdav.log --pid /seafile/pids/seafdav.pid --port 8080 --host 0.0.0.0
root 5479 5476 0 11:58 ? 00:00:00 /usr/bin/python3 -m wsgidav.server.server_cli --server gunicorn --root / --log-file /seafile/logs/seafdav.log --pid /seafile/pids/seafdav.pid --port 8080 --host 0.0.0.0
root 5480 5476 0 11:58 ? 00:00:00 /usr/bin/python3 -m wsgidav.server.server_cli --server gunicorn --root / --log-file /seafile/logs/seafdav.log --pid /seafile/pids/seafdav.pid --port 8080 --host 0.0.0.0
root 5482 5476 0 11:58 ? 00:00:00 /usr/bin/python3 -m wsgidav.server.server_cli --server gunicorn --root / --log-file /seafile/logs/seafdav.log --pid /seafile/pids/seafdav.pid --port 8080 --host 0.0.0.0
Note PID 5476 (the parent process launched by the controller) is still running; only 5481 got killed.
I don't know much about Seafile, but I tried this:
Install WsgiDAV
cd test_wsgidav
pipenv install wsgidav gunicorn
pipenv shell
Create a wsgidav.yaml file with the following content:
server: gunicorn
server_args:
    workers: 5
host: 0.0.0.0
port: 8080
provider_mapping:
    "/": "."
Run WsgiDAV
(test_wsgidav) ➜ test_wsgidav wsgidav --auth anonymous
Using default configuration file: /Users/martin/prj/git/test_wsgidav/wsgidav.yaml
...
21:30:35.543 - INFO : Running WsgiDAV/4.3.3 gunicorn/23.0.0 Python/3.12.0 ...
[2024-10-07 21:30:35 +0200] [70339] [INFO] Starting gunicorn 23.0.0
[2024-10-07 21:30:35 +0200] [70339] [INFO] Listening at: http://0.0.0.0:8080 (70339)
[2024-10-07 21:30:35 +0200] [70339] [INFO] Using worker: gthread
[2024-10-07 21:30:35 +0200] [70342] [INFO] Booting worker with pid: 70342
[2024-10-07 21:30:35 +0200] [70343] [INFO] Booting worker with pid: 70343
[2024-10-07 21:30:35 +0200] [70344] [INFO] Booting worker with pid: 70344
[2024-10-07 21:30:35 +0200] [70345] [INFO] Booting worker with pid: 70345
[2024-10-07 21:30:35 +0200] [70346] [INFO] Booting worker with pid: 70346
We can see that gunicorn starts five other processes, as configured.
Then open a second terminal and list the processes; the root process is 70339, the parent of the five workers:
➜ test_wsgidav ps -ef | grep wsgidav
501 70339 70173 0 9:30pm ttys003 0:00.21 /Library/Frameworks/Python.framework/Versions/3.12/Resources/Python.app/Contents/MacOS/Python /Users/martin/prj/git/test_wsgidav/.venv/bin/wsgidav --auth anonymous
501 70342 70339 0 9:30pm ttys003 0:00.12 /Library/Frameworks/Python.framework/Versions/3.12/Resources/Python.app/Contents/MacOS/Python /Users/martin/prj/git/test_wsgidav/.venv/bin/wsgidav --auth anonymous
501 70343 70339 0 9:30pm ttys003 0:00.11 /Library/Frameworks/Python.framework/Versions/3.12/Resources/Python.app/Contents/MacOS/Python /Users/martin/prj/git/test_wsgidav/.venv/bin/wsgidav --auth anonymous
501 70344 70339 0 9:30pm ttys003 0:00.12 /Library/Frameworks/Python.framework/Versions/3.12/Resources/Python.app/Contents/MacOS/Python /Users/martin/prj/git/test_wsgidav/.venv/bin/wsgidav --auth anonymous
501 70345 70339 0 9:30pm ttys003 0:00.12 /Library/Frameworks/Python.framework/Versions/3.12/Resources/Python.app/Contents/MacOS/Python /Users/martin/prj/git/test_wsgidav/.venv/bin/wsgidav --auth anonymous
501 70346 70339 0 9:30pm ttys003 0:00.13 /Library/Frameworks/Python.framework/Versions/3.12/Resources/Python.app/Contents/MacOS/Python /Users/martin/prj/git/test_wsgidav/.venv/bin/wsgidav --auth anonymous
Now stop the root process with SIGINT:
kill -s INT 70339
In the main terminal we see that the spawned processes are also stopped:
...
[2024-10-07 21:36:28 +0200] [70339] [INFO] Handling signal: int
[2024-10-07 21:36:28 +0200] [70342] [INFO] Worker exiting (pid: 70342)
[2024-10-07 21:36:28 +0200] [70343] [INFO] Worker exiting (pid: 70343)
[2024-10-07 21:36:28 +0200] [70344] [INFO] Worker exiting (pid: 70344)
[2024-10-07 21:36:28 +0200] [70345] [INFO] Worker exiting (pid: 70345)
[2024-10-07 21:36:28 +0200] [70346] [INFO] Worker exiting (pid: 70346)
[2024-10-07 21:36:28 +0200] [70339] [INFO] Shutting down: Master
➜ test_wsgidav
So it looks like it is working as expected?
Thanks for getting back so quick.
Looks like SIGINT works even with those hanging gunicorn processes. Note in the original post I described how SIGTERM does nothing, but replacing it with SIGINT does the trick:
root@1d53611e14f4:/seafile# ps -ef | grep -v grep | grep wsgidav
root 1404 1 0 16:53 ? 00:00:00 /usr/bin/python3 -m wsgidav.server.server_cli --server gunicorn --root / --log-file /seafile/logs/seafdav.log --pid /seafile/pids/seafdav.pid --port 8080 --host 0.0.0.0
root 1405 1 0 16:53 ? 00:00:00 /usr/bin/python3 -m wsgidav.server.server_cli --server gunicorn --root / --log-file /seafile/logs/seafdav.log --pid /seafile/pids/seafdav.pid --port 8080 --host 0.0.0.0
root 1406 1 0 16:53 ? 00:00:00 /usr/bin/python3 -m wsgidav.server.server_cli --server gunicorn --root / --log-file /seafile/logs/seafdav.log --pid /seafile/pids/seafdav.pid --port 8080 --host 0.0.0.0
root 1407 1 0 16:53 ? 00:00:00 /usr/bin/python3 -m wsgidav.server.server_cli --server gunicorn --root / --log-file /seafile/logs/seafdav.log --pid /seafile/pids/seafdav.pid --port 8080 --host 0.0.0.0
root 1408 1 0 16:53 ? 00:00:00 /usr/bin/python3 -m wsgidav.server.server_cli --server gunicorn --root / --log-file /seafile/logs/seafdav.log --pid /seafile/pids/seafdav.pid --port 8080 --host 0.0.0.0
root@1d53611e14f4:/seafile# pkill --signal SIGINT -f 'wsgidav.server.server_cli'
root@1d53611e14f4:/seafile# echo $?
0
root@1d53611e14f4:/seafile# ps -ef | grep -v grep | grep wsgidav
Is it possibly due to gunicorn itself handling INT, but not TERM signals?
At any rate, think I'll propose Seafile team to:
- stop SIGKILLing processes as the first step;
- consider SIGINT-ing webdav as opposed to TERM-ing
Although INT is a bit weird signal to send in this case, as afaik it's supposed to be keyboard/user interrupt, i.e. implies interactivity, not one system interrupting another.
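For what it's worth, the usual alternative to SIGKILL-first is to signal the whole process group, wait a bit, and only then escalate. Below is a rough Python sketch of that pattern, not Seafile's actual controller code: `stop_gracefully` is a made-up name, and it assumes the service was started in its own session (here via `start_new_session=True`) so that the workers share the master's process group ID.

```python
import os
import signal
import subprocess


def stop_gracefully(proc: subprocess.Popen, grace_seconds: float = 10.0) -> int:
    """Stop a process started with start_new_session=True, workers included.

    Sends SIGTERM to the whole process group, waits for the group leader,
    and escalates to SIGKILL only if the grace period runs out.
    Returns the leader's exit status (negative signal number if it was
    terminated by a signal).
    """
    pgid = proc.pid  # a session leader's PID is also its process group ID
    try:
        os.killpg(pgid, signal.SIGTERM)
    except ProcessLookupError:
        return proc.wait()  # the group is already gone

    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        try:
            os.killpg(pgid, signal.SIGKILL)  # last resort, hits every group member
        except ProcessLookupError:
            pass
        return proc.wait()
```

Whether TERM or INT is the better first signal depends on the server. As far as I know, gunicorn's documentation describes TERM as a graceful shutdown (workers get up to `graceful_timeout` to finish requests) and INT/QUIT as a quick shutdown, which may be part of why INT appears to act faster here.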
Worth noting: following your example using version 4.3.3, I'm unable to reproduce the conditions where some child processes hang around. Killing via both TERM & KILL signals always results in all processes being reaped. Unsure what's going on under Seafile.