kiwix-serve leaves zombies if killed (was: Does kiwix-serve --monitoringLibrary uses too much resources?)
Closed this issue · 20 comments
We have rolled out this option on library.kiwix.org
and this is now really slow. We are not sure this is related, but we are suspicious. CPU seems good but overall load is super high
$ ps aux | grep kiwix-serve
root 1435 0.0 0.0 0 0 ? Z Feb21 2:04 [kiwix-serve] <defunct>
root 2157 0.0 0.0 0 0 ? Z Feb24 0:03 [kiwix-serve] <defunct>
root 2417 0.0 0.0 0 0 ? Z Feb24 0:11 [kiwix-serve] <defunct>
root 3527 0.0 0.0 0 0 ? Z Feb26 0:33 [kiwix-serve] <defunct>
root 3656 0.0 0.0 0 0 ? Z Feb24 0:03 [kiwix-serve] <defunct>
root 4185 0.0 0.0 0 0 ? Z Feb24 0:07 [kiwix-serve] <defunct>
root 4340 0.1 0.0 0 0 ? Z Feb23 8:09 [kiwix-serve] <defunct>
root 5053 0.0 0.0 0 0 ? Z Feb23 4:25 [kiwix-serve] <defunct>
root 5606 0.0 0.0 220 0 ? Ss Jan31 0:00 /usr/bin/dumb-init -- kiwix-serve-folder /data
root 5979 0.5 3.3 43307740 2218284 ? Ssl Jan31 212:42 kiwix-serve --library --monitorLibrary /tmp/library.xml
root 6002 0.0 0.0 0 0 ? Z Jan31 0:00 [kiwix-serve-fol] <defunct>
root 6680 0.0 0.0 0 0 ? Z Feb23 5:14 [kiwix-serve] <defunct>
root 7740 0.0 0.0 0 0 ? Z Feb25 0:06 [kiwix-serve] <defunct>
root 8432 0.0 0.0 0 0 ? Z Feb23 1:59 [kiwix-serve] <defunct>
root 8568 0.0 0.0 0 0 ? Z Feb25 3:15 [kiwix-serve] <defunct>
root 9381 0.0 0.0 0 0 ? Z Feb23 1:04 [kiwix-serve] <defunct>
root 10692 1.2 0.0 0 0 ? Z 08:29 0:16 [kiwix-serve] <defunct>
root 12333 0.0 0.0 0 0 ? Z Feb24 2:06 [kiwix-serve] <defunct>
root 12432 0.0 0.0 0 0 ? Z Feb24 2:42 [kiwix-serve] <defunct>
root 12676 0.0 0.0 0 0 ? Z Feb25 0:07 [kiwix-serve] <defunct>
root 13696 0.2 0.0 0 0 ? Z Feb25 12:47 [kiwix-serve] <defunct>
root 15056 0.0 0.0 0 0 ? Z 04:16 0:07 [kiwix-serve] <defunct>
root 15137 0.0 0.0 0 0 ? Z Feb27 0:03 [kiwix-serve] <defunct>
root 15384 0.0 0.0 0 0 ? Z Feb27 0:04 [kiwix-serve] <defunct>
root 15941 0.0 0.0 0 0 ? Z Feb27 0:25 [kiwix-serve] <defunct>
root 15957 0.0 0.0 0 0 ? Z Feb21 0:05 [kiwix-serve] <defunct>
root 16407 0.0 0.0 0 0 ? Z Feb27 0:49 [kiwix-serve] <defunct>
root 16886 0.0 0.0 0 0 ? Z Feb26 0:07 [kiwix-serve] <defunct>
root 17830 0.0 0.0 0 0 ? Z Feb26 1:14 [kiwix-serve] <defunct>
root 17999 0.1 0.0 0 0 ? Z Feb23 10:31 [kiwix-serve] <defunct>
root 18212 0.0 0.0 0 0 ? Z Feb27 0:04 [kiwix-serve] <defunct>
root 18309 0.0 0.0 0 0 ? Z Feb27 0:04 [kiwix-serve] <defunct>
root 18464 0.0 0.0 0 0 ? Z Feb27 0:04 [kiwix-serve] <defunct>
root 18570 0.0 0.0 0 0 ? Z Feb27 0:19 [kiwix-serve] <defunct>
root 18753 0.1 0.0 0 0 ? Z Feb27 2:16 [kiwix-serve] <defunct>
root 19434 3.8 0.4 2230528 286528 ? Sl 08:50 0:04 kiwix-serve --daemon --port=8000 --library --monitorLibrary --threads=16 --nodatealias /var/www/library.kiwix.org/library.kiwix.org.xml
kelson 20185 0.0 0.0 12752 952 pts/1 S+ 08:52 0:00 grep kiwix-serve
root 20218 0.0 0.0 0 0 ? Z Feb25 0:04 [kiwix-serve] <defunct>
root 21295 0.3 0.0 0 0 ? Z Feb22 30:13 [kiwix-serve] <defunct>
root 21363 0.0 0.0 0 0 ? Z Feb27 0:04 [kiwix-serve] <defunct>
root 21743 0.3 0.0 0 0 ? Z Feb27 6:42 [kiwix-serve] <defunct>
root 21912 0.1 0.0 0 0 ? Z Feb27 0:55 [kiwix-serve] <defunct>
root 22078 0.0 0.0 0 0 ? Z Feb25 0:03 [kiwix-serve] <defunct>
root 22452 0.0 0.0 0 0 ? Z Feb27 0:03 [kiwix-serve] <defunct>
root 22687 0.0 0.0 0 0 ? Z Feb25 2:24 [kiwix-serve] <defunct>
root 22739 0.2 0.0 0 0 ? Z Feb27 1:33 [kiwix-serve] <defunct>
root 23013 0.0 0.0 0 0 ? Z Feb25 0:09 [kiwix-serve] <defunct>
root 23647 0.0 0.0 0 0 ? Z Feb25 0:03 [kiwix-serve] <defunct>
root 23764 0.0 0.0 0 0 ? Z Feb27 0:13 [kiwix-serve] <defunct>
root 23816 0.0 0.0 0 0 ? Z Feb25 0:21 [kiwix-serve] <defunct>
root 23915 0.1 0.0 0 0 ? Z Feb26 3:08 [kiwix-serve] <defunct>
root 23955 0.0 0.0 0 0 ? Z Feb27 0:01 [kiwix-serve] <defunct>
root 24189 0.0 0.0 0 0 ? Z Feb26 0:02 [kiwix-serve] <defunct>
root 24624 0.0 0.0 0 0 ? Z Feb25 0:51 [kiwix-serve] <defunct>
root 25329 0.0 0.0 0 0 ? Z Feb24 0:06 [kiwix-serve] <defunct>
root 25681 0.0 0.0 0 0 ? Z Feb26 1:38 [kiwix-serve] <defunct>
root 26777 0.0 0.0 0 0 ? Z Feb25 0:06 [kiwix-serve] <defunct>
root 27225 0.0 0.0 0 0 ? Z Feb25 3:02 [kiwix-serve] <defunct>
root 27407 0.4 0.0 0 0 ? Z Feb27 4:40 [kiwix-serve] <defunct>
root 27626 0.1 0.0 0 0 ? Z Feb23 14:56 [kiwix-serve] <defunct>
root 28213 0.0 0.0 0 0 ? Z Feb21 3:00 [kiwix-serve] <defunct>
root 28293 0.0 0.0 0 0 ? Z Feb27 0:03 [kiwix-serve] <defunct>
root 28335 0.0 0.0 0 0 ? Z Feb25 1:01 [kiwix-serve] <defunct>
root 28454 0.0 0.0 0 0 ? Z Feb27 0:04 [kiwix-serve] <defunct>
root 28791 0.0 0.0 0 0 ? Z Feb27 0:17 [kiwix-serve] <defunct>
root 28840 0.0 0.0 0 0 ? Z Feb27 0:06 [kiwix-serve] <defunct>
root 29214 0.0 0.0 0 0 ? Z Feb27 0:03 [kiwix-serve] <defunct>
root 29332 0.0 0.0 0 0 ? Z 02:18 0:07 [kiwix-serve] <defunct>
root 29557 0.0 0.0 0 0 ? Z Feb27 0:07 [kiwix-serve] <defunct>
root 29966 0.0 0.0 0 0 ? Z 03:16 0:03 [kiwix-serve] <defunct>
root 30199 0.0 0.0 0 0 ? Z Feb25 0:42 [kiwix-serve] <defunct>
root 30322 0.0 0.0 0 0 ? Z Feb27 0:03 [kiwix-serve] <defunct>
root 30518 0.0 0.0 0 0 ? Z 03:18 0:06 [kiwix-serve] <defunct>
root 30606 0.0 0.0 0 0 ? Z Feb27 0:03 [kiwix-serve] <defunct>
root 30895 0.0 0.0 0 0 ? Z Feb27 0:06 [kiwix-serve] <defunct>
root 30988 0.0 0.0 0 0 ? Z 03:20 0:07 [kiwix-serve] <defunct>
root 31396 0.0 0.0 0 0 ? Z 03:22 0:04 [kiwix-serve] <defunct>
root 31499 0.0 0.0 0 0 ? Z Feb25 0:03 [kiwix-serve] <defunct>
root 31674 0.0 0.0 0 0 ? Z 02:25 0:10 [kiwix-serve] <defunct>
root 31812 0.0 0.0 0 0 ? Z 03:24 0:07 [kiwix-serve] <defunct>
root 31834 0.0 0.0 0 0 ? Z Feb27 0:04 [kiwix-serve] <defunct>
root 31960 0.0 0.0 0 0 ? Z Feb25 0:03 [kiwix-serve] <defunct>
root 32141 0.0 0.0 0 0 ? Z Feb27 0:06 [kiwix-serve] <defunct>
root 32237 0.0 0.0 0 0 ? Z Feb25 0:03 [kiwix-serve] <defunct>
root 32511 0.0 0.0 0 0 ? Z Feb27 0:12 [kiwix-serve] <defunct>
root 32575 0.0 0.0 0 0 ? Z Feb25 0:08 [kiwix-serve] <defunct>
root 32763 0.0 0.0 0 0 ? Z Feb27 0:03 [kiwix-serve] <defunct>
root 32812 0.0 0.0 0 0 ? Z 03:28 0:03 [kiwix-serve] <defunct>
root 32980 0.0 0.0 0 0 ? Z 03:29 0:04 [kiwix-serve] <defunct>
root 33311 0.0 0.0 0 0 ? Z 04:19 0:06 [kiwix-serve] <defunct>
root 33375 0.0 0.0 0 0 ? Z 03:31 0:12 [kiwix-serve] <defunct>
root 33380 0.2 0.0 0 0 ? Z Feb25 7:47 [kiwix-serve] <defunct>
root 33464 0.0 0.0 0 0 ? Z Feb27 0:08 [kiwix-serve] <defunct>
root 33520 0.0 0.0 0 0 ? Z Feb27 0:33 [kiwix-serve] <defunct>
root 33791 0.1 0.0 0 0 ? Z 04:21 0:24 [kiwix-serve] <defunct>
root 34100 0.0 0.0 0 0 ? Z Feb25 0:04 [kiwix-serve] <defunct>
root 34272 0.0 0.0 0 0 ? Z Feb25 0:17 [kiwix-serve] <defunct>
root 35003 0.0 0.0 0 0 ? Z Feb25 0:51 [kiwix-serve] <defunct>
root 35182 0.0 0.0 0 0 ? Z 01:50 0:04 [kiwix-serve] <defunct>
root 35327 0.0 0.0 0 0 ? Z 03:39 0:03 [kiwix-serve] <defunct>
root 35700 0.0 0.0 0 0 ? Z 03:40 0:04 [kiwix-serve] <defunct>
root 35894 0.0 0.0 0 0 ? Z 01:52 0:09 [kiwix-serve] <defunct>
root 36334 0.0 0.0 0 0 ? Z 03:42 0:03 [kiwix-serve] <defunct>
root 36359 0.0 0.0 0 0 ? Z 02:37 0:04 [kiwix-serve] <defunct>
root 36391 0.0 0.0 0 0 ? Z Feb24 0:04 [kiwix-serve] <defunct>
root 36657 0.0 0.0 0 0 ? Z 03:43 0:07 [kiwix-serve] <defunct>
root 36755 0.0 0.0 0 0 ? Z Feb24 0:04 [kiwix-serve] <defunct>
root 36768 0.0 0.0 0 0 ? Z Feb26 0:03 [kiwix-serve] <defunct>
root 36989 0.0 0.0 0 0 ? Z 02:39 0:06 [kiwix-serve] <defunct>
root 37267 0.0 0.0 0 0 ? Z 03:46 0:04 [kiwix-serve] <defunct>
root 37304 0.0 0.0 0 0 ? Z Feb24 0:05 [kiwix-serve] <defunct>
root 37499 0.0 0.0 0 0 ? Z 00:51 0:03 [kiwix-serve] <defunct>
root 37511 0.0 0.0 0 0 ? Z Feb26 0:06 [kiwix-serve] <defunct>
root 37702 0.0 0.0 0 0 ? Z Feb24 0:09 [kiwix-serve] <defunct>
root 37733 0.0 0.0 0 0 ? Z 02:42 0:03 [kiwix-serve] <defunct>
root 37772 0.0 0.0 0 0 ? Z 00:52 0:02 [kiwix-serve] <defunct>
root 37808 0.0 0.0 0 0 ? Z 03:48 0:12 [kiwix-serve] <defunct>
root 37838 0.0 0.0 0 0 ? Z 01:56 0:12 [kiwix-serve] <defunct>
root 37862 0.0 0.0 0 0 ? Z Feb26 0:09 [kiwix-serve] <defunct>
root 38067 0.0 0.0 0 0 ? Z 00:54 0:03 [kiwix-serve] <defunct>
root 38205 0.0 0.0 0 0 ? Z 02:44 0:05 [kiwix-serve] <defunct>
root 38253 0.1 0.0 0 0 ? Z 00:55 0:31 [kiwix-serve] <defunct>
root 38409 0.0 0.0 0 0 ? Z Feb24 1:08 [kiwix-serve] <defunct>
root 38464 0.0 0.0 0 0 ? Z Feb26 0:19 [kiwix-serve] <defunct>
root 38882 0.0 0.0 0 0 ? Z 02:48 0:04 [kiwix-serve] <defunct>
root 39462 0.0 0.0 0 0 ? Z 02:50 0:04 [kiwix-serve] <defunct>
root 39720 0.0 0.0 0 0 ? Z Feb26 0:03 [kiwix-serve] <defunct>
root 39769 0.0 0.0 0 0 ? Z 02:52 0:07 [kiwix-serve] <defunct>
root 40105 0.0 0.0 0 0 ? Z 03:53 0:10 [kiwix-serve] <defunct>
root 40198 0.0 0.0 0 0 ? Z 02:55 0:12 [kiwix-serve] <defunct>
root 40470 0.0 0.0 0 0 ? Z Feb26 0:29 [kiwix-serve] <defunct>
root 40506 0.0 0.0 0 0 ? Z Feb22 0:41 [kiwix-serve] <defunct>
root 40566 0.0 0.0 0 0 ? Z Feb26 0:09 [kiwix-serve] <defunct>
root 40661 0.0 0.0 0 0 ? Z Feb26 0:04 [kiwix-serve] <defunct>
root 40772 0.0 0.0 0 0 ? Z 02:07 0:09 [kiwix-serve] <defunct>
root 40841 0.1 0.0 0 0 ? Z Feb26 2:23 [kiwix-serve] <defunct>
root 41031 0.0 0.0 0 0 ? Z Feb24 0:06 [kiwix-serve] <defunct>
root 41237 0.0 0.0 0 0 ? Z Feb26 0:06 [kiwix-serve] <defunct>
root 41278 0.0 0.0 0 0 ? Z 06:51 0:04 [kiwix-serve] <defunct>
root 41302 0.0 0.0 0 0 ? Z Feb24 1:37 [kiwix-serve] <defunct>
root 41637 0.1 0.0 0 0 ? Z 06:53 0:13 [kiwix-serve] <defunct>
root 41688 0.0 0.0 0 0 ? Z Feb26 0:03 [kiwix-serve] <defunct>
root 41865 0.0 0.0 0 0 ? Z Feb26 0:14 [kiwix-serve] <defunct>
root 42059 0.0 0.0 0 0 ? Z Feb26 0:47 [kiwix-serve] <defunct>
root 42237 0.0 0.0 0 0 ? Z 04:58 0:04 [kiwix-serve] <defunct>
root 42740 0.4 0.0 0 0 ? Z 05:01 1:02 [kiwix-serve] <defunct>
root 43091 0.0 0.0 0 0 ? Z Feb26 0:26 [kiwix-serve] <defunct>
root 43371 0.1 0.0 0 0 ? Z 07:03 0:10 [kiwix-serve] <defunct>
root 43786 0.0 0.0 0 0 ? Z Feb26 0:03 [kiwix-serve] <defunct>
root 44008 0.0 0.0 0 0 ? Z Feb22 0:03 [kiwix-serve] <defunct>
root 44095 0.0 0.0 0 0 ? Z 04:07 0:05 [kiwix-serve] <defunct>
root 44244 0.1 0.0 0 0 ? Z Feb22 9:35 [kiwix-serve] <defunct>
root 44326 0.0 0.0 0 0 ? Z Feb26 0:04 [kiwix-serve] <defunct>
root 44532 0.1 0.0 0 0 ? Z 07:10 0:07 [kiwix-serve] <defunct>
root 44638 0.0 0.0 0 0 ? Z 04:09 0:05 [kiwix-serve] <defunct>
root 44682 0.0 0.0 0 0 ? Z Feb26 0:24 [kiwix-serve] <defunct>
root 44789 0.0 0.0 0 0 ? Z Feb24 2:37 [kiwix-serve] <defunct>
root 45043 0.6 0.0 0 0 ? Z 07:13 0:37 [kiwix-serve] <defunct>
root 45311 0.0 0.0 0 0 ? Z 04:13 0:06 [kiwix-serve] <defunct>
root 45338 0.0 0.0 0 0 ? Z Feb24 0:04 [kiwix-serve] <defunct>
root 45850 0.0 0.0 0 0 ? Z Feb22 0:07 [kiwix-serve] <defunct>
root 46291 0.0 0.0 0 0 ? Z Feb22 2:19 [kiwix-serve] <defunct>
root 46385 0.0 0.0 0 0 ? Z Feb24 0:06 [kiwix-serve] <defunct>
root 46869 0.0 0.0 0 0 ? Z Feb24 0:03 [kiwix-serve] <defunct>
root 47155 0.0 0.0 0 0 ? Z Feb24 0:04 [kiwix-serve] <defunct>
root 47401 0.0 0.0 0 0 ? Z Feb26 0:38 [kiwix-serve] <defunct>
root 47590 0.0 0.0 0 0 ? Z Feb24 0:03 [kiwix-serve] <defunct>
root 47792 0.0 0.0 0 0 ? Z Feb24 0:09 [kiwix-serve] <defunct>
root 48492 0.0 0.0 0 0 ? Z Feb24 0:14 [kiwix-serve] <defunct>
After restarting the library.kiwix.org docker container, then the load was back to normal and all the defunct kiwix-serve process have vanished
Do you have an idea what happens? Where do those defunct processes come from? Do they correspond to threads handling concurrent requests? Did the kiwix-serve process run continuously since it was started or it was automatically restarted multiple times because of becoming unresponsive? Are there any logs?
@veloman-yunkan The problems seems to come from the kill
operation we rung here https://github.com/kiwix/maintenance/blob/master/library-docker/bin/restart-kiwix-serve.sh#L19.
I can not reproduce the problem locally, but I have tried directly on library.kiwix.org
and this really seems to be the root cause. I even have tried with 15 (TERM)
but doing this I still get zombies.
So, if we don't have an obvious performance problem (for the moment), we have a clear problem that it seems not possible to kill cleanly the process without leaving zombies.
@veloman-yunkan The problems seems to come from the
kill
operation we rung here https://github.com/kiwix/maintenance/blob/master/library-docker/bin/restart-kiwix-serve.sh#L19.
An immediate question is: why is a KILL signal used to stop kiwix-serve? Processes should be killed with a KILL signal only as a last resort.
@veloman-yunkan Sending SIGTERM leads to the same behaviour. What would you propose as alternative to request kiwix-serve daemon to stop?
@kelson42 SIGTERM must be used for terminating a process. Any problems with handling of SIGTERM must be debugged and fixed.
@veloman-yunkan Yesterday, I have rebuilded/relaunched the library container with latest kiwix-serve and used the opportunity to put a proper SIGTERM
signal, see:
https://github.com/kiwix/maintenance/pull/217/files
After restart I had to <defunct>
kiwix-serve process... but this morning, the first one have already appeared :(
$ ps aux | grep kiwix-serve
root 3607 0.4 0.0 0 0 ? Z 07:02 1:30 [kiwix-serve] <defunct>
kelson 5708 0.0 0.0 12752 876 pts/0 S+ 12:19 0:00 grep kiwix-serve
root 6343 0.2 0.0 0 0 ? Z 09:03 0:24 [kiwix-serve] <defunct>
root 7992 0.0 0.0 0 0 ? Z 09:17 0:09 [kiwix-serve] <defunct>
root 8436 1.9 5.7 97090236 3779152 ? Sl 09:19 3:28 kiwix-serve --daemon --port=8000 --library --monitorLibrary --threads=16 --nodatealias /var/www/library.kiwix.org/library.kiwix.org.xml
root 13923 0.0 0.0 0 0 ? Z 04:19 0:06 [kiwix-serve] <defunct>
root 14456 1.1 0.0 0 0 ? Z 04:21 5:21 [kiwix-serve] <defunct>
root 15077 0.0 0.0 0 0 ? Z 08:40 0:04 [kiwix-serve] <defunct>
root 15643 0.2 0.0 0 0 ? Z 08:42 0:28 [kiwix-serve] <defunct>
root 40612 1.4 0.0 0 0 ? Z Mar24 14:41 [kiwix-serve] <defunct>
My analysis of kiwix/maintenance/library-docker/bin/start.sh and kiwix/maintenance/library-docker/bin/restart-kiwix-serve.sh suggests that the zombie processes of kiwix-serve
have nothing to do with its reaction to termination requests.
In library-docker/bin/start.sh
cron
is configured to execute restart-kiwix-serve.sh
every minute without any arguments. That command only checks if kiwix-serve
is alive and, if so, does nothing. Otherwise, it starts a new instance of kiwix-serve
.
Attempts to restart (i.e. terminate the running instance of kiwix-serve
and start a new one) are made only if restart-kiwix-serve.sh
is passed the restart
command line argument. This is configured only for the start-up of the container and is meaningless since on start-up no kiwix-serve
should be expected to be already running.
Therefore zombie/<defunct> processes of kiwix-serve
appear independently from restart-kiwix-serve.sh
(which simply recovers from that situation by launching kiwix-serve
again). Why kiwix-serve
goes <defunct> has to be debugged.
My analysis of kiwix/maintenance/library-docker/bin/start.sh and kiwix/maintenance/library-docker/bin/restart-kiwix-serve.sh suggests that the zombie processes of
kiwix-serve
have nothing to do with its reaction to termination requests.
This is probably wrong, here is something I made live on library.kiwix.org
(unfortunately I can not reproduce localy)
$ docker-compose up --detach --force-recreate library
Recreating library ... done
$ date
Sat Mar 26 10:29:08 CET 2022
$ ps aux | grep kiwix-serve
root 6157 0.0 0.0 12752 948 pts/0 S+ 10:29 0:00 grep kiwix-serve
root 36150 19.8 0.8 3991372 561872 ? Sl 10:28 0:06 kiwix-serve --daemon --port=8000 --library --monitorLibrary --threads=16 --nodatealias /var/www/library.kiwix.org/library.kiwix.org.xml
$ date
Sat Mar 26 10:29:45 CET 2022
$ kill -TERM 36150
$ ps aux | grep kiwix-serve
root 7022 0.0 0.0 12752 964 pts/0 S+ 10:30 0:00 grep kiwix-serve
root 36150 6.8 0.0 0 0 ? Z 10:28 0:07 [kiwix-serve] <defunct>
But your argumentation/analysis is proper: since we use th e --monitorLibrary
, kiwix-serve is not started arbitrarly anymore from this script (beside at container start). That said, the kill -TERM
still generates such <defunc>
processes. Therefore, either their is an other root cause or something else kills the process. At least I believe, using kill -TERM
might be an interesting "artificial" way to reproduce the bug and diagnosis it.
here is something I made live on
library.kiwix.org
(unfortunately I can not reproduce localy)$ docker-compose up --detach --force-recreate library Recreating library ... done $ date Sat Mar 26 10:29:08 CET 2022 $ ps aux | grep kiwix-serve root 6157 0.0 0.0 12752 948 pts/0 S+ 10:29 0:00 grep kiwix-serve root 36150 19.8 0.8 3991372 561872 ? Sl 10:28 0:06 kiwix-serve --daemon --port=8000 --library --monitorLibrary --threads=16 --nodatealias /var/www/library.kiwix.org/library.kiwix.org.xml $ date Sat Mar 26 10:29:45 CET 2022 $ kill -TERM 36150 $ ps aux | grep kiwix-serve root 7022 0.0 0.0 12752 964 pts/0 S+ 10:30 0:00 grep kiwix-serve root 36150 6.8 0.0 0 0 ? Z 10:28 0:07 [kiwix-serve] <defunct>
Do you kill a process belonging to a docker container from outside the container?
@veloman-yunkan This is the case here, yes.
It look like the problem belongs to the parent process of kiwix-serve. See https://linuxreviews.org/Defunct_process
Unix manages an explicit parent-child relationships between processes (Windows does not do this).
When a child process dies, the parent process recieves a notification. It is then the duty of the parent process to explicitly take notice of the childs demise by using the wait() system call.
The return value of the wait() is the process ID of the child, which gives the parent exact control about which of its children are still alive. Upon returning, wait() will have set the integer pointed to by its argument to the exit status of the child. A shell programm like "bash" could then decide how to process following commands and set the special $? variable accordingly.
As long as the parent hasn't called wait(), the system needs to keep the dead child in the global process list, because that's the only place where the process ID is stored. The purpose of the "zombies" is really just for the system to remember the process ID, so that it can inform the parent process about it on request.
If the parent "forgets" to collect on its children, then the zombie will stay undead forever.
Well, almost forever. If the parent itself dies, then "init" (the system process with the ID 0) will take over fostership over its children and catch up on the neglected parental duties. This is why you need to identify the parent process and stop or restart it in order to get rid of defunct processes. If the zombie process has id nnnnn, you can do ps -ef | grep nnnnn and find the id of the parent process, which you can then kill if no longer needed. Then the defunct process will be removed from the list.
@kelson42 Will you please find out the parent process of the defunct kiwix-serve
processes using the ps -ef
command?
$ ps -ef | grep kiwix-serve
root 852 35156 0 10:45 ? 00:00:06 [kiwix-serve] <defunct>
root 966 35156 1 10:47 ? 00:04:26 [kiwix-serve] <defunct>
root 3331 35156 0 Mar26 ? 00:01:53 [kiwix-serve] <defunct>
root 7583 35156 0 Mar26 ? 00:02:24 [kiwix-serve] <defunct>
root 7632 35156 0 Mar26 ? 00:06:35 [kiwix-serve] <defunct>
kelson 21314 21143 0 15:49 pts/2 00:00:00 grep kiwix-serve
root 22859 35156 0 00:23 ? 00:00:06 [kiwix-serve] <defunct>
root 23620 35156 1 00:26 ? 00:12:09 [kiwix-serve] <defunct>
root 35907 35156 0 Mar26 ? 00:03:12 [kiwix-serve] <defunct>
root 36150 35156 0 Mar26 ? 00:00:07 [kiwix-serve] <defunct>
root 39560 35156 0 Mar26 ? 00:01:32 [kiwix-serve] <defunct>
root 43009 35156 0 14:38 ? 00:00:04 [kiwix-serve] <defunct>
root 43192 35156 2 14:40 ? 00:01:47 kiwix-serve --daemon --port=8000 --library --monitorLibrary --threads=16 --nodatealias /var/www/library.kiwix.org/library.kiwix.org.xml
and
$ ps aux | grep 35156
kelson 21660 0.0 0.0 12752 904 pts/2 S+ 15:51 0:00 grep 35156
systemd+ 35156 0.0 0.0 18604 1428 ? SLs Mar26 0:05 /usr/sbin/varnishd -F -a :80 -T localhost:6082 -f /etc/varnish/default.vcl -S /etc/varnish/secret -s malloc,256m
This problem seems śtrongly tight to docker. Looks like somehow the PPID of kiwix-serve is the Varnish process in place of ths bourne shell process to me. Might be a duplicate of #204.
I think that everything is correct. When started in --daemon
mode kiwix-serve
detaches itself from the parent shell process and becomes a child of the init
process (which in this container is varnishd
)
@veloman-yunkan So, what is wrong to your opinion?
I guess the problem is that we don't have a proper init
process in our container (dumbinit
or tini
, see https://stackoverflow.com/questions/37374310/how-critical-is-dumb-init-for-docker). It would be needed even if #204 is implemented.
@veloman-yunkan Thank you for your analysis on this.
@rgaudin @mgautierfr Should we close this ticket in favour of #204 (as duplicate)?
You should use dumbinit (or tini) as said by @veloman-yunkan and how it was made in kiwix/kiwix-tools#489
If the parent itself dies, then "init" (the system process with the ID 0) will take over fostership over its children and catch up on the neglected parental duties
In our case varnish is the PID 0 and it becomes parent of kiwix-serve process (and never wait()
for it as it is not a init process).
#204 is not related to this issue.
Closing this as we're not using this image anymore. New container(s) uses plain kiwix-tools image with kiwix-serve directly (through its dumb-init) and we're not killing it manually anymore.