Setting memory limits creates zombie processes on Kubernetes nodes
Hi there,
I'm running this setup of ES in my Kubernetes cluster.
Currently I have the problem that setting a memory limit (for example in the StatefulSet of the data nodes) causes the oom-killer to kill the pod at some point, and that leaves a zombie process behind on the Kubernetes node.
I can't remove or kill that zombie, and the only way to properly resolve it is to restart the whole machine.
Setup:
- Kubernetes Version 1.6.4
- OS: CentOS Linux release 7.3.1611 (Core)
- Kernel: 3.10.0-514.26.2.el7.x86_64
- Docker Version 1.12.6
- ES Version quay.io/pires/docker-elasticsearch-kubernetes:5.5.0
- 3 ES master nodes
- 4 client nodes
- 4 data nodes
Limits:
- Kubernetes limits (StatefulSet resources):
    Limits:
      memory: 20Gi
    Requests:
      memory: 10Gi
- Java limits:
    - name: ES_JAVA_OPTS
      value: -Xms10g -Xmx10g
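The effective cgroup limit can be cross-checked from inside the running data-node container; a quick sketch, assuming the cgroup v1 paths used by this CentOS 7 / Docker 1.12.6 setup:
# Limit the oom-killer enforces for this container
cat /sys/fs/cgroup/memory/memory.limit_in_bytes    # 21474836480 bytes = 20Gi
# Memory currently charged against that limit
cat /sys/fs/cgroup/memory/memory.usage_in_bytes
# How often the limit has already been hit (same counter as failcnt in the kernel log)
cat /sys/fs/cgroup/memory/memory.failcnt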
Error messages:
- /var/log/messages
Sep 13 02:27:38 node-01 kernel: java invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=921
Sep 13 02:27:38 node-01 kernel: java cpuset=0b8fad4642c76b45da68a7333f66974327f9050101b4320bd0ce5424a61a0508 mems_allowed=0-1
Sep 13 02:27:38 node-01 kernel: CPU: 16 PID: 4060 Comm: java Tainted: P W OE ------------ 3.10.0-514.26.2.el7.x86_64 #1
Sep 13 02:27:38 node-01 kernel: Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 07/18/2016
Sep 13 02:27:38 node-01 kernel: ffff88198c694e70 0000000042d6ad47 ffff8820102abcc0 ffffffff81687133
Sep 13 02:27:38 node-01 kernel: ffff8820102abd50 ffffffff816820de ffff880bc28eb458 000000000000000e
Sep 13 02:27:38 node-01 kernel: ffff8803792fb340 ffff881c1621c840 0000000000000003 ffffffff81184856
Sep 13 02:27:38 node-01 kernel: Call Trace:
Sep 13 02:27:38 node-01 kernel: [<ffffffff81687133>] dump_stack+0x19/0x1b
Sep 13 02:27:38 node-01 kernel: [<ffffffff816820de>] dump_header+0x8e/0x225
Sep 13 02:27:38 node-01 kernel: [<ffffffff81184856>] ? find_lock_task_mm+0x56/0xc0
Sep 13 02:27:38 node-01 kernel: [<ffffffff81184d0e>] oom_kill_process+0x24e/0x3c0
Sep 13 02:27:38 node-01 kernel: [<ffffffff81093c0e>] ? has_capability_noaudit+0x1e/0x30
Sep 13 02:27:38 node-01 kernel: [<ffffffff811f38a1>] mem_cgroup_oom_synchronize+0x551/0x580
Sep 13 02:27:38 node-01 kernel: [<ffffffff811f2cf0>] ? mem_cgroup_charge_common+0xc0/0xc0
Sep 13 02:27:38 node-01 kernel: [<ffffffff81185594>] pagefault_out_of_memory+0x14/0x90
Sep 13 02:27:38 node-01 kernel: [<ffffffff8167ff4a>] mm_fault_error+0x68/0x12b
Sep 13 02:27:38 node-01 kernel: [<ffffffff81692f05>] __do_page_fault+0x395/0x450
Sep 13 02:27:38 node-01 kernel: [<ffffffff81692ff5>] do_page_fault+0x35/0x90
Sep 13 02:27:38 node-01 kernel: [<ffffffff8168f208>] page_fault+0x28/0x30
Sep 13 02:27:38 node-01 kernel: Task in /kubepods/burstable/pod8eb08692-9479-11e7-bd07-1c98ec1a3b64/0b8fad4642c76b45da68a7333f66974327f9050101b4320bd0ce5424a61a0508 killed as a result of limit of /kubepods/burstable/pod8eb08692-9479-11e7-bd07-1c98ec1a3b64
Sep 13 02:27:38 node-01 kernel: memory: usage 20971520kB, limit 20971520kB, failcnt 44186939
Sep 13 02:27:38 node-01 kernel: memory+swap: usage 20971520kB, limit 9007199254740988kB, failcnt 0
Sep 13 02:27:38 node-01 kernel: kmem: usage 8731528kB, limit 9007199254740988kB, failcnt 0
Sep 13 02:27:38 node-01 kernel: Memory cgroup stats for /kubepods/burstable/pod8eb08692-9479-11e7-bd07-1c98ec1a3b64: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Sep 13 02:27:38 node-01 kernel: Memory cgroup stats for /kubepods/burstable/pod8eb08692-9479-11e7-bd07-1c98ec1a3b64/1480b1971d4ae27dd97d2c9f99b6da2291dab01b8c95aa076dda225956d56a1e: cache:0KB rss:48KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:48KB inactive_file:0KB active_file:0KB unevictable:0KB
Sep 13 02:27:38 node-01 kernel: Memory cgroup stats for /kubepods/burstable/pod8eb08692-9479-11e7-bd07-1c98ec1a3b64/0b8fad4642c76b45da68a7333f66974327f9050101b4320bd0ce5424a61a0508: cache:16760KB rss:12223184KB rss_huge:11823104KB mapped_file:15740KB swap:0KB inactive_anon:19976KB active_anon:1276772KB inactive_file:316KB active_file:144KB unevictable:10941844KB
Sep 13 02:27:38 node-01 kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Sep 13 02:27:38 node-01 kernel: [53861] 0 53861 256 3 4 0 -998 pause
Sep 13 02:27:38 node-01 kernel: [47838] 0 47838 387 70 5 0 921 run.sh
Sep 13 02:27:38 node-01 kernel: [47859] 1000 47859 26928629 3058697 9501 0 921 java
Sep 13 02:27:38 node-01 kernel: Memory cgroup out of memory: Kill process 4147 (java) score 1506 or sacrifice child
Sep 13 02:27:38 node-01 kernel: Killed process 47859 (java) total-vm:107714516kB, anon-rss:12222952kB, file-rss:11816kB, shmem-rss:0kB
Zombie process:
ps awux | grep 'Z' | grep -v grep
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
polkitd 14729 0.0 0.0 0 0 ? Zsl Aug02 33:43 [java] <defunct>
cat /proc/14729/status
Name: java
State: Z (zombie)
Tgid: 14729
Ngid: 0
Pid: 14729
PPid: 14713
TracerPid: 0
Uid: 998 998 998 998
Gid: 997 997 997 997
FDSize: 0
Groups:
Threads: 2
SigQ: 2/514486
SigPnd: 0000000000000000
ShdPnd: 0000000000004100
SigBlk: 0000000000000000
SigIgn: 0000000000001000
SigCgt: 2000000181004ccf
CapInh: 00000000a80425fb
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 00000000a80425fb
CapAmb: 0000000000000000
Seccomp: 0
Cpus_allowed: ffffffff,ffffffff
Cpus_allowed_list: 0-63
Mems_allowed: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000003
Mems_allowed_list: 0-1
voluntary_ctxt_switches: 221
nonvoluntary_ctxt_switches: 9
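Since a zombie only disappears once its parent reaps it, it might be worth looking at what PPid 14713 actually is (presumably a Docker shim on this node); a quick check, not a fix:
# Identify the parent that should be reaping the defunct java process
ps -o pid,ppid,stat,comm -p 14713
# Sometimes suggested: nudge the parent with SIGCHLD and see if it reaps the child
kill -CHLD 14713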
It also seems that the zombie is causing a high load on the system:
top - 08:25:10 up 61 days, 22:30, 1 user, load average: 48.30, 48.59, 48.41
Tasks: 2281 total, 1 running, 2279 sleeping, 0 stopped, 1 zombie
%Cpu(s): 0.9 us, 0.2 sy, 0.0 ni, 95.7 id, 3.1 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 13173020+total, 33730376 free, 61923612 used, 36076212 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 62966416 avail Mem
How is that zombie created?
Is it because of the following?
When I take a look inside the container, I see this:
ps auwx
PID USER TIME COMMAND
1 root 0:00 {run.sh} /bin/sh /run.sh
11 elastics 6:51 /usr/lib/jvm/default-jvm/jre/bin/java -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+DisableExplicitGC -XX:+AlwaysPreTouch -se
501 root 0:00 /bin/bash
508 root 0:00 ps auwx
My guess is that the signal the kubelet sends (when the oom-killer kicks in) is not handled inside the container: run.sh is killed, but that doesn't take the ES process down with it.
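If that is the case, one workaround (independent of adding an init) would be to have run.sh replace itself with the JVM via exec, so that java runs as PID 1 and receives the kubelet's signal directly instead of being left behind. A rough sketch of how the end of the script could look; I haven't checked the actual run.sh, and the java invocation below is only a placeholder:
#!/bin/sh
# ... existing environment setup from run.sh ...
# exec replaces the shell with the JVM: java becomes PID 1, gets the
# termination signal itself and leaves no orphaned child behind
exec /usr/lib/jvm/default-jvm/jre/bin/java $ES_JAVA_OPTS "$@"   # placeholder for the real flag list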
Has anyone experienced this before?
Would something like https://github.com/Yelp/dumb-init be a solution for this?
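For what it's worth, this is roughly how I would picture wiring dumb-init in: it runs as PID 1 in the container, forwards the termination signal to the whole process group and reaps any child that exits, so nothing should be left defunct. A sketch only; it assumes dumb-init has been added to the image at /usr/local/bin/dumb-init and that /run.sh stays unchanged:
# Container entrypoint instead of running /run.sh directly
/usr/local/bin/dumb-init -- /run.sh
# (the "--" stops dumb-init from parsing anything after it as its own options)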