pires/kubernetes-elasticsearch-cluster

Setting Memory Limits Creates Zombie Processes on Kubernetes Nodes


Hi there,

I'm running this Elasticsearch setup in my Kubernetes cluster.
Currently I have the problem that a memory limit (for example on the data-node StatefulSet) causes the OOM killer to eventually kill the pod, which leaves a zombie process behind on the Kubernetes node.
I can't remove or kill that zombie, and the only way to properly resolve it is to restart the whole machine.

Setup:

  • Kubernetes version: 1.6.4
  • OS: CentOS Linux release 7.3.1611 (Core)
  • Kernel: 3.10.0-514.26.2.el7.x86_64
  • Docker version: 1.12.6
  • ES image: quay.io/pires/docker-elasticsearch-kubernetes:5.5.0
  • 3 ES master nodes
  • 4 client nodes
  • 4 data nodes

Limits:

  • Kubernetes limits (see the combined StatefulSet sketch below):
    Limits:
      memory: 20Gi
    Requests:
      memory: 10Gi
  • Java limits:
    - name: ES_JAVA_OPTS
      value: -Xms10g -Xmx10g
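
Put together, the relevant part of the data-node StatefulSet looks roughly like this. It's a minimal sketch, not my full manifest: the container name is a placeholder, and only the image, env and resources mentioned above are shown.

# Sketch of the data-node container spec (only the fields discussed here).
spec:
  template:
    spec:
      containers:
      - name: es-data   # placeholder name
        image: quay.io/pires/docker-elasticsearch-kubernetes:5.5.0
        env:
        - name: ES_JAVA_OPTS
          value: -Xms10g -Xmx10g
        resources:
          requests:
            memory: 10Gi
          limits:
            memory: 20Gi

Since the request is below the limit, the pod ends up in the Burstable QoS class, which matches the /kubepods/burstable/... cgroup path in the kernel log below.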

Error messages:

  • /var/log/messages
Sep 13 02:27:38 node-01 kernel: java invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=921
Sep 13 02:27:38 node-01 kernel: java cpuset=0b8fad4642c76b45da68a7333f66974327f9050101b4320bd0ce5424a61a0508 mems_allowed=0-1
Sep 13 02:27:38 node-01 kernel: CPU: 16 PID: 4060 Comm: java Tainted: P        W  OE  ------------   3.10.0-514.26.2.el7.x86_64 #1
Sep 13 02:27:38 node-01 kernel: Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 07/18/2016
Sep 13 02:27:38 node-01 kernel:  ffff88198c694e70 0000000042d6ad47 ffff8820102abcc0 ffffffff81687133
Sep 13 02:27:38 node-01 kernel:  ffff8820102abd50 ffffffff816820de ffff880bc28eb458 000000000000000e
Sep 13 02:27:38 node-01 kernel:  ffff8803792fb340 ffff881c1621c840 0000000000000003 ffffffff81184856
Sep 13 02:27:38 node-01 kernel: Call Trace:
Sep 13 02:27:38 node-01 kernel:  [<ffffffff81687133>] dump_stack+0x19/0x1b
Sep 13 02:27:38 node-01 kernel:  [<ffffffff816820de>] dump_header+0x8e/0x225
Sep 13 02:27:38 node-01 kernel:  [<ffffffff81184856>] ? find_lock_task_mm+0x56/0xc0
Sep 13 02:27:38 node-01 kernel:  [<ffffffff81184d0e>] oom_kill_process+0x24e/0x3c0
Sep 13 02:27:38 node-01 kernel:  [<ffffffff81093c0e>] ? has_capability_noaudit+0x1e/0x30
Sep 13 02:27:38 node-01 kernel:  [<ffffffff811f38a1>] mem_cgroup_oom_synchronize+0x551/0x580
Sep 13 02:27:38 node-01 kernel:  [<ffffffff811f2cf0>] ? mem_cgroup_charge_common+0xc0/0xc0
Sep 13 02:27:38 node-01 kernel:  [<ffffffff81185594>] pagefault_out_of_memory+0x14/0x90
Sep 13 02:27:38 node-01 kernel:  [<ffffffff8167ff4a>] mm_fault_error+0x68/0x12b
Sep 13 02:27:38 node-01 kernel:  [<ffffffff81692f05>] __do_page_fault+0x395/0x450
Sep 13 02:27:38 node-01 kernel:  [<ffffffff81692ff5>] do_page_fault+0x35/0x90
Sep 13 02:27:38 node-01 kernel:  [<ffffffff8168f208>] page_fault+0x28/0x30
Sep 13 02:27:38 node-01 kernel: Task in /kubepods/burstable/pod8eb08692-9479-11e7-bd07-1c98ec1a3b64/0b8fad4642c76b45da68a7333f66974327f9050101b4320bd0ce5424a61a0508 killed as a result of limit of /kubepods/burstable/pod8eb08692-9479-11e7-bd07-1c98ec1a3b64
Sep 13 02:27:38 node-01 kernel: memory: usage 20971520kB, limit 20971520kB, failcnt 44186939
Sep 13 02:27:38 node-01 kernel: memory+swap: usage 20971520kB, limit 9007199254740988kB, failcnt 0
Sep 13 02:27:38 node-01 kernel: kmem: usage 8731528kB, limit 9007199254740988kB, failcnt 0
Sep 13 02:27:38 node-01 kernel: Memory cgroup stats for /kubepods/burstable/pod8eb08692-9479-11e7-bd07-1c98ec1a3b64: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Sep 13 02:27:38 node-01 kernel: Memory cgroup stats for /kubepods/burstable/pod8eb08692-9479-11e7-bd07-1c98ec1a3b64/1480b1971d4ae27dd97d2c9f99b6da2291dab01b8c95aa076dda225956d56a1e: cache:0KB rss:48KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:48KB inactive_file:0KB active_file:0KB unevictable:0KB
Sep 13 02:27:38 node-01 kernel: Memory cgroup stats for /kubepods/burstable/pod8eb08692-9479-11e7-bd07-1c98ec1a3b64/0b8fad4642c76b45da68a7333f66974327f9050101b4320bd0ce5424a61a0508: cache:16760KB rss:12223184KB rss_huge:11823104KB mapped_file:15740KB swap:0KB inactive_anon:19976KB active_anon:1276772KB inactive_file:316KB active_file:144KB unevictable:10941844KB
Sep 13 02:27:38 node-01 kernel: [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
Sep 13 02:27:38 node-01 kernel: [53861]     0 53861      256        3       4        0          -998 pause
Sep 13 02:27:38 node-01 kernel: [47838]     0 47838      387       70       5        0           921 run.sh
Sep 13 02:27:38 node-01 kernel: [47859]  1000 47859 26928629  3058697    9501        0           921 java
Sep 13 02:27:38 node-01 kernel: Memory cgroup out of memory: Kill process 4147 (java) score 1506 or sacrifice child
Sep 13 02:27:38 node-01 kernel: Killed process 47859 (java) total-vm:107714516kB, anon-rss:12222952kB, file-rss:11816kB, shmem-rss:0kB

Zombie process:

ps awux | grep 'Z' | grep -v grep
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
polkitd  14729  0.0  0.0      0     0 ?        Zsl  Aug02  33:43 [java] <defunct>

cat /proc/14729/status
Name:   java
State:  Z (zombie)
Tgid:   14729
Ngid:   0
Pid:    14729
PPid:   14713
TracerPid:      0
Uid:    998     998     998     998
Gid:    997     997     997     997
FDSize: 0
Groups:
Threads:        2
SigQ:   2/514486
SigPnd: 0000000000000000
ShdPnd: 0000000000004100
SigBlk: 0000000000000000
SigIgn: 0000000000001000
SigCgt: 2000000181004ccf
CapInh: 00000000a80425fb
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 00000000a80425fb
CapAmb: 0000000000000000
Seccomp:        0
Cpus_allowed:   ffffffff,ffffffff
Cpus_allowed_list:      0-63
Mems_allowed:   00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000003
Mems_allowed_list:      0-1
voluntary_ctxt_switches:        221
nonvoluntary_ctxt_switches:     9

It also seems that the zombie is causing high load on the system:

top - 08:25:10 up 61 days, 22:30,  1 user,  load average: 48.30, 48.59, 48.41
Tasks: 2281 total,   1 running, 2279 sleeping,   0 stopped,   1 zombie
%Cpu(s):  0.9 us,  0.2 sy,  0.0 ni, 95.7 id,  3.1 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 13173020+total, 33730376 free, 61923612 used, 36076212 buff/cache
KiB Swap:        0 total,        0 free,        0 used. 62966416 avail Mem

How is that zombie created?
Is it a result of the OOM kill shown above?

When I take a look inside the container, I see the following:

ps auwx
PID   USER     TIME   COMMAND
    1 root       0:00 {run.sh} /bin/sh /run.sh
   11 elastics   6:51 /usr/lib/jvm/default-jvm/jre/bin/java -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+DisableExplicitGC -XX:+AlwaysPreTouch -se
  501 root       0:00 /bin/bash
  508 root       0:00 ps auwx

My guess is that the signal the kubelet sends when the OOM killer kicks in is not handled inside the container: run.sh gets killed, but that doesn't take the ES process down with it.
Has anyone experienced this before?
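
One thing I could try on my side (a rough sketch, not the actual /run.sh shipped in the image) is to exec the JVM from the script, so java replaces the shell and becomes PID 1 inside the container:

#!/bin/sh
# Hypothetical variant of run.sh: "exec" replaces the shell with the JVM
# instead of forking a child, so java becomes PID 1 in the container and
# receives the kubelet's signals directly. The java command line itself
# would stay exactly as it is in the original script.
exec /usr/lib/jvm/default-jvm/jre/bin/java ${ES_JAVA_OPTS}   # ...rest of the original java arguments here

If the zombie really comes from the shell not reaping its killed child, this should avoid it, because there is no child process left to reap.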

Would something like https://github.com/Yelp/dumb-init be a solution for this?
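
For reference, this is roughly how I imagine wiring dumb-init in on top of the existing image. It's an untested sketch; the release version and download URL are just what I would try first and may need adjusting:

# Hypothetical Dockerfile layering dumb-init over the existing image.
# dumb-init runs as PID 1, forwards signals to its child (run.sh) and
# reaps any orphaned processes that get re-parented to it.
FROM quay.io/pires/docker-elasticsearch-kubernetes:5.5.0
ADD https://github.com/Yelp/dumb-init/releases/download/v1.2.0/dumb-init_1.2.0_amd64 /usr/local/bin/dumb-init
RUN chmod +x /usr/local/bin/dumb-init
ENTRYPOINT ["/usr/local/bin/dumb-init", "--", "/run.sh"]

(tini would be an equivalent alternative as the init process.)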