lework/kainstall

[bug] kubelet incorrectly applies resource limits to system.slice, causing OOM kills

SincereXIA opened this issue · 1 comment

Environment

  • OS: Ubuntu 20.04, kernel 5.4.0-91-generic
  • Install command:
./kainstall-ubuntu.sh init \
  --master 192.168.7.140,192.168.7.141,192.168.7.142 \
  --worker 192.168.7.143,192.168.7.144,192.168.7.145,192.168.7.146,192.168.7.147,192.168.7.148,192.168.7.149 \
  --port 22 \
  --network calico \
  --version 1.21.8

Reproduction

After the k8s deployment finishes, haproxy and a large number of system processes get killed on some nodes:

[152588.479431] Memory cgroup out of memory: Killed process 897 (haproxy) total-vm:71548kB, anon-rss:41156kB, file-rss:7324kB, shmem-rss:0kB, UID:0 pgtables:156kB oom_score_adj:0

The cause is that the memory available to system.slice has been capped at 512M:

● system.slice - System Slice
     Loaded: loaded
    Drop-In: /run/systemd/system.control/system.slice.d
             └─50-MemoryLimit.conf, 50-CPUShares.conf
     Active: active since Mon 2022-01-03 17:12:02 CST; 5 days ago
       Docs: man:systemd.special(7)
      Tasks: 476
     Memory: 503.4M (limit: 512.0M)
     CGroup: /system.slice
             ├─accounts-daemon.service
             │ └─659 /usr/lib/accountsservice/accounts-daemon
             ├─atd.service
             │ └─696 /usr/sbin/atd -f
             ├─auditd.service
             │ └─4896 /sbin/auditd

Root cause

The kubelet configuration file generated by the installer is wrong:

# Node resource reservation
kubeReserved:
  cpu: 200m\$(if [[ \$(cat /proc/meminfo | awk '/MemTotal/ {print \$2}') -gt 3670016 ]]; then echo -e '\n  memory: 256Mi';fi)
  ephemeral-storage: 1Gi
systemReserved:
  cpu: 300m\$(if [[ \$(cat /proc/meminfo | awk '/MemTotal/ {print \$2}') -gt 3670016 ]]; then echo -e '\n  memory: 512Mi';fi)
  ephemeral-storage: 1Gi
kubeReservedCgroup: /kube.slice
systemReservedCgroup: /system.slice
enforceNodeAllocatable: 
- pods
- kube-reserved
- system-reserved

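Because enforceNodeAllocatable here includes system-reserved and kube-reserved, the resources that were only supposed to be reserved are instead enforced as the resource limit of the system.slice cgroup. With systemReserved memory set to 512Mi, every service under system.slice has to fit into 512M in total, which matches the "limit: 512.0M" shown above.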
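The inline shell in the template only emits the memory reservation on nodes whose MemTotal exceeds 3670016 kB (3.5 GiB), so smaller nodes would not get the 512Mi cap. A minimal sketch of that condition, pulled out into a standalone function for readability; the function name `reserved_memory_line` and the variable `THRESHOLD_KB` are made up for illustration and are not part of kainstall:

```shell
# Threshold copied from the template: 3670016 kB = 3584 MiB = 3.5 GiB.
THRESHOLD_KB=3670016

# reserved_memory_line MEM_TOTAL_KB VALUE   (hypothetical helper)
# Prints "memory: VALUE" when MEM_TOTAL_KB is strictly greater than the
# threshold, and prints nothing otherwise, mirroring the template's `-gt` test.
reserved_memory_line() {
  mem_total_kb=$1
  value=$2
  if [ "$mem_total_kb" -gt "$THRESHOLD_KB" ]; then
    printf 'memory: %s\n' "$value"
  fi
}
```

On a real node the first argument would come from `awk '/MemTotal/ {print $2}' /proc/meminfo`, as in the template. For example, `reserved_memory_line 4194304 512Mi` prints `memory: 512Mi`, while a 2 GiB node (`2097152`) prints nothing.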

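Notes

This issue reproduces every time on Ubuntu 20.04. After editing the configuration file and removing system-reserved and kube-reserved from enforceNodeAllocatable, the problem is resolved. I am not sure whether other systems are affected.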
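The workaround described here can be sketched as a small filter that drops the two list items from a kubelet configuration file and leaves everything else, including `- pods`, untouched. The function name `drop_reserved_enforcement` is hypothetical, and the sketch assumes each entry sits on its own line as in the configuration shown in this issue:

```shell
# drop_reserved_enforcement FILE   (hypothetical helper)
# Prints FILE to stdout with the "- kube-reserved" and "- system-reserved"
# list items removed. Keys such as "kubeReserved:" are not matched because
# the patterns require a leading list dash and an exact item name.
drop_reserved_enforcement() {
  sed -e '/^[[:space:]]*-[[:space:]]*kube-reserved[[:space:]]*$/d' \
      -e '/^[[:space:]]*-[[:space:]]*system-reserved[[:space:]]*$/d' \
      "$1"
}
```

On a kubeadm-based node the rendered file is usually /var/lib/kubelet/config.yaml, though that path is an assumption about this particular installation. After writing the filtered output back, kubelet needs a restart (for example `systemctl restart kubelet`) to pick up the change.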

Yes, I enabled this limit on purpose (for testing). Since it is causing problems, I'll turn it off. My next commit will remove all of it.