[bug] kubelet incorrectly applies resource limits to system.slice, causing OOM kills
SincereXIA opened this issue · 1 comment
SincereXIA commented
System environment
- OS: Ubuntu 20.04, kernel 5.4.0-91-generic
- Install command:
./kainstall-ubuntu.sh init \
--master 192.168.7.140,192.168.7.141,192.168.7.142 \
--worker 192.168.7.143,192.168.7.144,192.168.7.145,192.168.7.146,192.168.7.147,192.168.7.148,192.168.7.149 \
--port 22 \
--network calico \
--version 1.21.8
Reproduction
After the k8s deployment completes, haproxy and a large number of system processes are killed on some nodes:
[152588.479431] Memory cgroup out of memory: Killed process 897 (haproxy) total-vm:71548kB, anon-rss:41156kB, file-rss:7324kB, shmem-rss:0kB, UID:0 pgtables:156kB oom_score_adj:0
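To see which processes were hit, the kernel log can be filtered for lines like the one above. This is a minimal sketch, assuming the log format shown here; `oom_victims` is a hypothetical helper name, not an existing tool.

```shell
# Print the process name from each "Memory cgroup out of memory" kill line
# read on stdin. oom_victims is a hypothetical helper for this sketch.
oom_victims() {
  grep 'Memory cgroup out of memory' |
    sed -n 's/.*Killed process [0-9]* (\([^)]*\)).*/\1/p'
}
```

Possible usage on an affected node (reading the kernel log may require root): `dmesg | oom_victims | sort | uniq -c`.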
The cause is that the memory available to system.slice has been capped at 512M:
● system.slice - System Slice
Loaded: loaded
Drop-In: /run/systemd/system.control/system.slice.d
└─50-MemoryLimit.conf, 50-CPUShares.conf
Active: active since Mon 2022-01-03 17:12:02 CST; 5 days ago
Docs: man:systemd.special(7)
Tasks: 476
Memory: 503.4M (limit: 512.0M)
CGroup: /system.slice
├─accounts-daemon.service
│ └─659 /usr/lib/accountsservice/accounts-daemon
├─atd.service
│ └─696 /usr/sbin/atd -f
├─auditd.service
│ └─4896 /sbin/auditd
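The same limit can also be read directly from the cgroup filesystem. The sketch below assumes cgroup v1 (the default on Ubuntu 20.04), where the limit is stored in bytes in `memory.limit_in_bytes`; `limit_mib` is a hypothetical helper name.

```shell
# Convert a cgroup v1 memory limit file (value in bytes) to MiB.
# limit_mib is a hypothetical helper for this sketch.
limit_mib() {
  bytes="$(cat "$1")"
  echo $(( bytes / 1024 / 1024 ))
}

# Assumed path on a cgroup v1 host:
#   limit_mib /sys/fs/cgroup/memory/system.slice/memory.limit_in_bytes
```

A limit of 536870912 bytes corresponds to the 512.0M shown in the status output above.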
Root cause
The kubelet configuration file is wrong:
# Node resource reservation
kubeReserved:
  cpu: 200m\$(if [[ \$(cat /proc/meminfo | awk '/MemTotal/ {print \$2}') -gt 3670016 ]]; then echo -e '\n  memory: 256Mi';fi)
  ephemeral-storage: 1Gi
systemReserved:
  cpu: 300m\$(if [[ \$(cat /proc/meminfo | awk '/MemTotal/ {print \$2}') -gt 3670016 ]]; then echo -e '\n  memory: 512Mi';fi)
  ephemeral-storage: 1Gi
kubeReservedCgroup: /kube.slice
systemReservedCgroup: /system.slice
enforceNodeAllocatable:
- pods
- kube-reserved
- system-reserved
Because enforceNodeAllocatable here includes system-reserved and kube-reserved, the resources that were only meant to be reserved are instead applied as the hard resource limit of the system.slice cgroup.
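The memory condition embedded in the template can be restated on its own to show which nodes are affected. This is only a sketch of that condition, not the installer's code; `reserved_memory` is a hypothetical helper name. The threshold 3670016 kB equals 3.5 GiB.

```shell
# Emit a memory reservation only when MemTotal (in kB) exceeds 3670016 kB,
# mirroring the condition in the template above.
# reserved_memory is a hypothetical helper for this sketch.
reserved_memory() {
  mem_total_kb="$1"   # MemTotal as reported by /proc/meminfo, in kB
  amount="$2"         # value to emit, e.g. 512Mi for systemReserved
  if [ "$mem_total_kb" -gt 3670016 ]; then
    echo "$amount"
  fi
}
```

For example, `reserved_memory "$(awk '/MemTotal/ {print $2}' /proc/meminfo)" 512Mi` prints 512Mi on any node with more than 3.5 GiB of RAM. With system-reserved enforced, that value matches the 512M limit reported for system.slice.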
Reference
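This issue reproduces every time on Ubuntu 20.04. After editing the configuration file to remove system-reserved and kube-reserved from enforceNodeAllocatable, the problem is resolved. I am not sure whether other systems are affected.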
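The edit described above could be scripted roughly as follows. This is a sketch under assumptions: `strip_reserved_enforcement` is a hypothetical helper name, and it writes the filtered config to stdout rather than editing in place so the result can be reviewed first.

```shell
# Print a kubelet config with the kube-reserved and system-reserved list
# entries removed; all other lines (including "- pods") are left untouched.
# strip_reserved_enforcement is a hypothetical helper for this sketch.
strip_reserved_enforcement() {
  sed -E '/^[[:space:]]*-[[:space:]]*(kube-reserved|system-reserved)[[:space:]]*$/d' "$1"
}
```

On a kubeadm-based install the kubelet config is typically /var/lib/kubelet/config.yaml (an assumption for this cluster). After replacing the file with the filtered output, kubelet would need a restart, for example `systemctl restart kubelet`.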
lework commented
Yes, I enabled this limit on purpose (for testing). Since it causes problems, I'll drop it. The next commit will remove all of these.