Cassandra pods aren't starting because the `cp` process in the init container is being oom-killed
wallrj opened this issue · 0 comments
wallrj commented
My Cassandra pods aren't starting because the `cp` process in the init container is being OOM-killed. Perhaps the limits we've chosen aren't sufficient; the kernel log below shows the container's cgroup pinned at its 8192kB limit, almost all of it page cache dirtied by the copy:
```go
Resources: apiv1.ResourceRequirements{
    Requests: apiv1.ResourceList{
        apiv1.ResourceCPU:    resource.MustParse("10m"),
        apiv1.ResourceMemory: resource.MustParse("8Mi"),
    },
    Limits: apiv1.ResourceList{
        apiv1.ResourceCPU:    resource.MustParse("10m"),
        apiv1.ResourceMemory: resource.MustParse("8Mi"),
    },
},
```
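A possible fix is to raise the limits for the init container. The sketch below is an untested suggestion, not a measured requirement: the `50m`/`64Mi` values are assumptions chosen to leave headroom, since the OOM log shows the 8Mi cgroup filled almost entirely by page cache (`cache:7580KB`, `dirty:7576KB`) rather than by `cp` itself.

```go
// Sketch only: the values below are guesses, not measured requirements.
// The memory limit must cover the dirty page cache generated while cp
// writes the file, not just cp's own (tiny) RSS.
Resources: apiv1.ResourceRequirements{
    Requests: apiv1.ResourceList{
        apiv1.ResourceCPU:    resource.MustParse("10m"),
        apiv1.ResourceMemory: resource.MustParse("8Mi"),
    },
    Limits: apiv1.ResourceList{
        apiv1.ResourceCPU:    resource.MustParse("50m"),
        // 64Mi is an assumed value leaving room for writeback during the copy.
        apiv1.ResourceMemory: resource.MustParse("64Mi"),
    },
},
```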
```
[ 1573.878353] cp invoked oom-killer: gfp_mask=0x14000c0(GFP_KERNEL), nodemask=(null), order=0, oom_score_adj=-998
[ 1573.878357] cp cpuset=ec391272ec599d14b9d3ad22e7b3d37b4e3047eef9ea796e239861e0c209a763 mems_allowed=0
[ 1573.878363] CPU: 2 PID: 4289 Comm: cp Not tainted 4.13.0-36-generic #40-Ubuntu
[ 1573.878364] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[ 1573.878365] Call Trace:
[ 1573.878373]  dump_stack+0x63/0x8b
[ 1573.878378]  dump_header+0x97/0x225
[ 1573.878383]  ? mem_cgroup_scan_tasks+0xcb/0x100
[ 1573.878385]  oom_kill_process+0x20b/0x410
[ 1573.878387]  out_of_memory+0x2b6/0x4d0
[ 1573.878389]  mem_cgroup_out_of_memory+0x4b/0x80
[ 1573.878391]  mem_cgroup_oom_synchronize+0x2e8/0x320
[ 1573.878393]  ? mem_cgroup_css_online+0x40/0x40
[ 1573.878395]  pagefault_out_of_memory+0x36/0x7b
[ 1573.878398]  mm_fault_error+0x90/0x180
[ 1573.878400]  __do_page_fault+0x4a1/0x4d0
[ 1573.878402]  do_page_fault+0x22/0x30
[ 1573.878405]  ? page_fault+0x36/0x60
[ 1573.878406]  page_fault+0x4c/0x60
[ 1573.878408] RIP: 0033:0x7f15760a41a4
[ 1573.878409] RSP: 002b:00007ffcdbec1328 EFLAGS: 00010246
[ 1573.878411] RAX: 0000000000766000 RBX: 0000000000000000 RCX: 00007f15760a41a4
[ 1573.878411] RDX: 0000000000000000 RSI: 0000000000000003 RDI: 0000000000000004
[ 1573.878412] RBP: 00007ffcdbec1390 R08: 0000000001000000 R09: 0000000000000000
[ 1573.878413] R10: 0000000001000000 R11: 0000000000000246 R12: 0000000000000001
[ 1573.878414] R13: 0000000001000000 R14: 0000000000000004 R15: 0000000000000000
[ 1573.878415] Task in /docker/bb42f318b9f3c53ec58913332bba7670b70cce16a11c57953b021201335144ef/kubepods/pod61ce4770-2d11-11e8-9b9f-02420ac00002/ec391272ec599d14b9d3ad22e7b3d37b4e3047eef9ea796e239861e0c209a763 killed as a result of limit of /docker/bb42f318b9f3c53ec58913332bba7670b70cce16a11c57953b021201335144ef/kubepods/pod61ce4770-2d11-11e8-9b9f-02420ac00002/ec391272ec599d14b9d3ad22e7b3d37b4e3047eef9ea796e239861e0c209a763
[ 1573.878422] memory: usage 8192kB, limit 8192kB, failcnt 1457
[ 1573.878423] memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0
[ 1573.878424] kmem: usage 564kB, limit 9007199254740988kB, failcnt 0
[ 1573.878424] Memory cgroup stats for /docker/bb42f318b9f3c53ec58913332bba7670b70cce16a11c57953b021201335144ef/kubepods/pod61ce4770-2d11-11e8-9b9f-02420ac00002/ec391272ec599d14b9d3ad22e7b3d37b4e3047eef9ea796e239861e0c209a763: cache:7580KB rss:48KB rss_huge:0KB shmem:4KB mapped_file:0KB dirty:7576KB writeback:0KB inactive_anon:4KB active_anon:48KB inactive_file:3788KB active_file:3788KB unevictable:0KB
[ 1573.878434] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
[ 1573.878615] [ 4289]     0  4289      382        1       6       3        0          -998 cp
[ 1573.878619] Memory cgroup out of memory: Kill process 4289 (cp) score 0 or sacrifice child
[ 1573.887122] Killed process 4289 (cp) total-vm:1528kB, anon-rss:4kB, file-rss:0kB, shmem-rss:0kB
[ 1575.578263] oom_reaper: reaped process 4289 (cp), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
```
This is on a Mirantis dind cluster:
```
richard@pet-instance-1:~/go/src/github.com/jetstack/navigator$ docker ps
CONTAINER ID   IMAGE                                COMMAND                  CREATED          STATUS          PORTS                      NAMES
25644e0dc90d   mirantis/kubeadm-dind-cluster:v1.8   "/sbin/dind_init sys…"   39 minutes ago   Up 39 minutes   127.0.0.1:8080->8080/tcp   kube-master
bb42f318b9f3   mirantis/kubeadm-dind-cluster:v1.8   "/sbin/dind_init sys…"   39 minutes ago   Up 39 minutes   8080/tcp                   kube-node-1
0a8fe055506f   mirantis/kubeadm-dind-cluster:v1.8   "/sbin/dind_init sys…"   39 minutes ago   Up 39 minutes   8080/tcp                   kube-node-2
richard@pet-instance-1:~/go/src/github.com/jetstack/navigator$ kubectl -n test-cassandra-1521641327-20179 get events | grep -i warning
7m    16m    37    cass-test-ringnodes-0.151df4c8f1f1282e    Pod    spec.initContainers{install-pilot}    Warning    BackOff    kubelet, kube-node-1    Back-off restarting failed container
```
/kind bug