jetstack/navigator

Cassandra pods aren't starting because the `cp` process in the init container is being oom-killed

wallrj opened this issue · 0 comments

My Cassandra pods aren't starting because the `cp` process in the `install-pilot` init container is being OOM-killed.

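Perhaps the limits we've chosen aren't sufficient. The kernel log below shows the container's cgroup sitting at its 8192kB memory limit, with nearly all of it page cache (cache:7580KB) rather than process memory (rss:48KB). The init container is currently configured with: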

		Resources: apiv1.ResourceRequirements{
			Requests: apiv1.ResourceList{
				apiv1.ResourceCPU:    resource.MustParse("10m"),
				apiv1.ResourceMemory: resource.MustParse("8Mi"),
			},
			Limits: apiv1.ResourceList{
				apiv1.ResourceCPU:    resource.MustParse("10m"),
				apiv1.ResourceMemory: resource.MustParse("8Mi"),
			},
		},

Kernel log showing the OOM kill:

[ 1573.878353] cp invoked oom-killer: gfp_mask=0x14000c0(GFP_KERNEL), nodemask=(null),  order=0, oom_score_adj=-998                                                                                                                                                           
[ 1573.878357] cp cpuset=ec391272ec599d14b9d3ad22e7b3d37b4e3047eef9ea796e239861e0c209a763 mems_allowed=0                                                                                                                                                                      
[ 1573.878363] CPU: 2 PID: 4289 Comm: cp Not tainted 4.13.0-36-generic #40-Ubuntu                                                                                                                                                                                             
[ 1573.878364] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011                                                                                                                                                                      
[ 1573.878365] Call Trace:                                                                                                                                                                                                                                                    
[ 1573.878373]  dump_stack+0x63/0x8b                                                                                                                                                                                                                                          
[ 1573.878378]  dump_header+0x97/0x225                                                                                                                                                                                                                                        
[ 1573.878383]  ? mem_cgroup_scan_tasks+0xcb/0x100                                                                                                                                                                                                                            
[ 1573.878385]  oom_kill_process+0x20b/0x410                                                                                                                                                                                                                                  
[ 1573.878387]  out_of_memory+0x2b6/0x4d0                                                                                                                                                                                                                                     
[ 1573.878389]  mem_cgroup_out_of_memory+0x4b/0x80                                                                                                                                                                                                                            
[ 1573.878391]  mem_cgroup_oom_synchronize+0x2e8/0x320                                                                                                
[ 1573.878393]  ? mem_cgroup_css_online+0x40/0x40                                                                                                     
[ 1573.878395]  pagefault_out_of_memory+0x36/0x7b                                                                                                     
[ 1573.878398]  mm_fault_error+0x90/0x180                                                                                                             
[ 1573.878400]  __do_page_fault+0x4a1/0x4d0                                                                                                           
[ 1573.878402]  do_page_fault+0x22/0x30                                                                                                               
[ 1573.878405]  ? page_fault+0x36/0x60                                                                                                                
[ 1573.878406]  page_fault+0x4c/0x60                                                                                                                  
[ 1573.878408] RIP: 0033:0x7f15760a41a4                                                                                                               
[ 1573.878409] RSP: 002b:00007ffcdbec1328 EFLAGS: 00010246                                                                                            
[ 1573.878411] RAX: 0000000000766000 RBX: 0000000000000000 RCX: 00007f15760a41a4                                                                      
[ 1573.878411] RDX: 0000000000000000 RSI: 0000000000000003 RDI: 0000000000000004                                                                      
[ 1573.878412] RBP: 00007ffcdbec1390 R08: 0000000001000000 R09: 0000000000000000                                                                      
[ 1573.878413] R10: 0000000001000000 R11: 0000000000000246 R12: 0000000000000001                                                                      
[ 1573.878414] R13: 0000000001000000 R14: 0000000000000004 R15: 0000000000000000                                                                      
[ 1573.878415] Task in /docker/bb42f318b9f3c53ec58913332bba7670b70cce16a11c57953b021201335144ef/kubepods/pod61ce4770-2d11-11e8-9b9f-02420ac00002/ec391272ec599d14b9d3ad22e7b3d37b4e3047eef9ea796e239861e0c209a763 killed as a result of limit of /docker/bb42f318b9f3c53ec589$
3332bba7670b70cce16a11c57953b021201335144ef/kubepods/pod61ce4770-2d11-11e8-9b9f-02420ac00002/ec391272ec599d14b9d3ad22e7b3d37b4e3047eef9ea796e239861e0c209a763                                    
[ 1573.878422] memory: usage 8192kB, limit 8192kB, failcnt 1457                                                                                       
[ 1573.878423] memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0                                                                            
[ 1573.878424] kmem: usage 564kB, limit 9007199254740988kB, failcnt 0                                                                                 
[ 1573.878424] Memory cgroup stats for /docker/bb42f318b9f3c53ec58913332bba7670b70cce16a11c57953b021201335144ef/kubepods/pod61ce4770-2d11-11e8-9b9f-02420ac00002/ec391272ec599d14b9d3ad22e7b3d37b4e3047eef9ea796e239861e0c209a763: cache:7580KB rss:48KB rss_huge:0KB shmem:4KB mapped_file:0KB dirty:7576KB writeback:0KB inactive_anon:4KB active_anon:48KB inactive_file:3788KB active_file:3788KB unevictable:0KB
[ 1573.878434] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name                
[ 1573.878615] [ 4289]     0  4289      382        1       6       3        0          -998 cp                  
[ 1573.878619] Memory cgroup out of memory: Kill process 4289 (cp) score 0 or sacrifice child                   
[ 1573.887122] Killed process 4289 (cp) total-vm:1528kB, anon-rss:4kB, file-rss:0kB, shmem-rss:0kB              
[ 1575.578263] oom_reaper: reaped process 4289 (cp), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
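
To make the numbers in the log above easier to read, here is a minimal Go sketch that parses the `name:NNNKB` counters from the "Memory cgroup stats" line and compares them with the 8Mi limit. The helper names `parseStats` and `accountedKB` are hypothetical and not part of Navigator. Treating `cache + rss + kmem` as the full charged usage is an assumption about how this kernel accounts cgroup memory, although it matches the figures reported here.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// statRe matches "name:123KB" pairs as printed in the
// "Memory cgroup stats" line of the OOM report.
var statRe = regexp.MustCompile(`(\w+):(\d+)KB`)

// parseStats (hypothetical helper) extracts the per-category
// counters, in KB, from a cgroup stats line.
func parseStats(line string) map[string]int {
	stats := make(map[string]int)
	for _, m := range statRe.FindAllStringSubmatch(line, -1) {
		n, err := strconv.Atoi(m[2])
		if err != nil {
			continue // skip values that do not fit in an int
		}
		stats[m[1]] = n
	}
	return stats
}

// accountedKB (hypothetical helper) sums page cache, RSS and kernel
// memory. Assumption: these three categories make up the charged usage.
func accountedKB(stats map[string]int, kmemKB int) int {
	return stats["cache"] + stats["rss"] + kmemKB
}

func main() {
	// Counters copied from the OOM report in this issue.
	const statsLine = "cache:7580KB rss:48KB rss_huge:0KB shmem:4KB " +
		"mapped_file:0KB dirty:7576KB writeback:0KB"
	const kmemKB = 564      // from the "kmem: usage 564kB" line
	const limitKB = 8 * 1024 // the 8Mi limit expressed in KB

	stats := parseStats(statsLine)
	total := accountedKB(stats, kmemKB)

	fmt.Printf("accounted: %dKB of %dKB limit\n", total, limitKB)
	fmt.Printf("page cache share: %.1f%%\n",
		100*float64(stats["cache"])/float64(limitKB))
}
```

With the values from this report, 7580 + 48 + 564 = 8192, which is exactly the 8192kB limit. This suggests that the limit is being consumed by the page cache (mostly dirty pages) for the files `cp` is writing, not by the memory of the `cp` process itself.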

This is on a Mirantis kubeadm-dind-cluster (v1.8) cluster:

richard@pet-instance-1:~/go/src/github.com/jetstack/navigator$ docker ps
CONTAINER ID        IMAGE                                COMMAND                  CREATED             STATUS              PORTS                      NAMES
25644e0dc90d        mirantis/kubeadm-dind-cluster:v1.8   "/sbin/dind_init sys…"   39 minutes ago      Up 39 minutes       127.0.0.1:8080->8080/tcp   kube-master
bb42f318b9f3        mirantis/kubeadm-dind-cluster:v1.8   "/sbin/dind_init sys…"   39 minutes ago      Up 39 minutes       8080/tcp                   kube-node-1
0a8fe055506f        mirantis/kubeadm-dind-cluster:v1.8   "/sbin/dind_init sys…"   39 minutes ago      Up 39 minutes       8080/tcp                   kube-node-2

richard@pet-instance-1:~/go/src/github.com/jetstack/navigator$ kubectl -n test-cassandra-1521641327-20179 get events | grep -i warning
7m          16m          37        cass-test-ringnodes-0.151df4c8f1f1282e   Pod                spec.initContainers{install-pilot}      Warning   BackOff                 kubelet, kube-node-1   Back-off restarting failed container

/kind bug