Increasing CPU and memory consumption over time
blazesal opened this issue · 4 comments
The process was left running for several days, and its CPU and memory consumption increased steadily over time. From the consumption graphs and logs I can tell that the service was restarted: the process was killed because the machine ran out of memory.
Service logs (chainweb-data):
Jun 29 22:20:46 localhost systemd[1]: kadena-indexer.service: Main process exited, code=killed, status=9/KILL
Jun 29 22:20:46 localhost systemd[1]: kadena-indexer.service: Failed with result 'signal'.
Jun 29 22:20:46 localhost systemd[1]: kadena-indexer.service: Scheduled restart job, restart counter is at 1.
System logs:
Jun 29 22:17:06 localhost CRON[385345]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Jun 29 22:20:46 localhost kernel: [1337443.329441] postgres invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
Jun 29 22:20:46 localhost kernel: [1337443.329445] CPU: 0 PID: 385364 Comm: postgres Not tainted 5.4.0-117-generic #132-Ubuntu
Jun 29 22:20:46 localhost kernel: [1337443.329446] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
Jun 29 22:20:46 localhost kernel: [1337443.329447] Call Trace:
Jun 29 22:20:46 localhost kernel: [1337443.329454] dump_stack+0x6d/0x8b
Jun 29 22:20:46 localhost kernel: [1337443.329457] dump_header+0x4f/0x1eb
Jun 29 22:20:46 localhost kernel: [1337443.329459] oom_kill_process.cold+0xb/0x10
Jun 29 22:20:46 localhost kernel: [1337443.329462] out_of_memory+0x1cf/0x4d0
Jun 29 22:20:46 localhost kernel: [1337443.329465] __alloc_pages_slowpath+0xd5e/0xe50
Jun 29 22:20:46 localhost kernel: [1337443.329468] __alloc_pages_nodemask+0x2d0/0x320
Jun 29 22:20:46 localhost kernel: [1337443.329471] alloc_pages_current+0x87/0xe0
Jun 29 22:20:46 localhost kernel: [1337443.329473] __page_cache_alloc+0x72/0x90
Jun 29 22:20:46 localhost kernel: [1337443.329475] pagecache_get_page+0xbf/0x300
Jun 29 22:20:46 localhost kernel: [1337443.329477] filemap_fault+0x6b2/0xa50
Jun 29 22:20:46 localhost kernel: [1337443.329479] ? unlock_page_memcg+0x12/0x20
Jun 29 22:20:46 localhost kernel: [1337443.329481] ? page_add_file_rmap+0xff/0x1a0
Jun 29 22:20:46 localhost kernel: [1337443.329483] ? filemap_map_pages+0x24c/0x380
Jun 29 22:20:46 localhost kernel: [1337443.329485] ext4_filemap_fault+0x32/0x50
Jun 29 22:20:46 localhost kernel: [1337443.329488] __do_fault+0x3c/0x170
Jun 29 22:20:46 localhost kernel: [1337443.329489] do_fault+0x24b/0x640
Jun 29 22:20:46 localhost kernel: [1337443.329491] __handle_mm_fault+0x4c5/0x7a0
Jun 29 22:20:46 localhost kernel: [1337443.329493] handle_mm_fault+0xca/0x200
Jun 29 22:20:46 localhost kernel: [1337443.329496] do_user_addr_fault+0x1f9/0x450
Jun 29 22:20:46 localhost kernel: [1337443.329497] __do_page_fault+0x58/0x90
Jun 29 22:20:46 localhost kernel: [1337443.329500] ? exit_to_usermode_loop+0xea/0x160
Jun 29 22:20:46 localhost kernel: [1337443.329501] do_page_fault+0x2c/0xe0
Jun 29 22:20:46 localhost kernel: [1337443.329504] do_async_page_fault+0x39/0x70
Jun 29 22:20:46 localhost kernel: [1337443.329506] async_page_fault+0x34/0x40
Jun 29 22:20:46 localhost kernel: [1337443.329508] RIP: 0033:0x55681dc4b700
Jun 29 22:20:46 localhost kernel: [1337443.329513] Code: Bad RIP value.
Jun 29 22:20:46 localhost kernel: [1337443.329514] RSP: 002b:00007fff8daba0a8 EFLAGS: 00010246
Jun 29 22:20:46 localhost kernel: [1337443.329515] RAX: 000055681ff04140 RBX: 0000000000000000 RCX: 0000000000000000
Jun 29 22:20:46 localhost kernel: [1337443.329517] RDX: 000055681ff05ab8 RSI: 00007f5d5ab7a778 RDI: 00000000000004e1
Jun 29 22:20:46 localhost kernel: [1337443.329517] RBP: 00007fff8daba0f0 R08: 000055681ff04140 R09: 0000000000000000
Jun 29 22:20:46 localhost kernel: [1337443.329518] R10: 00007f5d63771728 R11: 000055681fe866b0 R12: 000055681ff03850
Jun 29 22:20:46 localhost kernel: [1337443.329519] R13: 00007f5d5ab7a458 R14: 00007f5d5abb5aa8 R15: 0000000000000002
Jun 29 22:20:46 localhost kernel: [1337443.329520] Mem-Info:
Jun 29 22:20:46 localhost kernel: [1337443.329524] active_anon:1718800 inactive_anon:253964 isolated_anon:0
Jun 29 22:20:46 localhost kernel: [1337443.329524] active_file:115 inactive_file:124 isolated_file:0
Jun 29 22:20:46 localhost kernel: [1337443.329524] unevictable:0 dirty:0 writeback:0 unstable:0
Jun 29 22:20:46 localhost kernel: [1337443.329524] slab_reclaimable:9000 slab_unreclaimable:13126
Jun 29 22:20:46 localhost kernel: [1337443.329524] mapped:35890 shmem:36019 pagetables:4851 bounce:0
Jun 29 22:20:46 localhost kernel: [1337443.329524] free:26686 free_pcp:1301 free_cma:0
Jun 29 22:20:46 localhost kernel: [1337443.329527] Node 0 active_anon:6875200kB inactive_anon:1015856kB active_file:460kB inactive_file:496kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:143560kB dirty:0kB writeback:0kB shmem:144076kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
Jun 29 22:20:46 localhost kernel: [1337443.329528] Node 0 DMA free:15908kB min:132kB low:164kB high:196kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15908kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Jun 29 22:20:46 localhost kernel: [1337443.329531] lowmem_reserve[]: 0 1887 7853 7853 7853
Jun 29 22:20:46 localhost kernel: [1337443.329533] Node 0 DMA32 free:40064kB min:16208kB low:20260kB high:24312kB active_anon:1669504kB inactive_anon:283864kB active_file:248kB inactive_file:120kB unevictable:0kB writepending:0kB present:2080616kB managed:2015080kB mlocked:0kB kernel_stack:64kB pagetables:4244kB bounce:0kB free_pcp:1824kB local_pcp:504kB free_cma:0kB
Jun 29 22:20:46 localhost kernel: [1337443.329537] lowmem_reserve[]: 0 0 5966 5966 5966
Jun 29 22:20:46 localhost kernel: [1337443.329539] Node 0 Normal free:51276kB min:51240kB low:64048kB high:76856kB active_anon:5205696kB inactive_anon:731992kB active_file:1016kB inactive_file:1284kB unevictable:0kB writepending:0kB present:6291456kB managed:6117180kB mlocked:0kB kernel_stack:2560kB pagetables:15160kB bounce:0kB free_pcp:3168kB local_pcp:988kB free_cma:0kB
Jun 29 22:20:46 localhost kernel: [1337443.329542] lowmem_reserve[]: 0 0 0 0 0
Jun 29 22:20:46 localhost kernel: [1337443.329544] Node 0 DMA: 1*4kB (U) 0*8kB 0*16kB 1*32kB (U) 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15908kB
Jun 29 22:20:46 localhost kernel: [1337443.329551] Node 0 DMA32: 377*4kB (UME) 319*8kB (UE) 382*16kB (UME) 263*32kB (UME) 145*64kB (UME) 52*128kB (UME) 19*256kB (UME) 1*512kB (M) 0*1024kB 0*2048kB 0*4096kB = 39900kB
Jun 29 22:20:46 localhost kernel: [1337443.329557] Node 0 Normal: 156*4kB (UMH) 634*8kB (UMEH) 1016*16kB (UMEH) 400*32kB (UMEH) 135*64kB (UME) 57*128kB (UMEH) 4*256kB (MH) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 51712kB
Jun 29 22:20:46 localhost kernel: [1337443.329565] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Jun 29 22:20:46 localhost kernel: [1337443.329566] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Jun 29 22:20:46 localhost kernel: [1337443.329567] 36624 total pagecache pages
Jun 29 22:20:46 localhost kernel: [1337443.329568] 358 pages in swap cache
Jun 29 22:20:46 localhost kernel: [1337443.329569] Swap cache stats: add 215039, delete 214681, find 19419431/19422500
Jun 29 22:20:46 localhost kernel: [1337443.329570] Free swap = 0kB
Jun 29 22:20:46 localhost kernel: [1337443.329570] Total swap = 524284kB
Jun 29 22:20:46 localhost kernel: [1337443.329571] 2097016 pages RAM
Jun 29 22:20:46 localhost kernel: [1337443.329572] 0 pages HighMem/MovableOnly
Jun 29 22:20:46 localhost kernel: [1337443.329572] 59974 pages reserved
Jun 29 22:20:46 localhost kernel: [1337443.329572] 0 pages cma reserved
Jun 29 22:20:46 localhost kernel: [1337443.329573] 0 pages hwpoisoned
Jun 29 22:20:46 localhost kernel: [1337443.329573] Tasks state (memory values in pages):
Jun 29 22:20:46 localhost kernel: [1337443.329574] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
Jun 29 22:20:46 localhost kernel: [1337443.329578] [ 401] 0 401 30364 451 245760 0 -250 systemd-journal
Jun 29 22:20:46 localhost kernel: [1337443.329581] [ 433] 0 433 5426 437 65536 0 -1000 systemd-udevd
Jun 29 22:20:46 localhost kernel: [1337443.329583] [ 481] 100 481 4594 228 77824 0 0 systemd-network
Jun 29 22:20:46 localhost kernel: [1337443.329584] [ 484] 0 484 2111 798 53248 0 0 haveged
Jun 29 22:20:46 localhost kernel: [1337443.329586] [ 485] 101 485 5964 1005 90112 0 0 systemd-resolve
Jun 29 22:20:46 localhost kernel: [1337443.329588] [ 486] 102 486 22547 211 77824 0 0 systemd-timesyn
Jun 29 22:20:46 localhost kernel: [1337443.329590] [ 627] 0 627 1703 65 49152 0 0 cron
Jun 29 22:20:46 localhost kernel: [1337443.329591] [ 628] 103 628 1922 225 53248 0 -900 dbus-daemon
Jun 29 22:20:46 localhost kernel: [1337443.329593] [ 636] 0 636 7414 2008 98304 42 0 networkd-dispat
Jun 29 22:20:46 localhost kernel: [1337443.329595] [ 637] 0 637 58178 212 90112 0 0 polkitd
Jun 29 22:20:46 localhost kernel: [1337443.329596] [ 639] 104 639 56087 456 86016 0 0 rsyslogd
Jun 29 22:20:46 localhost kernel: [1337443.329598] [ 647] 0 647 4198 236 69632 0 0 systemd-logind
Jun 29 22:20:46 localhost kernel: [1337443.329600] [ 654] 0 654 948 43 49152 0 0 atd
Jun 29 22:20:46 localhost kernel: [1337443.329601] [ 658] 0 658 3043 231 65536 0 -1000 sshd
Jun 29 22:20:46 localhost kernel: [1337443.329603] [ 683] 0 683 78614 433 114688 0 0 ModemManager
Jun 29 22:20:46 localhost kernel: [1337443.329605] [ 703] 0 703 26978 1886 110592 29 0 unattended-upgr
Jun 29 22:20:46 localhost kernel: [1337443.329607] [ 719] 0 719 1400 30 45056 0 0 agetty
Jun 29 22:20:46 localhost kernel: [1337443.329609] [ 723] 0 723 1457 28 45056 0 0 agetty
Jun 29 22:20:46 localhost kernel: [1337443.329610] [ 15192] 113 15192 54401 2959 163840 310 -900 postgres
Jun 29 22:20:46 localhost kernel: [1337443.329612] [ 15194] 113 15194 54532 35150 417792 320 0 postgres
Jun 29 22:20:46 localhost kernel: [1337443.329614] [ 15195] 113 15195 54434 34835 417792 316 0 postgres
Jun 29 22:20:46 localhost kernel: [1337443.329616] [ 15196] 113 15196 54401 1293 143360 326 0 postgres
Jun 29 22:20:46 localhost kernel: [1337443.329617] [ 15197] 113 15197 54570 1062 217088 362 0 postgres
Jun 29 22:20:46 localhost kernel: [1337443.329619] [ 15198] 113 15198 17582 245 118784 320 0 postgres
Jun 29 22:20:46 localhost kernel: [1337443.329621] [ 15199] 113 15199 54509 385 147456 368 0 postgres
Jun 29 22:20:46 localhost kernel: [1337443.329623] [ 34058] 1000 34058 268822249 1925069 16777216 129916 0 chainweb-data
Jun 29 22:20:46 localhost kernel: [1337443.329625] [ 385311] 113 385311 54844 7442 385024 272 0 postgres
Jun 29 22:20:46 localhost kernel: [1337443.329627] [ 385363] 0 385363 3043 209 65536 0 0 sshd
Jun 29 22:20:46 localhost kernel: [1337443.329629] [ 385364] 113 385364 54633 1020 163840 287 0 postgres
Jun 29 22:20:46 localhost kernel: [1337443.329630] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/system.slice/kadena-indexer.service,task=chainweb-data,pid=34058,uid=1000
Jun 29 22:20:46 localhost kernel: [1337443.329671] Out of memory: Killed process 34058 (chainweb-data) total-vm:1075288996kB, anon-rss:7700276kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:16384kB oom_score_adj:0
Jun 29 22:20:46 localhost kernel: [1337443.742795] oom_reaper: reaped process 34058 (chainweb-data), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
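The logs above show a global OOM: swap is fully exhausted and chainweb-data holds about 7.3 GB of anonymous RSS on an 8 GB host, so the kernel kill also destabilizes postgres and sshd. As a stopgap until the leak itself is addressed, the damage can be contained with a systemd drop-in that caps the service's memory so systemd restarts it before the whole host runs dry. This is only a sketch, not something from this thread: it assumes the kadena-indexer.service unit name seen in the logs, a cgroup-v2 system where MemoryMax is honored, and an illustrative cap value that should be tuned for the host.

```ini
# /etc/systemd/system/kadena-indexer.service.d/memory.conf
# Sketch only: the 6G cap is illustrative for an 8 GB host; tune as needed.
[Service]
MemoryMax=6G
Restart=on-failure
```

After adding the drop-in, run `systemctl daemon-reload` and restart the unit. With the cap in place, the indexer is killed and restarted in isolation when it exceeds the limit, instead of triggering a host-wide OOM.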
@blazesal thank you for opening this ticket. I want to let you know that we'll turn our focus to chainweb-data's performance right after releasing the new features that are about to be merged.
@blazesal Thanks again for opening this ticket. I believe this performance issue should be fixed in the recent 2.2.0 release. I would appreciate it if you could upgrade to 2.2.0 and check whether it does indeed resolve the issue for you. Otherwise, I'll close this ticket in a week.
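For anyone verifying the fix after upgrading, one minimal way to check whether resident memory still grows is to sample the process's VmRSS from /proc over a few days. This is a hedged sketch, not an official tool: it assumes a Linux /proc filesystem, the kadena-indexer unit name from the logs above, and an illustrative log path.

```shell
# rss_kb PID: print the resident set size (VmRSS) of PID in kB, read from /proc
rss_kb() {
  awk '/^VmRSS:/ {print $2}' "/proc/$1/status"
}

# Example (uncomment to run): sample the service's main PID every 5 minutes and
# append to a log, so growth (or a flat line) is visible after a few days on 2.2.0.
# pid=$(systemctl show -p MainPID --value kadena-indexer)
# while sleep 300; do echo "$(date -Is) $(rss_kb "$pid") kB"; done >> /tmp/cwd-rss.log
```

If the sampled values level off instead of climbing until swap is exhausted, the 2.2.0 release resolved the leak on your setup.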