Memory issues on `ac-cn-hongkong-c.wakuv2.prod` host
jakubgs opened this issue · 5 comments
On 2021-10-08, starting around 07:20 UTC, the `node-01.ac-cn-hongkong-c.wakuv2.prod` host started having memory issues:
There were also a few major CPU usage spikes:
It appears this coincides with a major traffic spike:
Which caused a spike in orphaned sockets:
This did not subside until I restarted the host.
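For spot-checking the orphaned socket count directly on a Linux host, here is a minimal sketch. It assumes the standard `/proc/net/sockstat` layout, where the `TCP:` line carries an `orphan` counter. The function names are hypothetical, and this is not necessarily how the graph above is produced.

```python
def parse_sockstat(text):
    """Parse /proc/net/sockstat-style text into {protocol: {counter: value}}."""
    stats = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        proto, _, rest = line.partition(":")
        tokens = rest.split()
        # Counters come as alternating name/value pairs, e.g. "inuse 5 orphan 0".
        stats[proto.strip()] = {
            name: int(value) for name, value in zip(tokens[0::2], tokens[1::2])
        }
    return stats


def orphaned_tcp_sockets(path="/proc/net/sockstat"):
    """Return the current number of orphaned TCP sockets (Linux only)."""
    with open(path) as f:
        return parse_sockstat(f.read())["TCP"]["orphan"]
```

Running `orphaned_tcp_sockets()` periodically during a traffic spike should show whether the count keeps climbing or levels off.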
Actually, we don't detect log level for `nim-waku` logs and that graph also included websockify logs.
This is more like it:
https://kibana.infra.status.im/goto/f3881123aa4d789da02995909c9e2b10
Seems like most "errors" (though their level is `WRN`...) are one of these two:
failed to store messages topics="wakustore" tid=1 file=waku_store.nim:456 err="failed to prepare"
failed to store peers topics="wakupeers" tid=1 file=peer_manager.nim:44 err="failed to encode: Failed to encode public key"
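To check which of these messages dominates in a raw log dump, a rough sketch like the following could group lines by message and `err` value. It assumes the `key=value` layout seen in the two lines above, with quoted values free of embedded double quotes. The helper names are hypothetical.

```python
import re
from collections import Counter

# Matches key=value pairs where the value is either "quoted" or a bare token.
FIELD_RE = re.compile(r'(\w+)=(?:"([^"]*)"|(\S+))')


def parse_line(line):
    """Split a log line into its free-text message and a dict of fields."""
    first = FIELD_RE.search(line)
    message = line[:first.start()].strip() if first else line.strip()
    fields = {}
    for match in FIELD_RE.finditer(line):
        key, quoted, bare = match.groups()
        fields[key] = quoted if quoted is not None else bare
    return message, fields


def count_errors(lines):
    """Count occurrences of each (message, err) pair."""
    counts = Counter()
    for line in lines:
        message, fields = parse_line(line)
        counts[(message, fields.get("err", ""))] += 1
    return counts
```

Calling `count_errors(open("waku.log"))` (the file name is just a placeholder) and then `.most_common()` on the result lists the most frequent pairs first.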
But the spike doesn't exist anymore:
Thanks for this, @jakubgs. Went through some logs/graphs and believe our main issue is the poor performance and unbounded memory usage of the `store`, as logged in waku-org/nwaku#702.
I imagine that as the `store` grows and available memory falls, more and more CPU cycles will be spent on garbage collection, memory swapping, etc. This probably also explains why some users have complained that the `prod` fleet is slow.
In my opinion, this is the highest-priority stability issue in `nim-waku`.