waku-org/nwaku

nim_waku_p2p_max_connections limit of 150 makes the canary think that the node is offline

Closed this issue · 1 comments

Background

ref -> https://canary.infra.status.im/service/174/


Details

When I checked node-01.do-ams3.waku.sandbox for these alerts:

  • Host was up.
  • p2p port was open.
  • Host had proper disk space.
  • docker logs did not show any unusual errors.
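The "p2p port was open" check above can be reproduced with a plain TCP connect. This is a minimal sketch, not the canary's actual probe: the helper name `is_port_open` is hypothetical, and the port in the usage note is an assumed value that should be replaced with the node's configured p2p port.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical helper for illustration; it only tests TCP reachability,
    not whether the node will accept a new libp2p peer.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

For example, `is_port_open("node-01.do-ams3.waku.sandbox", 60000)` (port assumed). As this issue shows, a `True` result only means the port accepts TCP connections; the node can still turn the canary away.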

The canary thinks the node is offline because of this connection limit:

# P2P Connections
nim_waku_p2p_max_connections: 150

cc @jakubgs

Acceptance criteria

We need a way for the canary to get node status even when a node is busy; otherwise we may run into false positives while investigating alerts.
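One way to picture the requested behaviour is a status with three states instead of two. The sketch below is purely illustrative and is not the canary's logic: `NodeStatus`, `classify_node`, and all parameters are hypothetical names, and it assumes the canary could obtain the node's current connection count from somewhere (for example a metrics endpoint), which this issue does not confirm.

```python
from enum import Enum
from typing import Optional


class NodeStatus(Enum):
    ONLINE = "online"    # the canary's p2p dial succeeded
    BUSY = "busy"        # reachable, but at its connection limit
    OFFLINE = "offline"  # no evidence that the node is alive


def classify_node(
    port_open: bool,
    dial_ok: bool,
    connections: Optional[int],
    max_connections: int,
) -> NodeStatus:
    """Classify a node from hypothetical probe results.

    `connections` is None when the current connection count is unknown.
    """
    if dial_ok:
        return NodeStatus.ONLINE
    # A failed dial on an open port with a saturated connection count
    # points to a full node rather than a dead one.
    if port_open and connections is not None and connections >= max_connections:
        return NodeStatus.BUSY
    return NodeStatus.OFFLINE
```

With the limit quoted above, `classify_node(True, False, 150, 150)` yields `NodeStatus.BUSY`, whereas the same failed dial with an unknown connection count still yields `NodeStatus.OFFLINE`.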

The max on the fleet is 300:

# Limits
nim_waku_p2p_max_connections: 300

https://github.com/status-im/infra-waku/blob/6e6849b1bd6b05897a73d1f3706c503ebb80951f/ansible/group_vars/node.yml#L34-L35

And indeed, at times the nodes do hit 300 connections.