threefoldfoundation/node-status-bot

RMB based ping doesn't always work


Sometimes the /ping command returns immediately and reports that the node did not respond in time, even though the timeout cannot have elapsed yet.

This happens because our reply handling is rather basic: all replies arrive on a single queue, and a reply can still show up after its expiration time has passed. A stale reply to an old ping that is sitting in the queue then gets treated as the (expired) answer to a new ping, which causes a false negative. We need better handling that accurately matches each reply to its outbound message.
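To make the idea concrete, here is a minimal sketch of matching replies to outbound messages. It is not the bot's actual code: `PingTracker`, `Reply`, and the `ref` field are hypothetical names, and it assumes each reply carries the id of the message it answers. Replies that reference an unknown or already expired ping are dropped instead of being counted against the current one.

```python
import time
import uuid
from dataclasses import dataclass


@dataclass
class Reply:
    ref: str      # hypothetical field: id of the outbound message being answered
    payload: str


class PingTracker:
    """Tracks outstanding pings by id so stale replies can be ignored."""

    def __init__(self, timeout=10.0, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock
        self.pending = {}  # message id -> deadline (clock units)

    def register(self):
        """Record a new outbound ping and return its id."""
        msg_id = uuid.uuid4().hex
        self.pending[msg_id] = self.clock() + self.timeout
        return msg_id

    def handle_reply(self, reply):
        """Return True only if the reply answers a pending ping within its deadline."""
        deadline = self.pending.pop(reply.ref, None)
        if deadline is None:
            return False  # unknown or already expired ping: ignore the reply
        return self.clock() <= deadline

    def expire(self):
        """Drop pings whose deadline has passed and return their ids."""
        now = self.clock()
        expired = [m for m, d in self.pending.items() if d < now]
        for m in expired:
            del self.pending[m]
        return expired
```

With this shape, a late reply to an old ping returns `False` from `handle_reply` without touching the entry for the ping that is currently in flight.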

Since it looks like we'll be moving to a Mycelium-based RMB, it doesn't make sense to put more work into the old implementation.

With a queryable Mycelium IP for each node, we'll be able to move back to a standard ping-based approach, as was originally implemented over Yggdrasil.
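A standard ping check along those lines might look like the sketch below. It assumes a Linux host with the iputils `ping` binary and that the node's Mycelium address (an IPv6 address) has already been looked up; `build_ping_command` and `ping` are illustrative names, not functions from the bot.

```python
import subprocess


def build_ping_command(address, timeout_s=5, count=1):
    """Build an iputils ping command for an IPv6 address."""
    # -6: force IPv6, -c: number of echo requests, -W: seconds to wait for a reply
    return ["ping", "-6", "-c", str(count), "-W", str(timeout_s), address]


def ping(address, timeout_s=5):
    """Return True if the address answered at least one echo request."""
    cmd = build_ping_command(address, timeout_s)
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s + 2,  # safety margin on top of ping's own timeout
        )
    except (subprocess.TimeoutExpired, FileNotFoundError):
        return False  # ping hung or is not installed
    return result.returncode == 0
```

Because each call waits on its own process, there is no shared reply queue and so no way for an old reply to be mistaken for a new one.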