Issue when no active multi providers
When there are no active providers in a multi client, doCall returns nil, nil, which causes unexpected behaviour.
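To illustrate why a nil, nil return is hazardous, here is a minimal sketch. The doCall and slot functions below are hypothetical stand-ins written for this example, not the multi package's actual code; the assumption is only that the dispatcher loops over active providers and returns whatever the loop left behind.

```go
package main

import (
	"errors"
	"fmt"
)

// doCall is a hypothetical stand-in for a multi-client dispatcher: it tries
// each active provider in turn and returns the first successful result.
func doCall(active []func() (interface{}, error)) (interface{}, error) {
	var err error
	for _, call := range active {
		var res interface{}
		res, err = call()
		if err == nil {
			return res, nil
		}
	}
	// With no active providers the loop never runs, so this returns nil, nil.
	return nil, err
}

// slot is a hypothetical caller that treats a nil error as success.
func slot(active []func() (interface{}, error)) (uint64, error) {
	res, err := doCall(active)
	if err != nil {
		return 0, err
	}
	// Checked type assertion: an unchecked res.(uint64) would panic on nil.
	s, ok := res.(uint64)
	if !ok {
		return 0, errors.New("no result returned")
	}
	return s, nil
}

func main() {
	_, err := slot(nil) // no active providers
	fmt.Println(err)    // prints "no result returned"
}
```

A caller that skips the checked assertion would panic here, which is one way the "unexpected behaviour" can surface.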
We at Obol encountered this after adding support for multiple beacon nodes using the multi package. Most of our tests and setups still have only a single beacon node. We configure our eth2 clients with a 2s timeout. This results in sporadic timeouts from beacon nodes in the wild: most calls are very fast, but some occasionally take longer and time out (around epoch changes, we suspect). When that happens, the only client is disabled.
Since these timeouts are sporadic and subsequent calls succeed, but we have only a single client, we suggest allowing calls to "fall back" to inactive providers when no active providers are available. This will result in slower failures when all providers are actually down, but it will recover seamlessly if one of the "inactive" providers is in fact healthy, as in our case. This is similar to AWS load balancers, which "fail open" when all targets are unhealthy.
See PR #21
Another option would be to not add Fallback support. We would then revert to using http directly instead of multi if we only have one beacon node configured. But the same problem could still affect multiple nodes if they all time out sporadically in the same 30s window.
Thank you for the report of this issue. Rather than introduce another flag, I have created #22 which attempts to reactivate clients if there are no clients active at the time of a call.
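For readers comparing the two approaches, reactivation on demand might be sketched as follows. This is one possible reading of the idea, not the code in #22: the client and multi types, the ping health probe, and the reactivate helper are hypothetical names, and the sketch omits the locking a concurrent client would need.

```go
package main

import (
	"errors"
	"fmt"
)

// client is a hypothetical stand-in for one beacon node client.
type client struct {
	name string
	ping func() bool // hypothetical health probe
}

// multi is a hypothetical multi client; it is not safe for concurrent use.
type multi struct {
	active   []*client
	inactive []*client
}

// reactivate probes every inactive client and promotes those that respond.
func (m *multi) reactivate() {
	var still []*client
	for _, c := range m.inactive {
		if c.ping() {
			m.active = append(m.active, c)
		} else {
			still = append(still, c)
		}
	}
	m.inactive = still
}

// doCall attempts reactivation only when no client is active, then
// returns the first successful result from the active clients.
func (m *multi) doCall(call func(*client) (string, error)) (string, error) {
	if len(m.active) == 0 {
		m.reactivate()
	}
	if len(m.active) == 0 {
		return "", errors.New("no active clients")
	}
	var lastErr error
	for _, c := range m.active {
		res, err := call(c)
		if err == nil {
			return res, nil
		}
		lastErr = err
	}
	return "", lastErr
}

func main() {
	m := &multi{inactive: []*client{{name: "bn1", ping: func() bool { return true }}}}
	res, err := m.doCall(func(c *client) (string, error) { return "served by " + c.name, nil })
	fmt.Println(res, err, len(m.active), len(m.inactive))
}
```

Compared with a fallback flag, this keeps a single code path for calls and returns an explicit error when nothing can be reactivated.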
Great, that also works! Closing this then.