oximeter server registration could be more resilient to failure (and asynchronous)
jordanhendricks commented
#511 is the easy fix for #497: instead of blocking in `instance_ensure` and retrying forever, the registration will fail after a couple of retries.
The better, longer-term fix is to make registration of the server endpoint asynchronous, so that a transient failure to connect to the oximeter consumer does not permanently prevent that endpoint from serving metrics. This depends on some work on the oximeter side: oxidecomputer/omicron#3956.
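To make the idea concrete, here is a minimal sketch of what asynchronous registration could look like. It is not the actual propolis or oximeter code: `register_in_background`, the `register` closure, and the delay parameters are all hypothetical names, and it uses a plain `std::thread` with capped exponential backoff rather than whatever runtime the real implementation would use.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Hypothetical helper: retry `register` on a background thread until it
/// succeeds, so the caller (e.g. `instance_ensure`) never blocks on it.
///
/// Returns a flag that flips to `true` once registration succeeds, plus a
/// handle whose result is the number of attempts that were made.
pub fn register_in_background<F>(
    mut register: F,
    initial_delay: Duration,
    max_delay: Duration,
) -> (Arc<AtomicBool>, thread::JoinHandle<u32>)
where
    F: FnMut() -> Result<(), String> + Send + 'static,
{
    let registered = Arc::new(AtomicBool::new(false));
    let flag = Arc::clone(&registered);

    let handle = thread::spawn(move || {
        let mut delay = initial_delay;
        let mut attempts = 0u32;
        loop {
            attempts += 1;
            match register() {
                Ok(()) => {
                    // Registration succeeded; publish that fact and stop.
                    flag.store(true, Ordering::SeqCst);
                    return attempts;
                }
                Err(_) => {
                    // Treat the failure as transient: wait, then retry
                    // with a doubled delay, capped at `max_delay`.
                    thread::sleep(delay);
                    delay = (delay * 2).min(max_delay);
                }
            }
        }
    });

    (registered, handle)
}
```

With something along these lines, the caller would start registration and return immediately; the metrics endpoint becomes reachable by oximeter whenever the background task eventually succeeds, rather than never if the first few attempts fail. A real version would also need a way to cancel the task when the instance goes away, which this sketch omits.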