oxidecomputer/propolis

oximeter server registration could be more resilient to failure (and asynchronous)


#511 is the easy fix for #497: instead of blocking in instance_ensure and retrying forever, the registration now fails after a couple of retries.

The better, longer-term fix is to make registration of the server endpoint asynchronous, so that a transient failure to connect to the oximeter consumer does not permanently prevent that endpoint from serving metrics. This depends on some work on the oximeter side: oxidecomputer/omicron#3956.
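
To make the idea concrete, here is a minimal sketch of background registration with capped exponential backoff. It is not the propolis or oximeter API: `RegistrationTask`, `spawn_registration`, and the `try_register` closure are hypothetical names, and it uses a plain `std` thread where a real implementation would more likely use an async task with cancellation.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical handle to a background registration attempt.
pub struct RegistrationTask {
    registered: Arc<AtomicBool>,
    handle: JoinHandle<()>,
}

impl RegistrationTask {
    /// Non-blocking check: has registration succeeded yet?
    pub fn is_registered(&self) -> bool {
        self.registered.load(Ordering::SeqCst)
    }

    /// Block until the background thread exits; returns whether it registered.
    pub fn wait(self) -> bool {
        let RegistrationTask { registered, handle } = self;
        let _ = handle.join();
        registered.load(Ordering::SeqCst)
    }
}

/// Retry `try_register` in the background until it succeeds, doubling the
/// delay after each failure up to `max_delay`. The caller (standing in for
/// instance_ensure here) returns immediately instead of blocking on retries.
pub fn spawn_registration<F>(
    mut try_register: F,
    initial_delay: Duration,
    max_delay: Duration,
) -> RegistrationTask
where
    F: FnMut() -> Result<(), String> + Send + 'static,
{
    let registered = Arc::new(AtomicBool::new(false));
    let flag = Arc::clone(&registered);
    let handle = thread::spawn(move || {
        let mut delay = initial_delay;
        loop {
            match try_register() {
                Ok(()) => {
                    flag.store(true, Ordering::SeqCst);
                    return;
                }
                Err(_) => {
                    // Transient failure: back off and try again later.
                    thread::sleep(delay);
                    delay = (delay * 2).min(max_delay);
                }
            }
        }
    });
    RegistrationTask { registered, handle }
}
```

In this sketch the caller gets a `RegistrationTask` back right away and can poll `is_registered()`, so a consumer that is briefly unreachable only delays metrics rather than disabling them. The loop as written retries indefinitely; a real version would also need a way to stop it when the instance is torn down.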