tinkerbell/rufio

Continuous check to ensure BMCs are contactable

chrisdoherty4 opened this issue

After the initial check to confirm a Machine is contactable, we perform no further checks unless an update event is triggered for the Machine. Because we are monitoring an external entity, we would benefit from a regular check on the BMC interface to confirm it is still contactable (essentially a health check).

Proposal

When reconciling a Machine instance, if reconciliation completes without error, the Machine should be re-queued after a configurable interval. The reconciliation process would then handle checking whether the Machine is contactable, so the check repeats on every interval.