kubernetes-sigs/cluster-api-provider-digitalocean

Reconcile transient droplet create errors automatically

timoreimann opened this issue · 1 comment

As of today, a failure to create a droplet causes reconciliation to terminate prematurely by setting a failure reason and message. This means that a transient error when invoking the DO API cannot be remediated other than by deleting and recreating the DOMachine resource, which is very disruptive.

The suggestion is to stop marking droplet create request errors as permanent failures, so that subsequent reconciles, retried with backoff, can overcome transient errors.

Related Slack discussion can be found here.

I'll take a stab at addressing this myself.

/assign timoreimann