kubernetes-sigs/cluster-api-provider-digitalocean

DOMachine with missing instance ID is never removed on cluster delete

timoreimann opened this issue · 2 comments

When a Cluster is deleted while a DOMachine does not have an instance ID attached (which can happen when the create failed for some reason or the control plane never became ready), the delete hangs forever because DOMachineReconciler terminates reconciliation prematurely without removing the finalizer. The only workaround is to remove the finalizer manually.

I'm not 100% sure how to best solve the issue. One approach I can think of is to remove the DOMachine finalizer if the instance ID is missing.

prksu commented

@timoreimann yeah, I also noticed this issue when we discussed terminal failures.

One approach I can think of is to remove the DOMachine finalizer if the instance ID is missing.

That's true. The solution is to remove the finalizer in this block as well:

if droplet == nil {
	clusterScope.V(2).Info("Unable to locate droplet instance")
	r.Recorder.Eventf(domachine, corev1.EventTypeWarning, "NoInstanceFound", "Skip deleting")
	return reconcile.Result{}, nil
}
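
A minimal, self-contained sketch of that idea, using only the standard library. The `machine` and `droplet` types, the `reconcileDelete` and `removeFinalizer` helpers, and the finalizer string are hypothetical stand-ins for illustration and are not taken from the provider's code; in the real controller, controller-runtime's `controllerutil.RemoveFinalizer` would likely do the removal instead.

```go
package main

import "fmt"

// machineFinalizer is an assumed finalizer name, used only for illustration.
const machineFinalizer = "domachine.infrastructure.cluster.x-k8s.io"

// machine is a hypothetical stand-in for DOMachine; only the finalizer list matters here.
type machine struct {
	Finalizers []string
}

// droplet is a hypothetical stand-in for the provider instance.
type droplet struct {
	ID int
}

// removeFinalizer returns the finalizer list without the given name.
func removeFinalizer(finalizers []string, name string) []string {
	kept := make([]string, 0, len(finalizers))
	for _, f := range finalizers {
		if f != name {
			kept = append(kept, f)
		}
	}
	return kept
}

// reconcileDelete reports whether the finalizer was released.
// When no droplet exists there is nothing to clean up at the provider,
// so the finalizer is removed and the object can be garbage collected.
func reconcileDelete(m *machine, d *droplet) bool {
	if d == nil {
		m.Finalizers = removeFinalizer(m.Finalizers, machineFinalizer)
		return true
	}
	// A droplet exists: a real controller would delete it first and
	// keep the finalizer until the deletion has been confirmed.
	return false
}

func main() {
	m := &machine{Finalizers: []string{machineFinalizer}}
	released := reconcileDelete(m, nil)
	fmt.Println(released, len(m.Finalizers))
}
```

The point of the early-return branch is that it must release the finalizer before returning; otherwise the API server keeps the object around indefinitely.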

FYI, CAPA did something like this:
https://github.com/kubernetes-sigs/cluster-api-provider-aws/blob/d269842417540c49cffa82a754187d73ffa21e86/controllers/awsmachine_controller.go#L319-L331

timoreimann commented

@prksu sweet, I'll assign the ticket to myself and drive the change.

/assign timoreimann