metal3-io/ip-address-manager

The webhook should reject IPPool deletion if any IP has been allocated

jessehu opened this issue · 23 comments

What steps did you take and what happened:
[A clear and concise description on how to REPRODUCE the bug.]

What did you expect to happen:

Anything else you would like to add:
Looks like it doesn't handle the delete request: https://github.com/metal3-io/ip-address-manager/blob/main/api/v1alpha1/ippool_webhook.go#L137
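
For illustration, a minimal sketch of what such a check could look like in ValidateDelete. This is only a sketch under assumptions, not the actual API: the Status.Allocations field name and the older webhook.Validator signature (returning only an error) are assumed here.

    // Sketch only, intended for api/v1alpha1/ippool_webhook.go.
    // Status.Allocations is an assumed field name for the map of allocated
    // addresses; adjust to whatever the IPPool status actually exposes.
    import (
        "fmt"

        apierrors "k8s.io/apimachinery/pkg/api/errors"
    )

    // ValidateDelete rejects the delete request while any address from the
    // pool is still allocated, so the IPPool never enters the Deleting state
    // while in use.
    func (c *IPPool) ValidateDelete() error {
        if len(c.Status.Allocations) > 0 {
            return apierrors.NewBadRequest(fmt.Sprintf(
                "IPPool %s still has %d allocated address(es) and cannot be deleted",
                c.Name, len(c.Status.Allocations)))
        }
        return nil
    }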

Environment:

  • Cluster-api version:
  • CAPM3 version:
  • IPAM version:
  • Minikube version:
  • environment (metal3-dev-env or other):
  • Kubernetes version: (use kubectl version):

/kind bug

I am not sure, tbh, if we want to have deletion logic in the IPPool webhook; reconcileDelete() already checks for the allocations:

func (r *IPPoolReconciler) reconcileDelete(ctx context.Context,
    ipPoolMgr ipam.IPPoolManagerInterface,
) (ctrl.Result, error) {
    allocationsNb, err := ipPoolMgr.UpdateAddresses(ctx)
    if err != nil {
        return checkRequeueError(err, "Failed to delete the old secrets")
    }
    if allocationsNb == 0 {
        // ippool is marked for deletion and ready to be deleted,
        // so remove the finalizer.
        ipPoolMgr.UnsetFinalizer()
    }
    return ctrl.Result{}, nil
}
Also, what is the common way to handle deletion in webhooks? It seems CAPI also handles the webhook the same way: https://github.com/kubernetes-sigs/cluster-api/blob/main/api/v1beta1/machine_webhook.go#L88-L90

/cc @schrej if you have other opinions.

Well, there are only two options (so far) for handling deletion: finalizers and webhooks.

Finalizers are the safe way, as they prevent deletion on the apiserver and even work when the webhook isn't deployed.
Webhooks are the user-friendly way, as they reject the deletion up front, so the resource doesn't stay in a deleting state for a long time, as is the case with finalizers (you can't un-delete a resource).
In CAPI this was discussed in kubernetes-sigs/cluster-api#5498 for example.

Either solution, or even a combination of both, is fine. Personally I'd probably prefer the combined approach, as you don't end up with deleted pools that remain in use for a long time that way.
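
For completeness, a rough sketch of how the combined approach could look on the controller side: the webhook (as sketched in the issue description) rejects deletes up front, while the existing finalizer flow keeps protecting the pool even when the webhook isn't deployed. The function name and the SetFinalizer() call are assumptions, mirroring the UnsetFinalizer() call in reconcileDelete() above:

    // Sketch of the finalizer half of the combined approach: the finalizer is
    // (re)added during normal reconciliation, so even if the validating webhook
    // is not deployed, a delete still waits until reconcileDelete() removes the
    // finalizer once no addresses are allocated.
    func (r *IPPoolReconciler) reconcileNormal(ctx context.Context,
        ipPoolMgr ipam.IPPoolManagerInterface,
    ) (ctrl.Result, error) {
        // Assumed counterpart to the UnsetFinalizer() call shown earlier.
        ipPoolMgr.SetFinalizer()

        if _, err := ipPoolMgr.UpdateAddresses(ctx); err != nil {
            return checkRequeueError(err, "Failed to update the addresses")
        }
        return ctrl.Result{}, nil
    }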

As you can't un-delete a resource in K8s, the webhook approach can prevent the IPPool from going into a Deleting state while it's in use, in case it's deleted unintentionally.

Right now, the deletion is accepted, but we wait until all addresses are released before actually deleting.
Now #137 would move the responsibility onto the user. I am not sure if that is a good thing. The user won't be able to send a delete on the object and just wait until it is deleted. The user will instead have to retry deleting it as long as addresses are allocated. But then the behavior is a bit more straightforward...

Hi @furkatgofurov7, got your point. When the user gets the error message "IPPool is in use so it cannot be deleted", would they release the used IPs in this pool first and then delete the pool again?

"IPPool is in used so can not be deleted", would he/she release the used IPs in this pool first then delete this pool again?

Yup, that is how it works now when a delete is issued on the object.

But we are also fine with having the combined handling as @schrej suggested.

Thanks @furkatgofurov7. Could you help move #137 forward? It's ready for review and CI has passed.

/triage accepted

/triage accepted


@jessehu: This issue is currently awaiting triage.
If Metal3.io contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.
The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

/triage accepted

Prow is a bit slow with requests but should eventually be able to label this. Otherwise I will do it manually.

/triage accepted

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues will close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

/lifecycle stale

/remove-lifecycle stale

What is the status of this @furkatgofurov7 ?

#137 is referenced as the fix for this, but it seems to have some issues passing CI?

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues will close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

/lifecycle stale

/remove-lifecycle stale

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues will close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

/lifecycle stale

Rozzii commented

This is something that would be welcome, but it looks like progress has stalled.
/lifecycle frozen