cybozu-go/coil

AddressBlocks not auto-removed and not manually removable

tloader11 opened this issue · 3 comments

Describe the bug
Removing the AddressBlocks of a drained node hangs indefinitely.
A PSU failure took one of our nodes down suddenly, and it stayed down for more than a few days. In the meantime we wanted the NAT service to be rescheduled onto another node. This failed because Coil never freed the AddressBlock: the NAT pool consists of a single /32 public IP, so there were no spare addresses.

I expected that draining a node would remove all AddressBlocks assigned to it. (Please note: the node in question was no longer working because of the failed PSU, so the controller evicted all pods with --force and --grace-period=0; otherwise the drain would have hung on the egress NAT deployment.)

Environments

  • Version: 20.04
  • OS: Ubuntu

To Reproduce
Steps to reproduce the behavior:

  1. Add nodes to cluster
  2. Create /32 address pool
  3. Assign the address pool to a namespace
  4. Create egress resource in namespace created in step 3.
  5. Suddenly disconnect the node on which the NAT egress pod was running from the cluster
  6. Rescheduling fails and the AddressBlock stays locked in place.

Expected behavior
The AddressBlock is released and the egress pod is rescheduled on another node.

Because the node is down, the finalizer fails.
Removing the finalizers from the resource like so: kubectl patch addressblock/galera-0 -p '{"metadata":{"finalizers":[]}}' --type=merge
and then deleting the resource ( kubectl delete addressblock galera-0 ) frees the IP address.
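
The workaround above can be wrapped in a small helper. This is only a sketch built from the two commands quoted in this issue. The function name force_release_block and the KUBECTL override variable are hypothetical, and --ignore-not-found is added on the assumption that the block may already be gone once its finalizers are cleared.

```shell
#!/bin/sh
# Hypothetical helper, not part of Coil. Set KUBECTL=echo to preview
# the commands instead of running them against a cluster.
KUBECTL="${KUBECTL:-kubectl}"

# force_release_block NAME
# Clears the finalizers on the named AddressBlock, then deletes it.
force_release_block() {
    block="$1"
    if [ -z "$block" ]; then
        echo "usage: force_release_block ADDRESSBLOCK_NAME" >&2
        return 1
    fi
    # Same merge patch as in the issue text: empty the finalizer list.
    $KUBECTL patch addressblock "$block" --type=merge \
        -p '{"metadata":{"finalizers":[]}}' || return 1
    # If a deletion was already pending, the patch alone removes the
    # object, so a missing resource is not treated as an error here.
    $KUBECTL delete addressblock "$block" --ignore-not-found
}
```

For the block in this report the call would be force_release_block galera-0. Bypassing the finalizer skips whatever cleanup it was meant to perform, so this is only reasonable when the node is known to be gone for good.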

Hi @tloader11, thank you for reporting the issue.

Coil GC will release the address block once the node resource gets removed. The node resource still exists after draining the node, although it becomes unschedulable.
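
Following that explanation, removing the Node resource of the dead machine should let Coil's GC reclaim its blocks. A minimal sketch, assuming the node is permanently gone. The function name remove_dead_node and the KUBECTL override are hypothetical, and the sketch does not wait for the GC to run.

```shell
#!/bin/sh
# Hypothetical helper, not part of Coil. Set KUBECTL=echo to preview
# the commands instead of running them against a cluster.
KUBECTL="${KUBECTL:-kubectl}"

# remove_dead_node NODE
# Deletes the Node resource, then lists the AddressBlocks that remain
# so the release can be checked by hand.
remove_dead_node() {
    node="$1"
    if [ -z "$node" ]; then
        echo "usage: remove_dead_node NODE_NAME" >&2
        return 1
    fi
    # Draining only marks the node unschedulable; the Node object has
    # to be deleted before the GC considers its blocks orphaned.
    $KUBECTL delete node "$node" || return 1
    # The GC runs asynchronously, so the block may still be listed
    # right after the delete; rerun the listing if needed.
    $KUBECTL get addressblocks
}
```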

Feel free to reopen this issue if you still have a problem.