[BUG] Duplicate IPs and empty IPPools
Describe the bug
We are frequently ending up with pods being assigned the same IP, which results in many dueling ARP entries, and much consternation and gnashing of teeth.
We are using the kubernetes method for keeping track of allocations, not the etcd method.
A symptom of this is that the IPPool CRDs frequently end up empty as well.
Expected behavior
Each pod always has a unique IP address and the IPPool CRDs accurately reflect IP allocation.
To Reproduce
We have not managed to reproduce this behavior in a controlled environment. I've tried many different things to make it happen, and none of them have worked. :-(
Environment:
- Whereabouts version : v0.6.1
- Kubernetes version (use kubectl version): 1.26.1
- Network-attachment-definition: Proprietary
- Whereabouts configuration (on the host):
{
  "datastore": "kubernetes",
  "kubernetes": {
    "kubeconfig": "/etc/cni/net.d/whereabouts.d/whereabouts.kubeconfig"
  },
  "reconciler_cron_expression": "30 4 * * *"
}
- OS (e.g. from /etc/os-release): Talos
- Kernel (e.g. uname -a): Linux talos-5i3-rhe 5.15.102-talos #1 SMP Mon Mar 13 18:10:38 UTC 2023 x86_64 Linux
- Others: N/A
Additional info / context
Pod or node churn seems important, but we haven't been able to catch the problem 'in the act' yet.
I had the same issue when a node powered down.
In this scenario, the pod is recreated on another node, and the same podref then appears twice in the IPPool. When the periodic reconciliation runs, there is a chance that the newly created pod's address is reclaimed by mistake, resulting in duplicate address allocation.
The associated code is as follows.
whereabouts/pkg/reconciler/wrappedPod.go, line 31 at commit 0d7d2be
The problematic code can be analyzed as follows:
- Assume that after the node is powered off, the pod is recreated and the IPPool ends up with the following allocations:
"18278":   <-- leaked address from the pod on the powered-off node
  id: 051a974db72a38d2c226aebbf46f596c62c95e2c527e7dae51d6df52d252bb51
  podref: xxx/pod-1
"18279":   <-- address newly allocated to the recreated pod
  id: b2a431a89ae18c3132eb3708306a901ea5999da64f10f0a23f14ede25602c01c
  podref: xxx/pod-1
- Deciding what to reclaim: the data returned by pool.Allocations() is a list built from a map, so the order of its members is not fixed. It may be [18278, 18279] or [18279, 18278]; assume here that it returns [18278, 18279]. isPodAlive then returns false for 18278, because that address does not match the live pod's current address, so xxx/pod-1 is marked for reclamation.
func (rl *ReconcileLooper) findOrphanedIPsPerPool(ipPools []storage.IPPool) error {
	for _, pool := range ipPools {
		orphanIP := OrphanedIPReservations{
			Pool: pool,
		}
		for _, ipReservation := range pool.Allocations() { // the order of the returned list is not fixed
			logging.Debugf("the IP reservation: %s", ipReservation)
			if ipReservation.PodRef == "" {
				_ = logging.Errorf("pod ref missing for Allocations: %s", ipReservation)
				continue
			}
			if !rl.isPodAlive(ipReservation.PodRef, ipReservation.IP.String()) {
				logging.Debugf("pod ref %s is not listed in the live pods list", ipReservation.PodRef)
				orphanIP.Allocations = append(orphanIP.Allocations, ipReservation)
			}
		}
		if len(orphanIP.Allocations) > 0 {
			rl.orphanedIPs = append(rl.orphanedIPs, orphanIP)
		}
	}
	return nil
}
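
To make the "order is not fixed" premise concrete, here is a minimal standalone sketch. This is not the whereabouts code; reservation and allocationsFromMap are hypothetical stand-ins for the real types. It only illustrates that a slice built by ranging over a Go map comes back in no guaranteed order, so two calls to an Allocations()-style helper can disagree on ordering.

package main

import "fmt"

type reservation struct {
	Offset string // map key in the IPPool, e.g. "18278"
	PodRef string
}

// allocationsFromMap mimics building the reservation list from the pool's
// allocations map; Go deliberately randomizes map iteration order.
func allocationsFromMap(m map[string]string) []reservation {
	var out []reservation
	for offset, podRef := range m {
		out = append(out, reservation{Offset: offset, PodRef: podRef})
	}
	return out
}

func main() {
	allocations := map[string]string{
		"18278": "xxx/pod-1", // leaked entry from the powered-off node
		"18279": "xxx/pod-1", // entry for the recreated pod
	}
	// Printed order can differ between iterations: sometimes 18278 first,
	// sometimes 18279 first.
	for i := 0; i < 5; i++ {
		fmt.Println(allocationsFromMap(allocations))
	}
}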
- When deallocation is executed, orphanedIP.Pool.Allocations() is called again and, because the order is not fixed, may now return [18279, 18278]. allocate.IterateForDeallocation matches by podref only, so it ends up reclaiming 18279, the address the live pod is currently using.
func (rl ReconcileLooper) ReconcileIPPools(ctx context.Context) ([]net.IP, error) {
	matchByPodRef := func(reservations []types.IPReservation, podRef string) int {
		foundidx := -1
		for idx, v := range reservations {
			if v.PodRef == podRef {
				return idx
			}
		}
		return foundidx
	}
	var err error
	var totalCleanedUpIps []net.IP
	for _, orphanedIP := range rl.orphanedIPs {
		currentIPReservations := orphanedIP.Pool.Allocations() // the order of the returned list is not fixed
		podRefsToDeallocate := findOutPodRefsToDeallocateIPsFrom(orphanedIP)
		var deallocatedIP net.IP
		for _, podRef := range podRefsToDeallocate {
			currentIPReservations, deallocatedIP, err = allocate.IterateForDeallocation(currentIPReservations, podRef, matchByPodRef) // may reclaim the in-use address
			if err != nil {
				return nil, err
			}
		}
		logging.Debugf("Going to update the reserve list to: %+v", currentIPReservations)
		if err := orphanedIP.Pool.Update(ctx, currentIPReservations); err != nil {
			logging.Errorf("failed to update the reservation list: %v", err)
			continue
		}
		totalCleanedUpIps = append(totalCleanedUpIps, deallocatedIP)
	}
	return totalCleanedUpIps, nil
}
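
The same scenario can be simulated end to end. The sketch below is not the whereabouts implementation; ipReservation and deallocateByPodRef are hypothetical stand-ins for types.IPReservation and for allocate.IterateForDeallocation with the matchByPodRef matcher. It shows that when two reservations share a podref, matching by podref alone removes whichever entry happens to come first, which may be the address the live pod is still using (18279 here).

package main

import "fmt"

type ipReservation struct {
	Offset string
	PodRef string
}

// deallocateByPodRef drops the first reservation whose PodRef matches,
// mirroring a match-by-podref-only deallocation, and reports what it removed.
func deallocateByPodRef(reservations []ipReservation, podRef string) ([]ipReservation, string) {
	for idx, r := range reservations {
		if r.PodRef == podRef {
			return append(reservations[:idx:idx], reservations[idx+1:]...), r.Offset
		}
	}
	return reservations, ""
}

func main() {
	// Ordering A: the leaked entry comes first, so the right address is reclaimed.
	a := []ipReservation{{"18278", "xxx/pod-1"}, {"18279", "xxx/pod-1"}}
	// Ordering B: the live pod's entry comes first, so the in-use address is reclaimed.
	b := []ipReservation{{"18279", "xxx/pod-1"}, {"18278", "xxx/pod-1"}}

	_, removedA := deallocateByPodRef(a, "xxx/pod-1")
	_, removedB := deallocateByPodRef(b, "xxx/pod-1")
	fmt.Println("ordering A reclaims:", removedA) // 18278 (the leaked address)
	fmt.Println("ordering B reclaims:", removedB) // 18279 (still in use)
}

Because both orderings are possible, repeated reconciliation runs can eventually reclaim an in-use address, which would match the duplicate-IP symptom described above.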
This looks interesting, but I don't understand enough about what's going on here to have a strong opinion on whether this is the issue. It does suggest some ideas for reproducing it, though.
I would like to also mention that we do use the standard practice of having a DaemonSet that installs the CNI and keeps a reconciler/garbage collector running.
We are using multus to give the pod several interfaces. The primary interface is using the Flannel CNI, and it works fine. The other two are using a macvlan CNI with an IP being assigned by whereabouts. And yes, we know that the IP ranges the IPs are being assigned from don't overlap. And we consistently use the same NetworkAttachmentDefinitions, so we don't really have a way to have lots of ranges anyway.
And we are seeing a number of odd log messages, though it's not clear that any particular one is related. One log message we see is this:
E1211 15:41:37.094624 24 leaderelection.go:330] error retrieving resource lock /whereabouts: an empty namespace may not be set when a resource name is provided