kanisterio/kanister

[BUG] pods with multiple ownerReferences are never considered healthy


Describe the bug
Some operators add custom ownerReferences to their objects/pods (in this case, StackGres). Kanister currently does not handle pods that have anything other than a single owner reference.

To Reproduce
Steps to reproduce the behavior:

  1. Create any functional StatefulSet
  2. Edit the resulting pod(s)' ownerReferences and add a second entry (see the sketch after this list)
  3. Create an ActionSet that targets the StatefulSet
  4. Notice Kanister never considers the StatefulSet healthy
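For illustration, here is a sketch in Go of roughly what step 2 produces and why the current single-owner check rejects the pod. The names, UIDs, and the SGCluster reference are hypothetical, not taken from a real cluster:

package main

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
)

func main() {
	statefulSetUID := types.UID("aaaa-1111") // hypothetical UID of the StatefulSet

	// After step 2 the pod carries two owner references: the StatefulSet's,
	// plus one added by an operator (StackGres in our case).
	owners := []metav1.OwnerReference{
		{APIVersion: "apps/v1", Kind: "StatefulSet", Name: "my-statefulset", UID: statefulSetUID},
		{APIVersion: "stackgres.io/v1", Kind: "SGCluster", Name: "my-cluster", UID: types.UID("bbbb-2222")},
	}

	// The current logic requires exactly one owner reference, so this pod
	// is never counted as running even though it belongs to the StatefulSet.
	matched := len(owners) == 1 && owners[0].UID == statefulSetUID
	fmt.Println("pod counted by current logic:", matched) // prints: false
}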

Expected behavior
Kanister should look at any and all ownerReferences of a pod and find a match.

Screenshots
In Kasten we get the following error:

cause: '{"cause":{"cause":{"cause":{"message":"Specified 3 replicas and only 0
  are running: could not get StatefulSet{Namespace:
  my-namespace, Name: my-statefulset}: client rate
  limiter Wait returned an error: rate: Wait(n=1) would exceed context
  deadline"},"fields":[{"name":"namespace","value":"my-namespace"},{"name":"name","value":"my-statefulset"}],"file":"kasten.io/k10/kio/exec/phases/phase/snapshot.go:426","function":"kasten.io/k10/kio/exec/phases/phase.WaitOnWorkloadReady","linenumber":426,"message":"Statefulset
  not in ready state. Retry the operation once Statefulset is
  ready"},"fields":[{"name":"workloadName","value":"my-statefulset"},{"name":"workloadNamespace","value":"my-namespace"}],"file":"kasten.io/k10/kio/exec/phases/backup/snapshot_data_phase.go:1128","function":"kasten.io/k10/kio/exec/phases/backup.WaitForWorkloadWithSkipWait","linenumber":1128,"message":"Error
  while waiting for workload to be ready"},"fields":[],"message":"Ignoring error
  waiting on workload to become ready"}'

Environment
Kubernetes Version/Provider: OpenShift 4.14
Storage Provider: MinIO
Cluster Size (#nodes): 12
Data Size: any

Additional context
We are a customer of Veeam Kasten and are experiencing this issue.

Relevant code:

// ReplicationControllers are skipped unless they have exactly one owner
// reference, and that reference must match the target UID.
if len(rc.OwnerReferences) != 1 {
	continue
}
if rc.OwnerReferences[0].UID != uid {
	continue
}

// We ignore ReplicaSets without a single owner.
if len(rs.OwnerReferences) != 1 {
	continue
}
// We ignore ReplicaSets owned by other deployments.
if rs.OwnerReferences[0].UID != uid {
	continue
}

// Pods are skipped unless they have exactly one owner reference matching
// the target UID, which is where pods with extra ownerReferences fall out.
if len(pod.OwnerReferences) != 1 ||
	pod.OwnerReferences[0].UID != uid {
	continue
}

Thanks for opening this issue 👍. The team will review it shortly.

If this is a bug report, make sure to include clear instructions on how to reproduce the problem, with minimal reproducible examples where possible. If this is a security report, please review our security policy as outlined in SECURITY.md.

If you haven't already, please take a moment to review our project's Code of Conduct document.