openshift/cluster-nfd-operator

Creating, deleting, then recreating an NFD instance causes the controller manager to crash

Closed this issue · 5 comments

The controller manager crashes when an NFD instance is created, then deleted, then recreated again. Relevant logs are below. Then below these logs is a description that explains how to replicate the bug.

2021-08-12T15:29:22.053Z	ERROR	controller-runtime.manager.controller.nodefeaturediscovery	Reconciler error	{"reconciler group": "nfd.openshift.io", "reconciler kind": "NodeFeatureDiscovery", "name": "nfd-instance", "namespace": "openshift-nfd", "error": "Operation cannot be fulfilled on nodefeaturediscoveries.nfd.openshift.io \"nfd-instance\": the object has been modified; please apply your changes to the latest version and try again"}
github.com/go-logr/zapr.(*zapLogger).Error
	/go/src/github.com/openshift/cluster-nfd-operator/vendor/github.com/go-logr/zapr/zapr.go:132
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/go/src/github.com/openshift/cluster-nfd-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:267
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/go/src/github.com/openshift/cluster-nfd-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:235
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.1
	/go/src/github.com/openshift/cluster-nfd-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:198
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1
	/go/src/github.com/openshift/cluster-nfd-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:185
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1
	/go/src/github.com/openshift/cluster-nfd-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155
k8s.io/apimachinery/pkg/util/wait.BackoffUntil
	/go/src/github.com/openshift/cluster-nfd-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156
k8s.io/apimachinery/pkg/util/wait.JitterUntil
	/go/src/github.com/openshift/cluster-nfd-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext
	/go/src/github.com/openshift/cluster-nfd-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:185
k8s.io/apimachinery/pkg/util/wait.UntilWithContext
	/go/src/github.com/openshift/cluster-nfd-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:99

To replicate the bug:

  1. Create the NFD instance:
oc create -f config/samples/nfd.openshift.io_v1_nodefeaturediscovery.yaml
  1. Delete the NFD instance:
oc delete -f config/samples/nfd.openshift.io_v1_nodefeaturediscovery.yaml
  1. Recreate the NFD instance:
oc create -f config/samples/nfd.openshift.io_v1_nodefeaturediscovery.yaml

Should, coincidentially (and unintentionally), be fixed with PR #199

Basically, the idea was to add a CreateFunc predicate function to check for the existence of an NFD instance.

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.