redhat-cop/namespace-configuration-operator

Some namespaces fail to be reconciled consistently

bergerx opened this issue · 5 comments

We have a certain template that is applied to all but one specific namespace, on multiple clusters with the same configuration.

I haven't yet found a way to reliably reproduce the issue, nor gone through the code to debug it, but here are two log lines that keep repeating during reconciles and may be related (I changed the log line format a little to make them easier to read):

{
  "level": "error",
  "ts": 1595859877.1848087,
  "logger": "controller_patchlocker",
  "msg": "unable to update status for",
  "object": {
    "apiVersion": "redhatcop.redhat.io/v1alpha1",
    "kind": "NamespaceConfig",
    "name": "networkpolicy-allow-on-system-namespaces"
  },
  "error": "Operation cannot be fulfilled on namespaceconfigs.redhatcop.redhat.io \"networkpolicy-allow-on-system-namespaces\": the object has been modified; please apply your changes to the latest version and try again",
  "stacktrace": "github.com/go-logr/zapr.(*zapLogger).Error
	/home/travis/gopath/pkg/mod/github.com/go-logr/zapr@v0.1.1/zapr.go:128
github.com/redhat-cop/operator-utils/pkg/util/lockedresourcecontroller.(*EnforcingReconciler).ManageSuccess
	/home/travis/gopath/pkg/mod/github.com/redhat-cop/operator-utils@v0.3.3/pkg/util/lockedresourcecontroller/enforcing-reconciler.go:170
github.com/redhat-cop/namespace-configuration-operator/pkg/controller/namespaceconfig.(*ReconcileNamespaceConfig).Reconcile
	/home/travis/gopath/src/github.com/redhat-cop/namespace-configuration-operator/pkg/controller/namespaceconfig/namespaceconfig_controller.go:195
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/home/travis/gopath/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.0/pkg/internal/controller/controller.go:256
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/home/travis/gopath/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.0/pkg/internal/controller/controller.go:232
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker
	/home/travis/gopath/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.0/pkg/internal/controller/controller.go:211
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1
	/home/travis/gopath/pkg/mod/k8s.io/apimachinery@v0.18.2/pkg/util/wait/wait.go:155
k8s.io/apimachinery/pkg/util/wait.BackoffUntil
	/home/travis/gopath/pkg/mod/k8s.io/apimachinery@v0.18.2/pkg/util/wait/wait.go:156
k8s.io/apimachinery/pkg/util/wait.JitterUntil
	/home/travis/gopath/pkg/mod/k8s.io/apimachinery@v0.18.2/pkg/util/wait/wait.go:133
k8s.io/apimachinery/pkg/util/wait.Until
	/home/travis/gopath/pkg/mod/k8s.io/apimachinery@v0.18.2/pkg/util/wait/wait.go:90"
}
{
  "level": "error",
  "ts": 1595859877.1848984,
  "logger": "controller-runtime.controller",
  "msg": "Reconciler error",
  "controller": "namespace-config-operator",
  "request": "/networkpolicy-allow-on-system-namespaces",
  "error": "Operation cannot be fulfilled on namespaceconfigs.redhatcop.redhat.io \"networkpolicy-allow-on-system-namespaces\": the object has been modified; please apply your changes to the latest version and try again",
  "stacktrace": "github.com/go-logr/zapr.(*zapLogger).Error
	/home/travis/gopath/pkg/mod/github.com/go-logr/zapr@v0.1.1/zapr.go:128
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/home/travis/gopath/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.0/pkg/internal/controller/controller.go:258
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/home/travis/gopath/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.0/pkg/internal/controller/controller.go:232
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker
	/home/travis/gopath/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.0/pkg/internal/controller/controller.go:211
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1
	/home/travis/gopath/pkg/mod/k8s.io/apimachinery@v0.18.2/pkg/util/wait/wait.go:155
k8s.io/apimachinery/pkg/util/wait.BackoffUntil
	/home/travis/gopath/pkg/mod/k8s.io/apimachinery@v0.18.2/pkg/util/wait/wait.go:156
k8s.io/apimachinery/pkg/util/wait.JitterUntil
	/home/travis/gopath/pkg/mod/k8s.io/apimachinery@v0.18.2/pkg/util/wait/wait.go:133
k8s.io/apimachinery/pkg/util/wait.Until
	/home/travis/gopath/pkg/mod/k8s.io/apimachinery@v0.18.2/pkg/util/wait/wait.go:90"
}

Initially we were on an older version and just upgraded the operator to the recent v0.2.1 release, which did not help with the issue. The logs above are what we are getting on v0.2.1.

The update status error is most often innocuous. There is a clear reason for it, and I have not found a way to eliminate it. You should see the system eventually converge to a stable and correct state. If that is not happening in your situation, let's dig deeper.
I'll need to see more logs from before and after the error messages.

Nope, this issue was sticking forever. It never resolved itself and kept the provisioner in the reconcile loop indefinitely, which I guess was producing huge logs, since running kubectl logs --tail 10 -f <pod-id> produced a non-stop flow of log lines.

There were no particular logs around these, just a few regular reconcile logs. I don't think we have any affected instance around anymore, but I'll check whether we still have one.

This issue was happening on one NamespaceConfig and one particular matching namespace (it was kube-system). We worked around the issue by removing the problematic NamespaceConfig. I tried to replicate the issue but, strangely, was not successful. I'll try to gather more info; otherwise I'll close this issue.

OK, so let me know if we can actually troubleshoot this one. If not, we should close it.
Recent releases of this operator have improved data validation, preventing the operator from entering some of the loops you describe. However, this is not enough to say that the issue has been solved.

@bergerx are you still experiencing the problem? May I close this issue?