hawtio/hawtio-operator

Hawtio Operator is producing 'already exist' errors for configmap in log

phantomjinx opened this issue · 4 comments

Errors being produced:

{"level":"error","ts":1701096292.863362,"logger":"controller_hawtio","msg":"Error reconciling ConfigMap","Request.Namespace":"hawtio-dev","Request.Name":"hawtio-online","error":"error AddResources: configmaps \"hawtio-online\" already exists","stacktrace":"github.com/hawtio/hawtio-operator/pkg/controller/hawtio.(*ReconcileHawtio).Reconcile\n\tgithub.com/hawtio/hawtio-operator/pkg/controller/hawtio/hawtio_controller.go:411\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\tsigs.k8s.io/controller-runtime@v0.6.2/pkg/internal/controller/controller.go:235\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tsigs.k8s.io/controller-runtime@v0.6.2/pkg/internal/controller/controller.go:209\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\tsigs.k8s.io/controller-runtime@v0.6.2/pkg/internal/controller/controller.go:188\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\tk8s.io/apimachinery@v0.18.6/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\tk8s.io/apimachinery@v0.18.6/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\tk8s.io/apimachinery@v0.18.6/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.Until\n\tk8s.io/apimachinery@v0.18.6/pkg/util/wait/wait.go:90"}
{"level":"error","ts":1701096292.863478,"logger":"controller","msg":"Reconciler error","controller":"hawtio-controller","name":"hawtio-online","namespace":"hawtio-dev","error":"error AddResources: configmaps \"hawtio-online\" already exists","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\tsigs.k8s.io/controller-runtime@v0.6.2/pkg/internal/controller/controller.go:237\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tsigs.k8s.io/controller-runtime@v0.6.2/pkg/internal/controller/controller.go:209\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\tsigs.k8s.io/controller-runtime@v0.6.2/pkg/internal/controller/controller.go:188\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\tk8s.io/apimachinery@v0.18.6/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\tk8s.io/apimachinery@v0.18.6/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\tk8s.io/apimachinery@v0.18.6/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.Until\n\tk8s.io/apimachinery@v0.18.6/pkg/util/wait/wait.go:90"}

This is not a bug in the operator but a current limitation: the operator cannot handle the existence of same-named resources that it does not own, e.g. when hawtio-online has been installed manually without the operator.
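For context, here is a minimal sketch of the kind of reconcile step that produces this error, assuming the typical controller-runtime pattern (this is not the actual hawtio-operator code; names are illustrative): `client.Create` fails with an AlreadyExists error when a same-named ConfigMap is already present but was not created by the operator.

```go
package example

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// createConfigMap is an illustrative reconcile step: it blindly creates the
// ConfigMap and therefore fails when a same-named, unowned one already exists.
func createConfigMap(ctx context.Context, c client.Client, namespace, name string) error {
	cm := &corev1.ConfigMap{
		ObjectMeta: metav1.ObjectMeta{Name: name, Namespace: namespace},
	}
	err := c.Create(ctx, cm)
	if apierrors.IsAlreadyExists(err) {
		// A ConfigMap with this name exists but carries no owner reference to
		// the Hawtio CR, so the operator cannot adopt or update it and the
		// reconcile loop keeps retrying and logging this error.
		return err
	}
	return err
}
```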

Therefore, this should be triaged as a required feature improvement.

I'm not sure this is something we need to fix. If I understand it correctly, it happens when we install hawtio-online manually first and then want to switch its management to the operator by installing hawtio-operator, right? Isn't it dangerous if a user who doesn't want to do that accidentally hands management over to the operator by installing it carelessly? The user should know what they are doing. If they want the operator to reinstall hawtio-online, they should first uninstall it manually and then install the operator.

I would agree, except for the resources that get left behind, e.g. the configmap or route. Unless the user runs make uninstall with the correct mode and clustertype, those resources can be left behind. The operator is then installed but does not surface a problem anywhere except the errors in its own log, something that can easily be missed.
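As an illustration of the leftovers involved, here is a minimal sketch (a hypothetical helper, not part of the operator) that lists ConfigMaps in the namespace carrying no controlling Hawtio owner reference, i.e. candidates left behind by a manual install or an incomplete make uninstall:

```go
package example

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// findUnownedConfigMaps returns the names of ConfigMaps in the namespace that
// have no controlling Hawtio owner reference, i.e. resources the operator did
// not create and therefore cannot manage.
func findUnownedConfigMaps(ctx context.Context, c client.Client, namespace string) ([]string, error) {
	var list corev1.ConfigMapList
	if err := c.List(ctx, &list, client.InNamespace(namespace)); err != nil {
		return nil, err
	}
	var unowned []string
	for i := range list.Items {
		owner := metav1.GetControllerOf(&list.Items[i])
		if owner == nil || owner.Kind != "Hawtio" {
			unowned = append(unowned, list.Items[i].Name)
		}
	}
	return unowned, nil
}
```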

So the fix might be to improve how the errors are reported, e.g. the operator could first scan for any relevant resources that are not owned by it and then report this as a status failure in the CR. Even if we don't do it this way, the CR should still report a status failure rather than stay stuck at Initialized.
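A minimal sketch of what reporting the failure on the CR could look like, assuming the status carries a phase and a human-readable reason (the field names here are illustrative, not the operator's actual API):

```go
package example

import "fmt"

// HawtioStatus is an illustrative status shape; the real CR status may differ.
type HawtioStatus struct {
	Phase  string // e.g. "Initialized", "Deployed", "Failed"
	Reason string // human-readable explanation visible on the CR
}

// markResourceConflict moves the CR to a failed phase and records which
// resource is blocking reconciliation, instead of only logging the error.
func markResourceConflict(status *HawtioStatus, kind, name string) {
	status.Phase = "Failed"
	status.Reason = fmt.Sprintf("%s %q already exists and is not owned by the operator; "+
		"delete it or allow the operator to adopt it", kind, name)
}
```

The reconciler would then write the updated status back via the client, so the failure is visible on the CR itself rather than only in the operator log.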

I agree, logging with a better message is the right solution for the issue.