opensearch-project/opensearch-k8s-operator

[BUG] SmartScaler can't scale down cluster

Closed · 1 comment

What is the bug?

When trying to scale down nodes in a cluster, SmartScaler produces error messages like this:

operator-controller-manager {"level":"error","ts":"2024-04-09T15:06:37.623Z","msg":"failed to exclude node clone-opensearch-nodes-5","controller":"opensearchcluster","controllerGroup":"opensearch.opster.io","controllerKind":"OpenSearchCluster","OpenSearchCluster":{"name":"clone-opensearch","namespace":"clone-opensearch"},"namespace":"clone-opensearch","name":"clone-opensearch","reconcileID":"3299605f-04ca-4b28-9943-81ee248c45e5","error":"invalid character 'U' looking for beginning of value","stacktrace":"opensearch.opster.io/pkg/reconcilers.(*ScalerReconciler).excludeNode\n\t/workspace/pkg/reconcilers/scaler.go:228\nopensearch.opster.io/pkg/reconcilers.(*ScalerReconciler).reconcileNodePool\n\t/workspace/pkg/reconcilers/scaler.go:127\nopensearch.opster.io/pkg/reconcilers.(*ScalerReconciler).Reconcile\n\t/workspace/pkg/reconcilers/scaler.go:55\nopensearch.opster.io/controllers.(*OpenSearchClusterReconciler).reconcilePhaseRunning\n\t/workspace/controllers/opensearchController.go:321\nopensearch.opster.io/controllers.(*OpenSearchClusterReconciler).Reconcile\n\t/workspace/controllers/opensearchController.go:142\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:118\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:314\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:265\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:226"}

How can one reproduce the bug?

Spin up a new OpenSearch cluster with operator version 2.4.0 (2.6.0 has the same issue) and cluster version 1.3.6, then try to scale down a node pool.

What is the expected behavior?

Being able to scale the cluster up and down without issues.

What is your host/environment?

Kubernetes version 1.26.x

Do you have any screenshots?

No

Do you have any additional context?

I've tried this with both elaborate and very simple setups, but the error remains the same.

Would love any pointers!

This turned out to be user error. In case anyone else encounters this:

We had a newline character in our cluster-admin-credentials secret.
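
For anyone debugging the same message, here is a minimal Go sketch of how a stray newline in a credential could lead to this error. It rests on an assumption not confirmed in this thread: that the cluster rejected the malformed credentials with a plain-text body such as "Unauthorized", which the operator then tried to decode as JSON. The helper names `hasStrayWhitespace` and `decodeBody` are hypothetical and are not part of the operator's code.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// hasStrayWhitespace reports whether a credential value carries leading or
// trailing whitespace, such as the newline that `echo` appends by default.
// Hypothetical helper for illustration only.
func hasStrayWhitespace(value string) bool {
	return strings.TrimSpace(value) != value
}

// decodeBody tries to decode a response body as a JSON object and returns
// the decoding error, if any. Hypothetical helper for illustration only.
func decodeBody(body string) error {
	var v map[string]interface{}
	return json.Unmarshal([]byte(body), &v)
}

func main() {
	// A password stored with a trailing newline is flagged.
	fmt.Println(hasStrayWhitespace("admin\n"))

	// Assumed plain-text rejection body; decoding it as JSON fails on the
	// first character, 'U'.
	fmt.Println(decodeBody("Unauthorized"))
}
```

If the secret was created from a shell, a common source of the stray newline is piping `echo` output into `base64` without the `-n` flag. Decoding the stored value and checking it for trailing whitespace is a quick way to rule this out.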