[BUG] OpenSearch data node gets constantly excluded from shard allocation
hollowdew opened this issue · 3 comments
What is the bug?
OpenSearch data node gets constantly excluded from shard allocation.
How can one reproduce the bug?
Deploy the OpenSearch operator and cluster with the configuration shown below.
What is the expected behavior?
A working OpenSearch cluster with green health and no excluded data nodes.
What is your host/environment?
Debian 12 x64 / k3s v1.28.8+k3s1
Do you have any screenshots?
Outputs listed below.
Do you have any additional context?
I have more than enough disk space available and am writing only 4-6 GiB per day into OpenSearch.
Hello! I'm using the OpenSearch operator and OpenSearch cluster Helm charts.
My cluster is showing a yellow status due to unassigned replica shards.
I ran GET _cluster/settings and received the following output:
{
  "persistent" : {
    "plugins" : {
      "index_state_management" : {
        "template_migration" : {
          "control" : "-1"
        }
      }
    }
  },
  "transient" : {
    "cluster" : {
      "routing" : {
        "allocation" : {
          "enable" : "all",
          "exclude" : {
            "_name" : "opensearch-cluster-data-nodes-0"
          }
        }
      }
    }
  }
}
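As an aside (my own sketch, not part of the issue), the exclusion is easy to detect programmatically from that `_cluster/settings` response; the `excluded_node_names` helper below is hypothetical, not an OpenSearch API:

```python
import json

# _cluster/settings response, abbreviated to the transient part shown above.
settings_json = '''
{
  "transient": {
    "cluster": {
      "routing": {
        "allocation": {
          "enable": "all",
          "exclude": {
            "_name": "opensearch-cluster-data-nodes-0"
          }
        }
      }
    }
  }
}
'''

def excluded_node_names(settings: dict) -> list:
    """Return node names excluded from shard allocation, if any.

    cluster.routing.allocation.exclude._name holds a comma-separated
    list of node names; an empty or missing value means no exclusions.
    """
    exclude = (
        settings.get("transient", {})
        .get("cluster", {})
        .get("routing", {})
        .get("allocation", {})
        .get("exclude", {})
    )
    names = exclude.get("_name", "")
    return [n for n in names.split(",") if n]

print(excluded_node_names(json.loads(settings_json)))
# → ['opensearch-cluster-data-nodes-0']
```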
And the health of my cluster: GET _cluster/health?pretty
{
  "cluster_name" : "opensearch-cluster",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 6,
  "number_of_data_nodes" : 3,
  "discovered_master" : true,
  "discovered_cluster_manager" : true,
  "active_primary_shards" : 171,
  "active_shards" : 335,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 108,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 75.62076749435666
}
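As a quick cross-check (my own arithmetic, not from the issue), the reported `active_shards_percent_as_number` follows directly from the shard counts above:

```python
# Cross-check of the _cluster/health output: the percentage is active
# shards over the total (active + unassigned; relocating and
# initializing shards are both 0 here).
active_shards = 335
unassigned_shards = 108

percent = active_shards / (active_shards + unassigned_shards) * 100
print(round(percent, 2))  # → 75.62, matching active_shards_percent_as_number
```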
I attempted to remove data-nodes-0 from the exclusion list using a PUT request, but it automatically gets added back to the list after a few seconds.
Here are some details about my setup:
helm list
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
opensearch-cluster opensearch 5 2024-04-22 13:53:25.886739305 +0200 CEST deployed opensearch-cluster-2.5.1 2.5.1
opensearch-operator opensearch 4 2024-04-22 14:13:59.756698617 +0200 CEST deployed opensearch-operator-2.5.1 2.5.1
kubectl get OpenSearchCluster
NAME HEALTH NODES VERSION PHASE AGE
opensearch-cluster yellow 6 2.8.0 RUNNING 66d
kubectl describe OpenSearchCluster opensearch-cluster
Name:         opensearch-cluster
Namespace:    opensearch
Labels:       app.kubernetes.io/managed-by=Helm
Annotations:  meta.helm.sh/release-name: opensearch-cluster
              meta.helm.sh/release-namespace: opensearch
API Version:  opensearch.opster.io/v1
Kind:         OpenSearchCluster
Metadata:
  Creation Timestamp:  2024-02-15T15:44:05Z
  Finalizers:
    Opster
  Generation:        5
  Resource Version:  86865705
  UID:               73cea515-fdf2-4d78-a841-ec91a889a1a3
Spec:
  Bootstrap:
    Resources:
  Conf Mgmt:
  Dashboards:
    Enable:  true
    Opensearch Credentials Secret:
      Name:  admin-credentials-secret
    Replicas:  1
    Resources:
      Limits:
        Cpu:     600m
        Memory:  4096Mi
      Requests:
        Cpu:     300m
        Memory:  4096Mi
    Service:
      Type:   ClusterIP
    Version:  2.3.0
  General:
    Drain Data Nodes:  true
    Http Port:         9200
    Monitoring:
    Service Name:          opensearch-cluster
    Set VM Max Map Count:  true
    Vendor:                opensearch
    Version:               2.8.0
  Init Helper:
    Resources:
  Node Pools:
    Component:  cluster-managers
    Disk Size:  100Gi
    Replicas:   3
    Resources:
      Limits:
        Cpu:     1600m
        Memory:  4096Mi
      Requests:
        Cpu:     500m
        Memory:  2048Mi
    Roles:
      cluster_manager
    Component:  data-nodes
    Disk Size:  4096Gi
    Replicas:   3
    Resources:
      Limits:
        Cpu:     2000m
        Memory:  24576Mi
      Requests:
        Cpu:     1000m
        Memory:  24576Mi
    Roles:
      data
  Security:
    Config:
      Admin Credentials Secret:
        Name:  admin-credentials-secret
      Admin Secret:
      Security Config Secret:
        Name:  securityconfig-secret
    Tls:
      Http:
        Ca Secret:
        Generate:  true
        Secret:
      Transport:
        Ca Secret:
        Generate:  true
        Secret:
Status:
  Available Nodes:  6
  Components Status:
    Component:  Restarter
    Status:     Finished
  Health:       yellow
  Initialized:  true
  Phase:        RUNNING
  Version:      2.8.0
Events:
  Type    Reason          Age                 From                     Message
  ----    ------          ----                ----                     -------
  Normal  RollingRestart  53m (x68 over 63m)  containerset-controller  Starting to rolling restart
Here are my Helm values for the OpenSearch cluster:
opensearchCluster:
  enabled: true
  general:
    httpPort: "9200"
    version: 2.8.0
    serviceName: "opensearch-cluster"
    drainDataNodes: true
    setVMMaxMapCount: true
  dashboards:
    enable: true
    replicas: 1
    tls:
      enable: false
      generate: false
    opensearchCredentialsSecret:
      name: admin-credentials-secret
    resources:
      limits:
        memory: "4096Mi"
        cpu: "600m"
      requests:
        memory: "4096Mi"
        cpu: "300m"
  nodePools:
    - component: cluster-managers
      diskSize: "100Gi"
      replicas: 3
      roles:
        - "cluster_manager"
      resources:
        limits:
          memory: "4096Mi"
          cpu: "1600m"
        requests:
          memory: "2048Mi"
          cpu: "500m"
    - component: data-nodes
      diskSize: "4096Gi"
      replicas: 3
      roles:
        - "data"
      resources:
        limits:
          memory: "24576Mi"
          cpu: "2000m"
        requests:
          cpu: "1000m"
          memory: "24576Mi"
  security:
    config:
      adminCredentialsSecret:
        name: admin-credentials-secret
      securityConfigSecret:
        name: securityconfig-secret
    tls:
      transport:
        generate: true
      http:
        generate: true
And here are the Helm values for the operator:
COMPUTED VALUES:
fullnameOverride: ""
installCRDs: true
kubeRbacProxy:
  enable: true
  image:
    repository: gcr.io/kubebuilder/kube-rbac-proxy
    tag: v0.15.0
  livenessProbe:
    failureThreshold: 3
    httpGet:
      path: /healthz
      port: 10443
      scheme: HTTPS
    initialDelaySeconds: 10
    periodSeconds: 15
    successThreshold: 1
    timeoutSeconds: 3
  readinessProbe:
    failureThreshold: 3
    httpGet:
      path: /healthz
      port: 10443
      scheme: HTTPS
    initialDelaySeconds: 10
    periodSeconds: 15
    successThreshold: 1
    timeoutSeconds: 3
  resources:
    limits:
      cpu: 50m
      memory: 50Mi
    requests:
      cpu: 25m
      memory: 25Mi
  securityContext: null
manager:
  dnsBase: cluster.local
  extraEnv: []
  image:
    pullPolicy: Always
    repository: opensearchproject/opensearch-operator
    tag: ""
  imagePullSecrets: []
  livenessProbe:
    failureThreshold: 3
    httpGet:
      path: /healthz
      port: 8081
    initialDelaySeconds: 10
    periodSeconds: 15
    successThreshold: 1
    timeoutSeconds: 3
  loglevel: info
  parallelRecoveryEnabled: true
  readinessProbe:
    failureThreshold: 3
    httpGet:
      path: /readyz
      port: 8081
    initialDelaySeconds: 10
    periodSeconds: 15
    successThreshold: 1
    timeoutSeconds: 3
  resources:
    limits:
      cpu: 350m
      memory: 8192Mi
    requests:
      cpu: 100m
      memory: 1024Mi
  securityContext:
    allowPrivilegeEscalation: false
  watchNamespace: null
nameOverride: ""
nodeSelector: {}
securityContext:
  runAsNonRoot: true
serviceAccount:
  create: true
  name: ""
tolerations: []
Here are some log entries from my operator that repeat every few seconds:
{"level":"info","ts":"2024-04-22T13:00:10.876Z","msg":"Reconciling OpenSearchCluster","controller":"opensearchcluster","controllerGroup":"opensearch.opster.io","controllerKind":"OpenSearchCluster","OpenSearchCluster":{"name":"opensearch-cluster","namespace":"opensearch"},"namespace":"opensearch","name":"opensearch-cluster","reconcileID":"153edb1f-786a-426c-9e88-23569d99696b","cluster":{"name":"opensearch-cluster","namespace":"opensearch"}}
{"level":"info","ts":"2024-04-22T13:00:10.892Z","msg":"Generating certificates","controller":"opensearchcluster","controllerGroup":"opensearch.opster.io","controllerKind":"OpenSearchCluster","OpenSearchCluster":{"name":"opensearch-cluster","namespace":"opensearch"},"namespace":"opensearch","name":"opensearch-cluster","reconcileID":"153edb1f-786a-426c-9e88-23569d99696b","interface":"transport"}
{"level":"info","ts":"2024-04-22T13:00:10.892Z","msg":"Generating certificates","controller":"opensearchcluster","controllerGroup":"opensearch.opster.io","controllerKind":"OpenSearchCluster","OpenSearchCluster":{"name":"opensearch-cluster","namespace":"opensearch"},"namespace":"opensearch","name":"opensearch-cluster","reconcileID":"153edb1f-786a-426c-9e88-23569d99696b","interface":"http"}
{"level":"info","ts":"2024-04-22T13:00:10.894Z","msg":"ServiceMonitor crd not found, skipping deletion","controller":"opensearchcluster","controllerGroup":"opensearch.opster.io","controllerKind":"OpenSearchCluster","OpenSearchCluster":{"name":"opensearch-cluster","namespace":"opensearch"},"namespace":"opensearch","name":"opensearch-cluster","reconcileID":"153edb1f-786a-426c-9e88-23569d99696b"}
I haven't found any errors in the logs of the cluster-manager or data nodes.
After I remove data-nodes-0 from the exclusion list with
curl --insecure -H "Content-type: application/json" -X PUT https://admin:password@opensearch-cluster.opensearch.svc.cluster.local:9200/_cluster/settings -d '{ "transient" : { "cluster.routing.allocation.exclude._name" : null } }'
the unassigned shards are slowly reassigned. However, data-nodes-0 is re-added to the exclusion list after a few seconds.
Hi @hollowdew. Not sure why this is happening. The only operator components that set the exclusion are the restarter and the upgrader, and according to your status neither of them is doing anything.
Could you please set drainDataNodes to false (the drain should only be needed for emptyDir, and since you have no extra persistence config the cluster uses PVCs) and see if that stops the node being added to the exclusion list?
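For reference, this suggestion maps onto the cluster Helm values posted earlier in the thread roughly as follows (a sketch showing only the changed key):

```yaml
opensearchCluster:
  general:
    # Draining is only needed when data nodes use emptyDir storage;
    # with PVC-backed nodes the data survives pod restarts.
    drainDataNodes: false
```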
Hello @swoehrl-mw,
I greatly appreciate your suggestion. I have updated my cluster and manually removed data-nodes-0 from the exclusion list once more. I will monitor whether it gets excluded again and will provide an update tomorrow.
Best regards
Hello @swoehrl-mw,
sorry for the late reply.
The problem seems to be fixed after setting drainDataNodes to false.
Thank you very much for your help.