SolrCloud Pod moved to new Node - Replica Migration pending
brickpattern opened this issue · 8 comments
Environment:
Solr Operator Helm : 0.8.0
Solr 9.4 container image
3 node cluster
Persistent storage option (w/ localvolume provisioner)
Managed upgrade strategy.
The K8s node for solrcloud-0 was cordoned and the Pod was moved to a new node.
When the pod came up on the new node, it was recognized as part of the SolrCloud StatefulSet, but at the collection level the replica on that node was lost. Looking at the StatefulSet, there's a cluster lock:
solr.apache.org/clusterOpsLock: >-
{"operation":"RollingUpdate","lastStartTime":"2023-12-14T22:08:47Z","metadata":"{\"requiresReplicaMigration\":false}"}
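The lock annotation above is JSON whose `metadata` field is itself a JSON-encoded string. A minimal sketch for decoding it, assuming the annotation value has been copied off the StatefulSet as-is (the helper name `parse_cluster_ops_lock` is made up for illustration and is not part of the operator):

```python
import json


def parse_cluster_ops_lock(annotation_value: str) -> dict:
    """Decode the clusterOpsLock annotation, including its nested metadata string."""
    lock = json.loads(annotation_value)
    raw_metadata = lock.get("metadata")
    # The operator stores metadata as a JSON string inside the JSON document.
    if isinstance(raw_metadata, str) and raw_metadata:
        lock["metadata"] = json.loads(raw_metadata)
    return lock


# Value copied verbatim from the annotation shown in this issue.
annotation = r'{"operation":"RollingUpdate","lastStartTime":"2023-12-14T22:08:47Z","metadata":"{\"requiresReplicaMigration\":false}"}'

lock = parse_cluster_ops_lock(annotation)
print(lock["operation"], lock["lastStartTime"], lock["metadata"])
```

Decoded this way, the lock shows a `RollingUpdate` operation with `requiresReplicaMigration` set to `false`, so this particular cluster operation did not plan to move replicas.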
Please let me know if I'm missing a step in the process...
Should the operator automatically do the replica migration?
I have read about the Rebalance API and am using version 9.4.
Is there a way to manually kick off the replica migration step for that specific Pod?
SolrCloud custom resource definition:
apiVersion: solr.apache.org/v1beta1
kind: SolrCloud
metadata:
  annotations:
    meta.helm.sh/release-name: solr
    meta.helm.sh/release-namespace: solr
  creationTimestamp: '2023-12-14T21:10:55Z'
  finalizers:
    - storage.finalizers.solr.apache.org
  generation: 3
  labels:
    app.kubernetes.io/instance: solr
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: solr
    app.kubernetes.io/version: 8.11.1
    helm.sh/chart: solr-0.8.0
  managedFields:
    - apiVersion: solr.apache.org/v1beta1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          ...
          --removed metadata here for readability
          ...
      manager: solr-operator
      operation: Update
      time: '2023-12-14T21:10:55Z'
    - apiVersion: solr.apache.org/v1beta1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          ...
          --removed metadata here for readability
          ...
      manager: helm
      operation: Update
      time: '2023-12-14T21:58:46Z'
    - apiVersion: solr.apache.org/v1beta1
      fieldsType: FieldsV1
      fieldsV1:
        f:status:
          ...
          ---removed fields{} for readability
          ...
      manager: solr-operator
      operation: Update
      subresource: status
      time: '2023-12-14T21:59:05Z'
  name: solr
  namespace: solr
  resourceVersion: '115467'
  uid: 559977d2-2fd0-42fa-bf28-08bc5cebf851
  selfLink: /apis/solr.apache.org/v1beta1/namespaces/solr/solrclouds/solr
status:
  externalCommonAddress: http://solr-solr-solrcloud.k8s.solr.cloud
  internalCommonAddress: http://solr-solrcloud-common.solr
  podSelector: solr-cloud=solr,technology=solr-cloud
  readyReplicas: 3
  replicas: 3
  solrNodes:
    - externalAddress: http://solr-solr-solrcloud-0.k8s.solr.cloud
      internalAddress: http://solr-solrcloud-0.solr
      name: solr-solrcloud-0
      nodeName: ip-x-y-162-17.us-west-2.compute.internal
      ready: true
      scheduledForDeletion: false
      specUpToDate: true
      version: '0.8'
    - externalAddress: http://solr-solr-solrcloud-1.k8s.solr.cloud
      internalAddress: http://solr-solrcloud-1.solr
      name: solr-solrcloud-1
      nodeName: ip-x-y-160-139.us-west-2.compute.internal
      ready: true
      scheduledForDeletion: false
      specUpToDate: false
      version: '0.8'
    - externalAddress: http://solr-solr-solrcloud-2.k8s.solr.cloud
      internalAddress: http://solr-solrcloud-2.solr
      name: solr-solrcloud-2
      nodeName: ip-x-y-163-213.us-west-2.compute.internal
      ready: true
      scheduledForDeletion: false
      specUpToDate: false
      version: '0.8'
  upToDateNodes: 1
  version: '0.8'
  zookeeperConnectionInfo:
    chroot: /
    externalConnectionString: N/A
    internalConnectionString: >-
      solr-solrcloud-zookeeper-0.solr-solrcloud-zookeeper-headless.solr.svc.cluster.local:2181,solr-solrcloud-zookeeper-1.solr-solrcloud-zookeeper-headless.solr.svc.cluster.local:2181,solr-solrcloud-zookeeper-2.solr-solrcloud-zookeeper-headless.solr.svc.cluster.local:2181
spec:
  availability:
    podDisruptionBudget:
      enabled: true
      method: ClusterWide
  busyBoxImage:
    repository: library/busybox
    tag: 1.28.0-glibc
  customSolrKubeOptions:
    podOptions:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - podAffinityTerm:
                labelSelector:
                  matchExpressions:
                    - key: technology
                      operator: In
                      values:
                        - solr-cloud
                    - key: solr-cloud
                      operator: In
                      values:
                        - solr
                topologyKey: topology.kubernetes.io/zone
              weight: 100
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: technology
                    operator: In
                    values:
                      - solr-cloud
                  - key: solr-cloud
                    operator: In
                    values:
                      - solr
              topologyKey: kubernetes.io/hostname
      annotations:
        manualrestart: '2023-12-14T00:00:01Z'
      defaultInitContainerResources: {}
      resources:
        limits:
          cpu: '16'
          memory: 32G
        requests:
          cpu: '8'
          memory: 16G
      serviceAccountName: solr-operator
      tolerations:
        - effect: NoSchedule
          key: role
          operator: Equal
          value: solr-cluster
  dataStorage:
    persistent:
      pvcTemplate:
        metadata: {}
        spec:
          resources:
            requests:
              storage: 500Gi
          storageClassName: my-disks
      reclaimPolicy: Delete
  replicas: 3
  scaling:
    populatePodsOnScaleUp: true
    vacatePodsOnScaleDown: true
  solrAddressability:
    commonServicePort: 80
    external:
      domainName: k8s.solr.cloud
      method: Ingress
      nodePortOverride: 80
      useExternalAddress: false
    podPort: 8983
  solrImage:
    pullPolicy: Always
    repository: mycustom.registry.builton.9-4solr
    tag: latest
  solrJavaMem: '-Xms8192m -Xmx16384m'
  solrLogLevel: INFO
  solrOpts: '-Denable.runtime.lib=true -Denable.packages=true'
  updateStrategy:
    managed: {}
    method: Managed
  zookeeperRef:
    provided:
      adminServerService: {}
      chroot: /
      clientService: {}
      config: {}
      ephemeral:
        emptydirvolumesource: {}
      headlessService: {}
      image:
        pullPolicy: IfNotPresent
        repository: pravega/zookeeper
      maxUnavailableReplicas: 1
      replicas: 3
      zookeeperPodPolicy:
        resources: {}
        serviceAccountName: solr-operator
Solr Operator logs
2023-12-14T21:39:42Z INFO Update required because field changed {"controller": "solrcloud", "controllerGroup": "solr.apache.org", "controllerKind": "SolrCloud", "SolrCloud": {"name":"solr","namespace":"solr"}, "namespace": "solr", "name": "solr", "reconcileID": "c747c5ad-f7b1-400a-8957-ff60a71b7531", "statefulSet": "solr-solrcloud", "kind": "statefulSet", "field": "Spec.Template.Annotations", "from": {"kubectl.kubernetes.io/restartedAt":"2023-12-14T15:39:42-06:00","solr.apache.org/solrXmlMd5":"5fe99d590bc63efc3caa743ca939aa5a"}, "to": {"solr.apache.org/solrXmlMd5":"5fe99d590bc63efc3caa743ca939aa5a"}}
2023-12-14T21:39:42Z INFO Updating StatefulSet {"controller": "solrcloud", "controllerGroup": "solr.apache.org", "controllerKind": "SolrCloud", "SolrCloud": {"name":"solr","namespace":"solr"}, "namespace": "solr", "name": "solr", "reconcileID": "c747c5ad-f7b1-400a-8957-ff60a71b7531", "statefulSet": "solr-solrcloud"}
2023-12-14T21:39:42Z INFO Started locked clusterOp {"controller": "solrcloud", "controllerGroup": "solr.apache.org", "controllerKind": "SolrCloud", "SolrCloud": {"name":"solr","namespace":"solr"}, "namespace": "solr", "name": "solr", "reconcileID": "122fddd5-d036-4d31-872e-0d4b95bb27a3", "clusterOp": "RollingUpdate", "clusterOpMetadata": "{\"requiresReplicaMigration\":false}"}
2023-12-14T21:39:42Z INFO Updating SolrCloud Status {"controller": "solrcloud", "controllerGroup": "solr.apache.org", "controllerKind": "SolrCloud", "SolrCloud": {"name":"solr","namespace":"solr"}, "namespace": "solr", "name": "solr", "reconcileID": "122fddd5-d036-4d31-872e-0d4b95bb27a3", "status": {"solrNodes":[{"name":"solr-solrcloud-0","nodeName":"ip-x-y-162-17.us-west-2.compute.internal","internalAddress":"http://solr-solrcloud-0.solr","externalAddress":"http://solr-solr-solrcloud-0.k8s.solr.cloud","ready":true,"version":"0.8","specUpToDate":false,"scheduledForDeletion":false},{"name":"solr-solrcloud-1","nodeName":"ip-x-y-160-139.us-west-2.compute.internal","internalAddress":"http://solr-solrcloud-1.solr","externalAddress":"http://solr-solr-solrcloud-1.k8s.solr.cloud","ready":true,"version":"0.8","specUpToDate":false,"scheduledForDeletion":false},{"name":"solr-solrcloud-2","nodeName":"ip-x-y-163-213.us-west-2.compute.internal","internalAddress":"http://solr-solrcloud-2.solr","externalAddress":"http://solr-solr-solrcloud-2.k8s.solr.cloud","ready":true,"version":"0.8","specUpToDate":false,"scheduledForDeletion":false}],"replicas":3,"podSelector":"solr-cloud=solr,technology=solr-cloud","readyReplicas":3,"upToDateNodes":0,"version":"0.8","internalCommonAddress":"http://solr-solrcloud-common.solr","externalCommonAddress":"http://solr-solr-solrcloud.k8s.solr.cloud","zookeeperConnectionInfo":{"internalConnectionString":"solr-solrcloud-zookeeper-0.solr-solrcloud-zookeeper-headless.solr.svc.cluster.local:2181,solr-solrcloud-zookeeper-1.solr-solrcloud-zookeeper-headless.solr.svc.cluster.local:2181,solr-solrcloud-zookeeper-2.solr-solrcloud-zookeeper-headless.solr.svc.cluster.local:2181","externalConnectionString":"N/A","chroot":"/"},"backupRestoreReady":false}}
2023-12-14T21:39:42Z INFO Removed unneeded clusterOpLock annotation from statefulSet {"controller": "solrcloud", "controllerGroup": "solr.apache.org", "controllerKind": "SolrCloud", "SolrCloud": {"name":"solr","namespace":"solr"}, "namespace": "solr", "name": "solr", "reconcileID": "cb77fc36-beda-4e54-a0c5-0b2726756c66", "reason": "RollingUpdate complete"}
2023-12-14T21:39:42Z INFO Updating SolrCloud Status {"controller": "solrcloud", "controllerGroup": "solr.apache.org", "controllerKind": "SolrCloud", "SolrCloud": {"name":"solr","namespace":"solr"}, "namespace": "solr", "name": "solr", "reconcileID": "cb77fc36-beda-4e54-a0c5-0b2726756c66", "status": {"solrNodes":[{"name":"solr-solrcloud-0","nodeName":"ip-x-y-162-17.us-west-2.compute.internal","internalAddress":"http://solr-solrcloud-0.solr","externalAddress":"http://solr-solr-solrcloud-0.k8s.solr.cloud","ready":true,"version":"0.8","specUpToDate":true,"scheduledForDeletion":false},{"name":"solr-solrcloud-1","nodeName":"ip-x-y-160-139.us-west-2.compute.internal","internalAddress":"http://solr-solrcloud-1.solr","externalAddress":"http://solr-solr-solrcloud-1.k8s.solr.cloud","ready":true,"version":"0.8","specUpToDate":true,"scheduledForDeletion":false},{"name":"solr-solrcloud-2","nodeName":"ip-x-y-163-213.us-west-2.compute.internal","internalAddress":"http://solr-solrcloud-2.solr","externalAddress":"http://solr-solr-solrcloud-2.k8s.solr.cloud","ready":true,"version":"0.8","specUpToDate":true,"scheduledForDeletion":false}],"replicas":3,"podSelector":"solr-cloud=solr,technology=solr-cloud","readyReplicas":3,"upToDateNodes":3,"version":"0.8","internalCommonAddress":"http://solr-solrcloud-common.solr","externalCommonAddress":"http://solr-solr-solrcloud.k8s.solr.cloud","zookeeperConnectionInfo":{"internalConnectionString":"solr-solrcloud-zookeeper-0.solr-solrcloud-zookeeper-headless.solr.svc.cluster.local:2181,solr-solrcloud-zookeeper-1.solr-solrcloud-zookeeper-headless.solr.svc.cluster.local:2181,solr-solrcloud-zookeeper-2.solr-solrcloud-zookeeper-headless.solr.svc.cluster.local:2181","externalConnectionString":"N/A","chroot":"/"},"backupRestoreReady":false}}
2023-12-14T21:58:46Z INFO Update required because field changed {"controller": "solrcloud", "controllerGroup": "solr.apache.org", "controllerKind": "SolrCloud", "SolrCloud": {"name":"solr","namespace":"solr"}, "namespace": "solr", "name": "solr", "reconcileID": "71340161-9ee5-47d9-b7f4-7630ad2ab132", "statefulSet": "solr-solrcloud", "kind": "statefulSet", "field": "Spec.Template.Annotations", "from": {"solr.apache.org/solrXmlMd5":"5fe99d590bc63efc3caa743ca939aa5a"}, "to": {"manualrestart":"2023-12-14T00:00:01Z","solr.apache.org/solrXmlMd5":"5fe99d590bc63efc3caa743ca939aa5a"}}
2023-12-14T21:58:46Z INFO Updating StatefulSet {"controller": "solrcloud", "controllerGroup": "solr.apache.org", "controllerKind": "SolrCloud", "SolrCloud": {"name":"solr","namespace":"solr"}, "namespace": "solr", "name": "solr", "reconcileID": "71340161-9ee5-47d9-b7f4-7630ad2ab132", "statefulSet": "solr-solrcloud"}
2023-12-14T21:58:46Z INFO Started locked clusterOp {"controller": "solrcloud", "controllerGroup": "solr.apache.org", "controllerKind": "SolrCloud", "SolrCloud": {"name":"solr","namespace":"solr"}, "namespace": "solr", "name": "solr", "reconcileID": "a09d90c0-06f2-4b59-9c68-463c97261d5b", "clusterOp": "RollingUpdate", "clusterOpMetadata": "{\"requiresReplicaMigration\":false}"}
2023-12-14T21:58:46Z INFO Updating SolrCloud Status {"controller": "solrcloud", "controllerGroup": "solr.apache.org", "controllerKind": "SolrCloud", "SolrCloud": {"name":"solr","namespace":"solr"}, "namespace": "solr", "name": "solr", "reconcileID": "a09d90c0-06f2-4b59-9c68-463c97261d5b", "status": {"solrNodes":[{"name":"solr-solrcloud-0","nodeName":"ip-x-y-162-17.us-west-2.compute.internal","internalAddress":"http://solr-solrcloud-0.solr","externalAddress":"http://solr-solr-solrcloud-0.k8s.solr.cloud","ready":true,"version":"0.8","specUpToDate":false,"scheduledForDeletion":false},{"name":"solr-solrcloud-1","nodeName":"ip-x-y-160-139.us-west-2.compute.internal","internalAddress":"http://solr-solrcloud-1.solr","externalAddress":"http://solr-solr-solrcloud-1.k8s.solr.cloud","ready":true,"version":"0.8","specUpToDate":false,"scheduledForDeletion":false},{"name":"solr-solrcloud-2","nodeName":"ip-x-y-163-213.us-west-2.compute.internal","internalAddress":"http://solr-solrcloud-2.solr","externalAddress":"http://solr-solr-solrcloud-2.k8s.solr.cloud","ready":true,"version":"0.8","specUpToDate":false,"scheduledForDeletion":false}],"replicas":3,"podSelector":"solr-cloud=solr,technology=solr-cloud","readyReplicas":3,"upToDateNodes":0,"version":"0.8","internalCommonAddress":"http://solr-solrcloud-common.solr","externalCommonAddress":"http://solr-solr-solrcloud.k8s.solr.cloud","zookeeperConnectionInfo":{"internalConnectionString":"solr-solrcloud-zookeeper-0.solr-solrcloud-zookeeper-headless.solr.svc.cluster.local:2181,solr-solrcloud-zookeeper-1.solr-solrcloud-zookeeper-headless.solr.svc.cluster.local:2181,solr-solrcloud-zookeeper-2.solr-solrcloud-zookeeper-headless.solr.svc.cluster.local:2181","externalConnectionString":"N/A","chroot":"/"},"backupRestoreReady":false}}
2023-12-14T21:58:46Z INFO ManagedUpdateSelector Pod update selection started. {"controller": "solrcloud", "controllerGroup": "solr.apache.org", "controllerKind": "SolrCloud", "SolrCloud": {"name":"solr","namespace":"solr"}, "namespace": "solr", "name": "solr", "reconcileID": "22e68695-cbc6-4f0d-8a40-7cf26a1841b6", "outOfDatePods": 3, "maxPodsUnavailable": 1, "unavailableUpdatedPods": 0, "outOfDatePodsNotStarted": 0, "alreadyScheduledForDeletion": 0, "maxPodsToUpdate": 1}
2023-12-14T21:58:46Z INFO ManagedUpdateSelector Pod selected to be deleted for update. {"controller": "solrcloud", "controllerGroup": "solr.apache.org", "controllerKind": "SolrCloud", "SolrCloud": {"name":"solr","namespace":"solr"}, "namespace": "solr", "name": "solr", "reconcileID": "22e68695-cbc6-4f0d-8a40-7cf26a1841b6", "pod": "solr-solrcloud-0", "reason": "Pod's replicas are safe to take down, adhering to the minimum active replicas per shard."}
2023-12-14T21:58:46Z INFO ManagedUpdateSelector Pod update selection complete. Maximum number of pods able to be updated reached. {"controller": "solrcloud", "controllerGroup": "solr.apache.org", "controllerKind": "SolrCloud", "SolrCloud": {"name":"solr","namespace":"solr"}, "namespace": "solr", "name": "solr", "reconcileID": "22e68695-cbc6-4f0d-8a40-7cf26a1841b6", "maxPodsToUpdate": 1}
2023-12-14T21:58:46Z INFO ManagedUpdateSelector Deleting solr pod for update {"controller": "solrcloud", "controllerGroup": "solr.apache.org", "controllerKind": "SolrCloud", "SolrCloud": {"name":"solr","namespace":"solr"}, "namespace": "solr", "name": "solr", "reconcileID": "22e68695-cbc6-4f0d-8a40-7cf26a1841b6", "pod": "solr-solrcloud-0"}
2023-12-14T21:58:46Z INFO Updating SolrCloud Status {"controller": "solrcloud", "controllerGroup": "solr.apache.org", "controllerKind": "SolrCloud", "SolrCloud": {"name":"solr","namespace":"solr"}, "namespace": "solr", "name": "solr", "reconcileID": "22e68695-cbc6-4f0d-8a40-7cf26a1841b6", "status": {"solrNodes":[{"name":"solr-solrcloud-0","nodeName":"ip-x-y-162-17.us-west-2.compute.internal","internalAddress":"http://solr-solrcloud-0.solr","externalAddress":"http://solr-solr-solrcloud-0.k8s.solr.cloud","ready":true,"version":"0.8","specUpToDate":false,"scheduledForDeletion":false},{"name":"solr-solrcloud-1","nodeName":"ip-x-y-160-139.us-west-2.compute.internal","internalAddress":"http://solr-solrcloud-1.solr","externalAddress":"http://solr-solr-solrcloud-1.k8s.solr.cloud","ready":true,"version":"0.8","specUpToDate":false,"scheduledForDeletion":false},{"name":"solr-solrcloud-2","nodeName":"ip-x-y-163-213.us-west-2.compute.internal","internalAddress":"http://solr-solrcloud-2.solr","externalAddress":"http://solr-solr-solrcloud-2.k8s.solr.cloud","ready":true,"version":"0.8","specUpToDate":false,"scheduledForDeletion":false}],"replicas":3,"podSelector":"solr-cloud=solr,technology=solr-cloud","readyReplicas":3,"upToDateNodes":0,"version":"0.8","internalCommonAddress":"http://solr-solrcloud-common.solr","externalCommonAddress":"http://solr-solr-solrcloud.k8s.solr.cloud","zookeeperConnectionInfo":{"internalConnectionString":"solr-solrcloud-zookeeper-0.solr-solrcloud-zookeeper-headless.solr.svc.cluster.local:2181,solr-solrcloud-zookeeper-1.solr-solrcloud-zookeeper-headless.solr.svc.cluster.local:2181,solr-solrcloud-zookeeper-2.solr-solrcloud-zookeeper-headless.solr.svc.cluster.local:2181","externalConnectionString":"N/A","chroot":"/"},"backupRestoreReady":false}}
2023-12-14T21:58:46Z INFO ManagedUpdateSelector Pod update selection not started. The number of unavailable pods unavailable (or scheduled for deletion) equals or exceeds the calculated maxPodsUnavailable. {"controller": "solrcloud", "controllerGroup": "solr.apache.org", "controllerKind": "SolrCloud", "SolrCloud": {"name":"solr","namespace":"solr"}, "namespace": "solr", "name": "solr", "reconcileID": "d091496b-8199-419f-b541-3709fc4cbd03", "outOfDatePods": 2, "maxPodsUnavailable": 1, "unavailableUpdatedPods": 0, "outOfDatePodsNotStarted": 0, "alreadyScheduledForDeletion": 1}
2023-12-14T21:58:46Z INFO ManagedUpdateSelector Deleting solr pod for update {"controller": "solrcloud", "controllerGroup": "solr.apache.org", "controllerKind": "SolrCloud", "SolrCloud": {"name":"solr","namespace":"solr"}, "namespace": "solr", "name": "solr", "reconcileID": "d091496b-8199-419f-b541-3709fc4cbd03", "pod": "solr-solrcloud-0"}
2023-12-14T21:58:46Z INFO Updating SolrCloud Status {"controller": "solrcloud", "controllerGroup": "solr.apache.org", "controllerKind": "SolrCloud", "SolrCloud": {"name":"solr","namespace":"solr"}, "namespace": "solr", "name": "solr", "reconcileID": "d091496b-8199-419f-b541-3709fc4cbd03", "status": {"solrNodes":[{"name":"solr-solrcloud-0","nodeName":"ip-x-y-162-17.us-west-2.compute.internal","internalAddress":"http://solr-solrcloud-0.solr","externalAddress":"http://solr-solr-solrcloud-0.k8s.solr.cloud","ready":false,"version":"0.8","specUpToDate":false,"scheduledForDeletion":true},{"name":"solr-solrcloud-1","nodeName":"ip-x-y-160-139.us-west-2.compute.internal","internalAddress":"http://solr-solrcloud-1.solr","externalAddress":"http://solr-solr-solrcloud-1.k8s.solr.cloud","ready":true,"version":"0.8","specUpToDate":false,"scheduledForDeletion":false},{"name":"solr-solrcloud-2","nodeName":"ip-x-y-163-213.us-west-2.compute.internal","internalAddress":"http://solr-solrcloud-2.solr","externalAddress":"http://solr-solr-solrcloud-2.k8s.solr.cloud","ready":true,"version":"0.8","specUpToDate":false,"scheduledForDeletion":false}],"replicas":3,"podSelector":"solr-cloud=solr,technology=solr-cloud","readyReplicas":2,"upToDateNodes":0,"version":"0.8","internalCommonAddress":"http://solr-solrcloud-common.solr","externalCommonAddress":"http://solr-solr-solrcloud.k8s.solr.cloud","zookeeperConnectionInfo":{"internalConnectionString":"solr-solrcloud-zookeeper-0.solr-solrcloud-zookeeper-headless.solr.svc.cluster.local:2181,solr-solrcloud-zookeeper-1.solr-solrcloud-zookeeper-headless.solr.svc.cluster.local:2181,solr-solrcloud-zookeeper-2.solr-solrcloud-zookeeper-headless.solr.svc.cluster.local:2181","externalConnectionString":"N/A","chroot":"/"},"backupRestoreReady":false}}
2023-12-14T21:58:47Z INFO ManagedUpdateSelector Pod update selection not started. The number of unavailable pods unavailable (or scheduled for deletion) equals or exceeds the calculated maxPodsUnavailable. {"controller": "solrcloud", "controllerGroup": "solr.apache.org", "controllerKind": "SolrCloud", "SolrCloud": {"name":"solr","namespace":"solr"}, "namespace": "solr", "name": "solr", "reconcileID": "63e8cd10-0938-451f-bdfe-d5c7e0bbcb6a", "outOfDatePods": 2, "maxPodsUnavailable": 1, "unavailableUpdatedPods": 0, "outOfDatePodsNotStarted": 0, "alreadyScheduledForDeletion": 1}
2023-12-14T21:58:47Z INFO ManagedUpdateSelector Deleting solr pod for update {"controller": "solrcloud", "controllerGroup": "solr.apache.org", "controllerKind": "SolrCloud", "SolrCloud": {"name":"solr","namespace":"solr"}, "namespace": "solr", "name": "solr", "reconcileID": "63e8cd10-0938-451f-bdfe-d5c7e0bbcb6a", "pod": "solr-solrcloud-0"}
2023-12-14T21:58:47Z INFO ManagedUpdateSelector Pod update selection not started. The number of unavailable pods unavailable (or scheduled for deletion) equals or exceeds the calculated maxPodsUnavailable. {"controller": "solrcloud", "controllerGroup": "solr.apache.org", "controllerKind": "SolrCloud", "SolrCloud": {"name":"solr","namespace":"solr"}, "namespace": "solr", "name": "solr", "reconcileID": "d7ad8183-8e6b-4377-8901-fbd4a8129672", "outOfDatePods": 2, "maxPodsUnavailable": 1, "unavailableUpdatedPods": 0, "outOfDatePodsNotStarted": 0, "alreadyScheduledForDeletion": 1}
2023-12-14T21:58:47Z INFO ManagedUpdateSelector Deleting solr pod for update {"controller": "solrcloud", "controllerGroup": "solr.apache.org", "controllerKind": "SolrCloud", "SolrCloud": {"name":"solr","namespace":"solr"}, "namespace": "solr", "name": "solr", "reconcileID": "d7ad8183-8e6b-4377-8901-fbd4a8129672", "pod": "solr-solrcloud-0"}
2023-12-14T21:58:49Z INFO ManagedUpdateSelector Pod update selection not started. The number of unavailable pods unavailable (or scheduled for deletion) equals or exceeds the calculated maxPodsUnavailable. {"controller": "solrcloud", "controllerGroup": "solr.apache.org", "controllerKind": "SolrCloud", "SolrCloud": {"name":"solr","namespace":"solr"}, "namespace": "solr", "name": "solr", "reconcileID": "291ef125-ec68-40aa-945f-c3fd10cfbf5b", "outOfDatePods": 2, "maxPodsUnavailable": 1, "unavailableUpdatedPods": 1, "outOfDatePodsNotStarted": 0, "alreadyScheduledForDeletion": 0}
2023-12-14T21:58:49Z INFO Updating SolrCloud Status {"controller": "solrcloud", "controllerGroup": "solr.apache.org", "controllerKind": "SolrCloud", "SolrCloud": {"name":"solr","namespace":"solr"}, "namespace": "solr", "name": "solr", "reconcileID": "291ef125-ec68-40aa-945f-c3fd10cfbf5b", "status": {"solrNodes":[{"name":"solr-solrcloud-0","nodeName":"ip-x-y-162-17.us-west-2.compute.internal","internalAddress":"http://solr-solrcloud-0.solr","externalAddress":"http://solr-solr-solrcloud-0.k8s.solr.cloud","ready":false,"version":"0.8","specUpToDate":true,"scheduledForDeletion":false},{"name":"solr-solrcloud-1","nodeName":"ip-x-y-160-139.us-west-2.compute.internal","internalAddress":"http://solr-solrcloud-1.solr","externalAddress":"http://solr-solr-solrcloud-1.k8s.solr.cloud","ready":true,"version":"0.8","specUpToDate":false,"scheduledForDeletion":false},{"name":"solr-solrcloud-2","nodeName":"ip-x-y-163-213.us-west-2.compute.internal","internalAddress":"http://solr-solrcloud-2.solr","externalAddress":"http://solr-solr-solrcloud-2.k8s.solr.cloud","ready":true,"version":"0.8","specUpToDate":false,"scheduledForDeletion":false}],"replicas":3,"podSelector":"solr-cloud=solr,technology=solr-cloud","readyReplicas":2,"upToDateNodes":1,"version":"0.8","internalCommonAddress":"http://solr-solrcloud-common.solr","externalCommonAddress":"http://solr-solr-solrcloud.k8s.solr.cloud","zookeeperConnectionInfo":{"internalConnectionString":"solr-solrcloud-zookeeper-0.solr-solrcloud-zookeeper-headless.solr.svc.cluster.local:2181,solr-solrcloud-zookeeper-1.solr-solrcloud-zookeeper-headless.solr.svc.cluster.local:2181,solr-solrcloud-zookeeper-2.solr-solrcloud-zookeeper-headless.solr.svc.cluster.local:2181","externalConnectionString":"N/A","chroot":"/"},"backupRestoreReady":false}}
2023-12-14T21:58:49Z INFO ManagedUpdateSelector Pod update selection not started. The number of unavailable pods unavailable (or scheduled for deletion) equals or exceeds the calculated maxPodsUnavailable. {"controller": "solrcloud", "controllerGroup": "solr.apache.org", "controllerKind": "SolrCloud", "SolrCloud": {"name":"solr","namespace":"solr"}, "namespace": "solr", "name": "solr", "reconcileID": "1bfacd90-5943-43cb-8635-b773b5f4a2f0", "outOfDatePods": 2, "maxPodsUnavailable": 1, "unavailableUpdatedPods": 1, "outOfDatePodsNotStarted": 0, "alreadyScheduledForDeletion": 0}
2023-12-14T21:58:50Z INFO ManagedUpdateSelector Pod update selection not started. The number of unavailable pods unavailable (or scheduled for deletion) equals or exceeds the calculated maxPodsUnavailable. {"controller": "solrcloud", "controllerGroup": "solr.apache.org", "controllerKind": "SolrCloud", "SolrCloud": {"name":"solr","namespace":"solr"}, "namespace": "solr", "name": "solr", "reconcileID": "d03b7902-1f06-4e1f-9a26-6334d327a8de", "outOfDatePods": 2, "maxPodsUnavailable": 1, "unavailableUpdatedPods": 1, "outOfDatePodsNotStarted": 0, "alreadyScheduledForDeletion": 0}
2023-12-14T21:59:00Z INFO ManagedUpdateSelector Pod update selection not started. The number of unavailable pods unavailable (or scheduled for deletion) equals or exceeds the calculated maxPodsUnavailable. {"controller": "solrcloud", "controllerGroup": "solr.apache.org", "controllerKind": "SolrCloud", "SolrCloud": {"name":"solr","namespace":"solr"}, "namespace": "solr", "name": "solr", "reconcileID": "ce0ba8f5-6eb5-4717-8fc4-e3d2c1bc750a", "outOfDatePods": 2, "maxPodsUnavailable": 1, "unavailableUpdatedPods": 1, "outOfDatePodsNotStarted": 0, "alreadyScheduledForDeletion": 0}
2023-12-14T21:59:05Z INFO ManagedUpdateSelector Pod update selection started. {"controller": "solrcloud", "controllerGroup": "solr.apache.org", "controllerKind": "SolrCloud", "SolrCloud": {"name":"solr","namespace":"solr"}, "namespace": "solr", "name": "solr", "reconcileID": "bb32cce3-ce4b-4a71-bf07-f828890cb313", "outOfDatePods": 2, "maxPodsUnavailable": 1, "unavailableUpdatedPods": 0, "outOfDatePodsNotStarted": 0, "alreadyScheduledForDeletion": 0, "maxPodsToUpdate": 1}
2023-12-14T21:59:05Z INFO ManagedUpdateSelector Pod not able to be killed for update. {"controller": "solrcloud", "controllerGroup": "solr.apache.org", "controllerKind": "SolrCloud", "SolrCloud": {"name":"solr","namespace":"solr"}, "namespace": "solr", "name": "solr", "reconcileID": "bb32cce3-ce4b-4a71-bf07-f828890cb313", "pod": "solr-solrcloud-2", "reason": "Shard bookmarks|shard1 already has 1 replicas not active, taking down 1 more would put it over the maximum allowed down: 1"}
I changed the configs and eliminated the dependency on ZK being on the same node as the SolrCloud pod.
Regardless of the dataStorage type (ephemeral or persistent), when the Pod comes up, the core config it is looking for is at $SOLR_HOME/data/<<collection_shardN_replica_nN>>
This folder is missing on the rehomed pod.
Looking at the other Pods, the contents of the corresponding folder are marked "#Written by CorePropertiesLocator".
So, two questions:
- Should the Solr Operator, or SolrCloud itself talking to ZK, place the necessary config files so that the replica gets filled in with data?
- Previous versions of Solr (say Solr 7) had a parameter on the Create Collection API to set AUTOADDREPLICA. I don't see that in 9.4. Is any collection property relevant to the Pod replacement losing data?
bump...
Newer versions of Solr do not have an AutoAddReplica feature.
What kind of persistent volumes are you using? The data should not be missing when the pod is restarted. That's a failure of Kubernetes or your PVC, and the Solr Operator isn't built to handle that. When you are running with persistent data, it will expect the data to be there when the pod is restarted.
If you are running with ephemeral data, it will remove the data from the node before killing the pod. It can get into a bad state if the pod is killed on its own, and the data isn't moved beforehand.
I have tested with both persistent and ephemeral storage, and both result in loss of data for that Solr node.
For persistent storage, I'm using the local volume provisioner. As long as the Pod comes back on the same EKS node, the PVC binds to the same PV and the data is retained. But when the Pod gets scheduled to another EKS node (which is my scenario), the data is lost: the core config directories on the replaced Pod are empty.
The only way that local volumes work as PVs is if the PVs that are created have node limitations (i.e. the Pod connected to the PV cannot be rescheduled onto another node). Are you sure that the local volume provisioner is set up correctly?
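One way to check the node limitation described above is to look at the `spec.nodeAffinity` of each local PV. The following is a rough sketch that reads a PV object (for example the output of `kubectl get pv <name> -o json`) and lists the hostnames it is pinned to. The PV name and path in the sample are hypothetical, and the helper `pinned_hostnames` is made up for illustration.

```python
import json


def pinned_hostnames(pv: dict) -> list:
    """Return the kubernetes.io/hostname values a PV's required node affinity allows."""
    required = pv.get("spec", {}).get("nodeAffinity", {}).get("required", {})
    hosts = []
    for term in required.get("nodeSelectorTerms", []):
        for expr in term.get("matchExpressions", []):
            if expr.get("key") == "kubernetes.io/hostname" and expr.get("operator") == "In":
                hosts.extend(expr.get("values", []))
    return hosts


# Hypothetical PV object; only the node name is taken from the status in this issue.
sample_pv = json.loads("""
{
  "metadata": {"name": "local-pv-example"},
  "spec": {
    "local": {"path": "/mnt/disks/example"},
    "nodeAffinity": {
      "required": {
        "nodeSelectorTerms": [
          {"matchExpressions": [
            {"key": "kubernetes.io/hostname", "operator": "In",
             "values": ["ip-x-y-162-17.us-west-2.compute.internal"]}
          ]}
        ]
      }
    }
  }
}
""")

hosts = pinned_hostnames(sample_pv)
print(sample_pv["metadata"]["name"], "is pinned to", hosts)
```

An empty list for a local PV would suggest the volume is not restricted to a node, which is the misconfiguration being asked about here.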
Yes, the PVs are set up correctly. Solr Pods come up correctly after either an eviction or a restart.
The specific scenario I'm certifying is an EKS node taint and drain (replacing the node with new hardware).
So from your description, it appears the Operator will NOT move the data, as the local volume is tied to an EKS node.
Is there a recommended way to manually trigger data replication from the other 2 nodes, like an API call to repopulate the following?
$SOLR_HOME/data/<<collection_shardN_replica_nN>>
Ahhh yes during node draining. That is a problem.
Yes, that is correct. What I would do is issue a Replace node command, moving all of the replicas off of the data-less pod. Then you can do a balance after that to move replicas back onto that pod.
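The two steps suggested above could be sketched roughly as follows. This only builds the HTTP requests and does not send them. It assumes the Collections API `REPLACENODE` action and, as far as I understand the Solr 9.3+ v2 API, the `POST /api/cluster/replicas/balance` endpoint. The base URL is the `internalCommonAddress` from the status above; the node name is a placeholder, so take the real names from `live_nodes` in `CLUSTERSTATUS`. The helper names are made up for illustration.

```python
import json
from typing import Optional, Sequence
from urllib.parse import urlencode
from urllib.request import Request


def replace_node_request(base_url: str, source_node: str,
                         target_node: Optional[str] = None,
                         async_id: Optional[str] = None) -> Request:
    """Build a Collections API REPLACENODE call that moves replicas off source_node."""
    params = {"action": "REPLACENODE", "sourceNode": source_node}
    if target_node:
        params["targetNode"] = target_node
    if async_id:
        params["async"] = async_id
    return Request(f"{base_url}/solr/admin/collections?{urlencode(params)}", method="GET")


def balance_replicas_request(base_url: str,
                             nodes: Optional[Sequence[str]] = None) -> Request:
    """Build a v2 Balance Replicas call (assumed to be available in Solr 9.3+)."""
    body = {}
    if nodes:
        body["nodes"] = list(nodes)
    return Request(
        f"{base_url}/api/cluster/replicas/balance",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


BASE_URL = "http://solr-solrcloud-common.solr"   # internalCommonAddress from the status
DATALESS_NODE = "example-solr-node-0:8983_solr"  # placeholder, not a real node name

step1 = replace_node_request(BASE_URL, DATALESS_NODE)
step2 = balance_replicas_request(BASE_URL)
print(step1.get_method(), step1.full_url)
print(step2.get_method(), step2.full_url)
# To actually run a step: urllib.request.urlopen(step1), then wait for it to finish
# before sending step2.
```

Leaving `targetNode` unset lets Solr pick the destination nodes. The balance step should only be sent once the replace has completed and the replicas are active again.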
It would be nice to have a command to fix all of the data on broken replicas. Maybe I'll make a JIRA for that.
One thing the operator can do is notice that a PV has changed, and if so automate the replica moving to restore data. Can you confirm that the PVs that are tied to the Solr PVCs change after draining the node? If so we can watch those PVs and try to fix the data if they are changed. (i.e. the data might be gone)
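To answer the question about whether the PVs behind the Solr PVCs change, one option is to capture `kubectl get pvc -n solr -o json` before and after the drain and compare `spec.volumeName` per PVC. A minimal sketch of that comparison follows; the PVC and PV names in the sample data are hypothetical, and the helper names are made up for illustration.

```python
import json


def pvc_bindings(pvc_list_json: str) -> dict:
    """Map each PVC name to the PV it is bound to (spec.volumeName)."""
    items = json.loads(pvc_list_json).get("items", [])
    return {
        item["metadata"]["name"]: item.get("spec", {}).get("volumeName")
        for item in items
    }


def changed_bindings(before: dict, after: dict) -> dict:
    """Return {pvc: (old_pv, new_pv)} for PVCs whose bound PV differs."""
    return {
        name: (before[name], pv)
        for name, pv in after.items()
        if name in before and before[name] != pv
    }


# Hypothetical snapshots taken before and after draining the node.
before_json = json.dumps({"items": [
    {"metadata": {"name": "data-solr-solrcloud-0"}, "spec": {"volumeName": "local-pv-aaa"}},
    {"metadata": {"name": "data-solr-solrcloud-1"}, "spec": {"volumeName": "local-pv-bbb"}},
]})
after_json = json.dumps({"items": [
    {"metadata": {"name": "data-solr-solrcloud-0"}, "spec": {"volumeName": "local-pv-ccc"}},
    {"metadata": {"name": "data-solr-solrcloud-1"}, "spec": {"volumeName": "local-pv-bbb"}},
]})

changes = changed_bindings(pvc_bindings(before_json), pvc_bindings(after_json))
for pvc, (old_pv, new_pv) in changes.items():
    print(f"{pvc}: {old_pv} -> {new_pv}")
```

Any PVC reported here was rebound to a different PV, so its replicas would be candidates for the data repair discussed above.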