Solr pods are not stable on AKS
zelima commented
Solr pods keep restarting from time to time, which leads to deployment problems; e.g. a CKAN instance will fail to connect to Solr if one of the pods is in CrashLoopBackOff. Over the last 26 hours some of them have restarted more than 200 times:
kubectl get pods -n ckan-cloud
NAME                                                              READY   STATUS             RESTARTS   AGE
ckan-cloud-provider-db-proxy-azuresql-proxy-5c67484c4-82kp8       1/1     Running            0          26h
ckan-cloud-provider-solr-solrcloud-sc-3-d7bc4cc9f-nrpb2           1/1     Running            111        26h
ckan-cloud-provider-solr-solrcloud-sc-4-56f96f98c-swvwk           0/1     Running            133        26h
ckan-cloud-provider-solr-solrcloud-sc-5-584bb7ddd4-hgv6q          1/1     Running            136        26h
ckan-cloud-provider-solr-solrcloud-zk-0-8964b55b4-nf6hg           0/1     OOMKilled          177        26h
ckan-cloud-provider-solr-solrcloud-zk-1-55f6744d9f-d9blr          0/1     Running            232        26h
ckan-cloud-provider-solr-solrcloud-zk-2-7d4fd5fb-rv42c            0/1     CrashLoopBackOff   213        26h
ckan-cloud-provider-solr-solrcloud-zoonavigator-7bdd5575bf89ph4   2/2     Running            0          26h
router-traefik-instances-default-55cf88f7cd-29srv                 1/1     Running            0          20h
Tasks:
- Identify the reason
zelima commented
This is what I see from kubectl describe on one of the failing pods:
Events:
  Type     Reason     Age                    From                             Message
  ----     ------     ----                   ----                             -------
  Warning  Unhealthy  42m (x28 over 6d3h)    kubelet, aks-default-18083181-1  Readiness probe failed: OCI runtime exec failed: exec failed: container_linux.go:349: starting container process caused "argument list too long": unknown
  Warning  Unhealthy  25m (x2 over 3d)       kubelet, aks-default-18083181-1  Liveness probe failed: /usr/bin/zkOk.sh: line 21: /bin/nc: Cannot allocate memory
  Warning  Unhealthy  25m (x199 over 6d3h)   kubelet, aks-default-18083181-1  Readiness probe failed:
  Warning  Unhealthy  54s (x460 over 6d4h)   kubelet, aks-default-18083181-1  Liveness probe failed:
  Warning  BackOff    52s (x7574 over 6d4h)  kubelet, aks-default-18083181-1  Back-off restarting failed container
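The "Cannot allocate memory" line hints at the mechanism: zkOk.sh scripts of this kind usually run ZooKeeper's "ruok" four-letter-word check through nc, along these lines (a sketch of the common pattern; the actual script in this image may differ):

#!/bin/sh
# Common ZooKeeper liveness-check pattern (sketch; the real zkOk.sh may differ):
# send the "ruok" four-letter word to the client port and expect "imok" back.
OK=$(echo ruok | nc 127.0.0.1 2181)
if [ "$OK" = "imok" ]; then
    exit 0
else
    exit 1
fi

So once the container is close enough to its memory limit that nc can no longer even be forked, the probe fails and the kubelet restarts the pod, which fits the OOMKilled status above.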
It looks like ZooKeeper has very little memory allocated.
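A quick way to confirm what a pod is actually running with (pod name taken from the listing above; the same works for the sc pods):

kubectl get pod ckan-cloud-provider-solr-solrcloud-zk-0-8964b55b4-nf6hg \
  -n ckan-cloud -o jsonpath='{.spec.containers[*].resources}'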
@akariv I see the resources for the Solr pods were updated in commit 22423fe#diff-167428a1f2e6a73b2e24fcc3c27fbd97L275.
What do you think about making them configurable (keeping the current values as defaults), e.g. in interactive mode? Right now the resources for both ZooKeeper and SolrCloud are hardcoded to 200Mi and 8GB.
Something like:
diff --git a/interactive.yaml b/interactive.yaml
index abec8fc..381200e 100644
--- a/interactive.yaml
+++ b/interactive.yaml
@@ -12,9 +12,11 @@ default:
self-hosted: y
num-shards: "1"
replication-factor: "1"
+ sc-resources: '{"limits":{"memory":"1Gi"}, "requests": {"memory":"1Gi"}}'
+ zk-resources: '{"limits":{"memory":"1Gi", "cpu":"1"}, "requests": {"memory":"1Gi", "cpu":"0.5"}}'
ckan-storage-config:
default-storage-bucket: ckan
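Assuming the operator passes these JSON values through to the container specs unchanged, the zk resources block would come out roughly as below (a sketch; the field names follow the standard Kubernetes ResourceRequirements schema, which uses "requests"):

# Sketch: zk container resources if zk-resources above is applied verbatim
resources:
  limits:
    memory: 1Gi
    cpu: "1"
  requests:
    memory: 1Gi
    cpu: "0.5"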
akariv commented
I hope this is indeed the problem causing it; it's certainly worth a try.
I see no harm in making these configurable.