prometheus-stack-prometheus-0/1 pods are stuck in status "init:0/1"
alienninja opened this issue · 2 comments
What happened?
The prometheus-stack-prometheus-0/1 pods are stuck in status "init:0/1"
Did you expect to see something different?
I expected the pods to complete initialization
How to reproduce it (as minimally and precisely as possible):
Fresh helm installation of tobs with OpenEBS as storage backend
Environment
Helm deployment on clean K8s cluster v1.24.6 with OpenEBS as the backend storage
- tobs version:
Helm chart version 17.22.0, with modified values to include a patched image pg14.6-ts2.8.1-patroni-static-primary-p0 related to closed issue #646 (comment)
- Kubernetes version information:
1.24.6
- Kubernetes cluster kind:
kubespray deployment on bare-metal servers
- tobs Logs:
Name: prometheus-tobs-kube-prometheus-stack-prometheus-0
Namespace: tobs
Priority: 0
Service Account: tobs-kube-prometheus-stack-prometheus
Node: knode19/20.20.20.219
Start Time: Mon, 21 Nov 2022 19:10:02 +0000
Labels: app.kubernetes.io/instance=tobs-kube-prometheus-stack-prometheus
app.kubernetes.io/managed-by=prometheus-operator
app.kubernetes.io/name=prometheus
app.kubernetes.io/version=2.40.1
controller-revision-hash=prometheus-tobs-kube-prometheus-stack-prometheus-54c6b6896f
operator.prometheus.io/name=tobs-kube-prometheus-stack-prometheus
operator.prometheus.io/shard=0
prometheus=tobs-kube-prometheus-stack-prometheus
statefulset.kubernetes.io/pod-name=prometheus-tobs-kube-prometheus-stack-prometheus-0
Annotations: kubectl.kubernetes.io/default-container: prometheus
Status: Pending
IP:
IPs: <none>
Controlled By: StatefulSet/prometheus-tobs-kube-prometheus-stack-prometheus
Init Containers:
init-config-reloader:
Container ID:
Image: quay.io/prometheus-operator/prometheus-config-reloader:v0.60.1
Image ID:
Port: 8080/TCP
Host Port: 0/TCP
Command:
/bin/prometheus-config-reloader
Args:
--watch-interval=0
--listen-address=:8080
--config-file=/etc/prometheus/config/prometheus.yaml.gz
--config-envsubst-file=/etc/prometheus/config_out/prometheus.env.yaml
--watched-dir=/etc/prometheus/rules/prometheus-tobs-kube-prometheus-stack-prometheus-rulefiles-0
State: Waiting
Reason: PodInitializing
Ready: False
Restart Count: 0
Limits:
cpu: 200m
memory: 50Mi
Requests:
cpu: 100m
memory: 50Mi
Environment:
POD_NAME: prometheus-tobs-kube-prometheus-stack-prometheus-0 (v1:metadata.name)
SHARD: 0
Mounts:
/etc/prometheus/config from config (rw)
/etc/prometheus/config_out from config-out (rw)
/etc/prometheus/rules/prometheus-tobs-kube-prometheus-stack-prometheus-rulefiles-0 from prometheus-tobs-kube-prometheus-stack-prometheus-rulefiles-0 (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-mzmkp (ro)
Containers:
prometheus:
Container ID:
Image: quay.io/prometheus/prometheus:v2.40.1
Image ID:
Port: 9090/TCP
Host Port: 0/TCP
Args:
--web.console.templates=/etc/prometheus/consoles
--web.console.libraries=/etc/prometheus/console_libraries
--storage.tsdb.retention.time=1d
--config.file=/etc/prometheus/config_out/prometheus.env.yaml
--storage.tsdb.path=/prometheus
--web.enable-lifecycle
--web.external-url=http://tobs-kube-prometheus-stack-prometheus.tobs:9090
--web.route-prefix=/
--storage.tsdb.wal-compression
--web.config.file=/etc/prometheus/web_config/web-config.yaml
State: Waiting
Reason: PodInitializing
Ready: False
Restart Count: 0
Requests:
cpu: 40m
memory: 400Mi
Liveness: http-get http://:http-web/-/healthy delay=0s timeout=3s period=5s #success=1 #failure=6
Readiness: http-get http://:http-web/-/ready delay=0s timeout=3s period=5s #success=1 #failure=3
Startup: http-get http://:http-web/-/ready delay=0s timeout=3s period=15s #success=1 #failure=60
Environment: <none>
Mounts:
/etc/prometheus/certs from tls-assets (ro)
/etc/prometheus/config_out from config-out (ro)
/etc/prometheus/rules/prometheus-tobs-kube-prometheus-stack-prometheus-rulefiles-0 from prometheus-tobs-kube-prometheus-stack-prometheus-rulefiles-0 (rw)
/etc/prometheus/web_config/web-config.yaml from web-config (ro,path="web-config.yaml")
/prometheus from prometheus-tobs-kube-prometheus-stack-prometheus-db (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-mzmkp (ro)
config-reloader:
Container ID:
Image: quay.io/prometheus-operator/prometheus-config-reloader:v0.60.1
Image ID:
Port: 8080/TCP
Host Port: 0/TCP
Command:
/bin/prometheus-config-reloader
Args:
--listen-address=:8080
--reload-url=http://127.0.0.1:9090/-/reload
--config-file=/etc/prometheus/config/prometheus.yaml.gz
--config-envsubst-file=/etc/prometheus/config_out/prometheus.env.yaml
--watched-dir=/etc/prometheus/rules/prometheus-tobs-kube-prometheus-stack-prometheus-rulefiles-0
State: Waiting
Reason: PodInitializing
Ready: False
Restart Count: 0
Limits:
cpu: 200m
memory: 50Mi
Requests:
cpu: 100m
memory: 50Mi
Environment:
POD_NAME: prometheus-tobs-kube-prometheus-stack-prometheus-0 (v1:metadata.name)
SHARD: 0
Mounts:
/etc/prometheus/config from config (rw)
/etc/prometheus/config_out from config-out (rw)
/etc/prometheus/rules/prometheus-tobs-kube-prometheus-stack-prometheus-rulefiles-0 from prometheus-tobs-kube-prometheus-stack-prometheus-rulefiles-0 (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-mzmkp (ro)
Conditions:
Type Status
Initialized False
Ready False
ContainersReady False
PodScheduled True
Volumes:
prometheus-tobs-kube-prometheus-stack-prometheus-db:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: prometheus-tobs-kube-prometheus-stack-prometheus-db-prometheus-tobs-kube-prometheus-stack-prometheus-0
ReadOnly: false
config:
Type: Secret (a volume populated by a Secret)
SecretName: prometheus-tobs-kube-prometheus-stack-prometheus
Optional: false
tls-assets:
Type: Projected (a volume that contains injected data from multiple sources)
SecretName: prometheus-tobs-kube-prometheus-stack-prometheus-tls-assets-0
SecretOptionalName: <nil>
config-out:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
prometheus-tobs-kube-prometheus-stack-prometheus-rulefiles-0:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: prometheus-tobs-kube-prometheus-stack-prometheus-rulefiles-0
Optional: false
web-config:
Type: Secret (a volume populated by a Secret)
SecretName: prometheus-tobs-kube-prometheus-stack-prometheus-web-config
Optional: false
kube-api-access-mzmkp:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 5m46s default-scheduler Successfully assigned tobs/prometheus-tobs-kube-prometheus-stack-prometheus-0 to knode19
Warning FailedMount 3m43s kubelet Unable to attach or mount volumes: unmounted volumes=[prometheus-tobs-kube-prometheus-stack-prometheus-db], unattached volumes=[config-out prometheus-tobs-kube-prometheus-stack-prometheus-rulefiles-0 kube-api-access-mzmkp tls-assets prometheus-tobs-kube-prometheus-stack-prometheus-db web-config config]: timed out waiting for the condition
Warning FailedMount 2m7s (x9 over 5m35s) kubelet MountVolume.MountDevice failed for volume "pvc-1b377915-0f50-4170-9d8e-6224dcd98ece" : rpc error: code = Internal desc = Waiting for pvc-1b377915-0f50-4170-9d8e-6224dcd98ece's CVC to be bound
Warning FailedMount 89s kubelet Unable to attach or mount volumes: unmounted volumes=[prometheus-tobs-kube-prometheus-stack-prometheus-db], unattached volumes=[config config-out prometheus-tobs-kube-prometheus-stack-prometheus-rulefiles-0 kube-api-access-mzmkp tls-assets prometheus-tobs-kube-prometheus-stack-prometheus-db web-config]: timed out waiting for the condition
Anything else we need to know?:
The persistent volume backing the PVC referenced above has been created and is bound:
Name: pvc-1b377915-0f50-4170-9d8e-6224dcd98ece
Labels: <none>
Annotations: pv.kubernetes.io/provisioned-by: cstor.csi.openebs.io
Finalizers: [kubernetes.io/pv-protection]
StorageClass: cstor-csi-ssd-disk
Status: Bound
Claim: tobs/prometheus-tobs-kube-prometheus-stack-prometheus-db-prometheus-tobs-kube-prometheus-stack-prometheus-0
Reclaim Policy: Delete
Access Modes: RWO
VolumeMode: Filesystem
Capacity: 8Gi
Node Affinity: <none>
Message:
Source:
Type: CSI (a Container Storage Interface (CSI) volume source)
Driver: cstor.csi.openebs.io
FSType: ext4
VolumeHandle: pvc-1b377915-0f50-4170-9d8e-6224dcd98ece
ReadOnly: false
VolumeAttributes: openebs.io/cas-type=cstor
storage.kubernetes.io/csiProvisionerIdentity=1668714202752-8081-cstor.csi.openebs.io
Events: <none>
Could the issue be that the pod is looking for a volume named
prometheus-tobs-kube-prometheus-stack-prometheus-db
while the actual claim is named
prometheus-tobs-kube-prometheus-stack-prometheus-db-prometheus-tobs-kube-prometheus-stack-prometheus-0?
I found the issue: the claim name prometheus-tobs-kube-prometheus-stack-prometheus-db-prometheus-tobs-kube-prometheus-stack-prometheus-1 is too long to be used as a label value (the limit is 63 characters). This is reported in the openebs-cstor-cvc-operator logs:
I1122 02:53:44.533412 1 event.go:282] Event(v1.ObjectReference{Kind:"CStorVolumeConfig", Namespace:"openebs", Name:"pvc-27948946-bab3-4c45-acda-8cf7e8dc90e9", UID:"b56a2149-369c-4cef-8430-01f23ebc14e8", APIVersion:"cstor.openebs.io/v1", ResourceVersion:"2390459", FieldPath:""}): type: 'Warning' reason: 'Provisioning' CStorVolume.cstor.openebs.io "pvc-27948946-bab3-4c45-acda-8cf7e8dc90e9" is invalid: metadata.labels: Invalid value: "prometheus-tobs-kube-prometheus-stack-prometheus-db-prometheus-tobs-kube-prometheus-stack-prometheus-1": must be no more than 63 characters
I1122 02:53:44.535808 1 controller.go:304] creating cstorvolume resource
E1122 02:53:44.543448 1 controller_base.go:321] error syncing 'openebs/pvc-c7d48022-8eb7-4909-9abe-eef03db97025': CStorVolume.cstor.openebs.io "pvc-c7d48022-8eb7-4909-9abe-eef03db97025" is invalid: metadata.labels: Invalid value: "prometheus-tobs-kube-prometheus-stack-prometheus-db-prometheus-tobs-kube-prometheus-stack-prometheus-0": must be no more than 63 characters, requeuing
I was able to solve my issue with help from issue #563.
Adding this line to my helm install command fixed the issue:
--set kube-prometheus-stack.prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.metadata.name=data
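For anyone hitting the same thing, here is a rough sanity check of why renaming the claim template helps. It is only a sketch, not part of tobs or OpenEBS: the helper names `pvc_name` and `fits_label` are made up for illustration. It assumes the usual StatefulSet convention that a PVC is named `<volumeClaimTemplate name>-<pod name>`, which matches the ClaimName shown in the pod description above, and applies the 63-character limit quoted in the operator error.

```python
# Limit quoted in the cvc-operator error: label values must be
# no more than 63 characters.
MAX_LABEL_VALUE_LEN = 63


def pvc_name(template_name: str, pod_name: str) -> str:
    """Build a StatefulSet PVC name: <volumeClaimTemplate name>-<pod name>."""
    return f"{template_name}-{pod_name}"


def fits_label(value: str) -> bool:
    """Return True if the value is short enough to be used as a label value."""
    return len(value) <= MAX_LABEL_VALUE_LEN


# Pod name taken from the kubectl describe output in this issue.
pod = "prometheus-tobs-kube-prometheus-stack-prometheus-0"

# Claim template name seen in this deployment vs. the one set via --set.
default_claim = pvc_name("prometheus-tobs-kube-prometheus-stack-prometheus-db", pod)
renamed_claim = pvc_name("data", pod)

for claim in (default_claim, renamed_claim):
    print(f"{len(claim):3d} chars, fits label: {fits_label(claim)}  {claim}")
```

With the template renamed to `data`, the claim name still includes the full pod name, so a much longer release name could push it over the limit again.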