Alertmanager CrashLoopBackOff
80kk opened this issue · 2 comments
80kk commented
I can't get it running, and it is causing a prometheus-core CrashLoopBackOff as well:
kubectl describe pods alertmanager-6c6947f55f-h9wc9 -n monitoring
Name: alertmanager-6c6947f55f-h9wc9
Namespace: monitoring
Priority: 0
PriorityClassName: <none>
Node: node3/192.168.101.15
Start Time: Wed, 14 Nov 2018 11:28:02 +0000
Labels: app=alertmanager
pod-template-hash=6c6947f55f
Annotations: <none>
Status: Running
IP: 10.233.64.6
Controlled By: ReplicaSet/alertmanager-6c6947f55f
Containers:
alertmanager:
Container ID: docker://0507cdf93fefdfe12b964117ae9d5e3685eb20d50f2bd504e8e9105963650435
Image: quay.io/prometheus/alertmanager:v0.15.3
Image ID: docker-pullable://quay.io/prometheus/alertmanager@sha256:27410e5c88aaaf796045e825b257a0857cca0876ca3804ba61175dd8a9f5b798
Port: 9093/TCP
Host Port: 0/TCP
Args:
-config.file=/etc/alertmanager/config.yml
-storage.path=/alertmanager
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Wed, 14 Nov 2018 11:43:57 +0000
Finished: Wed, 14 Nov 2018 11:43:57 +0000
Ready: False
Restart Count: 8
Environment: <none>
Mounts:
/alertmanager from alertmanager (rw)
/etc/alertmanager from config-volume (rw)
/etc/alertmanager-templates from templates-volume (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-nvb7c (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: alertmanager
Optional: false
templates-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: alertmanager-templates
Optional: false
alertmanager:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
default-token-nvb7c:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-nvb7c
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 16m default-scheduler Successfully assigned monitoring/alertmanager-6c6947f55f-h9wc9 to node3
Normal Pulling 16m kubelet, node3 pulling image "quay.io/prometheus/alertmanager:v0.15.3"
Normal Pulled 15m kubelet, node3 Successfully pulled image "quay.io/prometheus/alertmanager:v0.15.3"
Normal Created 14m (x5 over 15m) kubelet, node3 Created container
Normal Pulled 14m (x4 over 15m) kubelet, node3 Container image "quay.io/prometheus/alertmanager:v0.15.3" already present on machine
Normal Started 14m (x5 over 15m) kubelet, node3 Started container
Warning BackOff 53s (x73 over 15m) kubelet, node3 Back-off restarting failed container
pipo02mix commented
Can you share the errors that show up in the logs?
$ kubectl logs -l app=alertmanager -n monitoring
# Add -p flag if it is crash looping to see the previous container error
hennow commented
Even though this is a bit older, I think I can help here. The CrashLoop is happening because you changed the Alertmanager release from 0.7.1 to 0.15.3. Since 0.12 or 0.13, the arguments must be prefixed with two dashes.
The config has to be changed to:
'--config.file=/etc/alertmanager/config.yml'
'--storage.path=/alertmanager'
That should fix the issue, at least for Alertmanager. Bumping other release versions within the manifest may cause new issues.
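For context, here is a sketch of what the container section of the Deployment manifest would look like after the change. The surrounding fields are reconstructed from the pod description above; only the `args` lines actually need editing:

```yaml
# Sketch of the alertmanager Deployment's container spec (field values
# taken from the `kubectl describe` output above; your manifest may differ).
containers:
  - name: alertmanager
    image: quay.io/prometheus/alertmanager:v0.15.3
    args:
      - '--config.file=/etc/alertmanager/config.yml'  # was: -config.file=...
      - '--storage.path=/alertmanager'                # was: -storage.path=...
    ports:
      - containerPort: 9093
```

After editing, re-apply the manifest (e.g. `kubectl apply -f <your-manifest>.yaml -n monitoring`) and the pod should stop crash looping.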