Readiness probe failed: Get "http://172.16.1.4:8080/healthz": dial tcp 172.16.1.4:8080: i/o timeout (Client.Timeout exceeded while awaiting headers)
pjohnst5 opened this issue · 7 comments
Describe the bug
A goldpinger pod gets into CrashLoopBackOff, reporting that the readiness probe failed due to a timeout.
This may not be a bug per se.
But I am able to create pods (we'll call these "toolbox" pods) in the same network as the goldpinger pods.
The toolbox pods can listen on port 8085, and can also serve traffic to other toolbox pods (quick check below).
The question is: why might the goldpinger pods not be able to serve traffic, or at least get blocked while doing so?
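For reference, this is roughly how I check toolbox-to-toolbox reachability; treat it as a sketch, since it assumes nc is available in the toolbox image (172.16.1.7 is toolbox2's pod IP from the output further down):
# from toolbox1, probe toolbox2's TCP listener on 8085
kubectl exec toolbox1 -- nc -vz 172.16.1.7 8085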
Goldpinger describe:
k describe po k8s-podpool-25044293-0-pod-0
Name: k8s-podpool-25044293-0-pod-0
Namespace: default
Priority: 0
Node: k8s-podpool-25044293-0/10.240.0.4
Start Time: Mon, 28 Mar 2022 13:44:30 -0700
Labels: app=goldpinger
Annotations: kubernetes.io/psp: privileged
Status: Running
IP: 172.16.1.4
IPs:
IP: 172.16.1.4
Containers:
goldpinger:
Container ID: containerd://cf57961601d1733cedaaef946371e01c3d587441126792643af791e615eebde4
Image: docker.io/bloomberg/goldpinger:v3.0.0
Image ID: docker.io/bloomberg/goldpinger@sha256:6314bae32e0ec5e993369d6114cf4ade3eb055214ba80d0463d79435c7b2c7eb
Port: 8080/TCP
Host Port: 0/TCP
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 2
Started: Mon, 28 Mar 2022 13:57:47 -0700
Finished: Mon, 28 Mar 2022 13:58:06 -0700
Ready: False
Restart Count: 9
Limits:
memory: 80Mi
Requests:
cpu: 1m
memory: 40Mi
Liveness: http-get http://:8080/healthz delay=5s timeout=1s period=5s #success=1 #failure=3
Readiness: http-get http://:8080/healthz delay=5s timeout=1s period=5s #success=1 #failure=3
Environment:
HOST: 0.0.0.0
PORT: 8080
HOSTNAME: (v1:spec.nodeName)
POD_IP: (v1:status.podIP)
HOSTS_TO_RESOLVE: 1.1.1.1 8.8.8.8 www.bing.com
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-cd5nc (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kube-api-access-cd5nc:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: dncnode=k8s-podpool-25044293-0
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> Successfully assigned default/k8s-podpool-25044293-0-pod-0 to k8s-podpool-25044293-0
Warning FailedCreatePodSandBox 16m kubelet, k8s-podpool-25044293-0 Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "b4f6176516b42c0f9f7e8e1c605c7962998d3a01ba2e88cbd88f5c8170394c92": GetMultiTenancyCNIResult failed:GetContainerNetworkConfiguration failed:[Azure cnsclient] Code: 18 , Error: NetworkContainer doesn't exist.
Normal Started 15m (x2 over 15m) kubelet, k8s-podpool-25044293-0 Started container goldpinger
Warning Unhealthy 15m (x2 over 15m) kubelet, k8s-podpool-25044293-0 Readiness probe failed: Get "http://172.16.1.4:8080/healthz": dial tcp 172.16.1.4:8080: i/o timeout (Client.Timeout exceeded while awaiting headers)
Normal Killing 15m (x2 over 15m) kubelet, k8s-podpool-25044293-0 Container goldpinger failed liveness probe, will be restarted
Warning Unhealthy 15m (x6 over 15m) kubelet, k8s-podpool-25044293-0 Liveness probe failed: Get "http://172.16.1.4:8080/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Warning Unhealthy 15m (x6 over 15m) kubelet, k8s-podpool-25044293-0 Readiness probe failed: Get "http://172.16.1.4:8080/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Normal Created 15m (x3 over 15m) kubelet, k8s-podpool-25044293-0 Created container goldpinger
Normal Pulled 15m (x3 over 15m) kubelet, k8s-podpool-25044293-0 Container image "docker.io/bloomberg/goldpinger:v3.0.0" already present on machine
Warning BackOff 65s (x59 over 14m) kubelet, k8s-podpool-25044293-0 Back-off restarting failed container
Goldpinger logs:
k logs k8s-podpool-25044293-0-pod-0
{"level":"info","ts":1648504183.5840726,"caller":"goldpinger/main.go:63","msg":"Goldpinger","version":"v3.0.0","build":"Wed Apr 8 12:36:28 UTC 2020"}
{"level":"info","ts":1648504183.5996177,"caller":"goldpinger/main.go:102","msg":"Kubeconfig not specified, trying to use in cluster config"}
{"level":"info","ts":1648504183.6012673,"caller":"goldpinger/main.go:127","msg":"--ping-number set to 0: pinging all pods"}
{"level":"info","ts":1648504183.602398,"caller":"restapi/configure_goldpinger.go:132","msg":"Added the static middleware"}
{"level":"info","ts":1648504183.60244,"caller":"restapi/configure_goldpinger.go:149","msg":"Added the prometheus middleware"}
Goldpinger pod yaml
k get po -oyaml k8s-podpool-25044293-0-pod-0
apiVersion: v1
kind: Pod
metadata:
annotations:
kubernetes.io/psp: privileged
creationTimestamp: "2022-03-28T20:44:30Z"
labels:
app: goldpinger
managedFields:
- apiVersion: v1
fieldsType: FieldsV1
fieldsV1:
f:metadata:
f:labels:
.: {}
f:app: {}
f:spec:
f:affinity:
.: {}
f:podAntiAffinity:
.: {}
f:preferredDuringSchedulingIgnoredDuringExecution: {}
f:containers:
k:{"name":"goldpinger"}:
.: {}
f:env:
.: {}
k:{"name":"HOST"}:
.: {}
f:name: {}
f:value: {}
k:{"name":"HOSTNAME"}:
.: {}
f:name: {}
f:valueFrom:
.: {}
f:fieldRef: {}
k:{"name":"HOSTS_TO_RESOLVE"}:
.: {}
f:name: {}
f:value: {}
k:{"name":"POD_IP"}:
.: {}
f:name: {}
f:valueFrom:
.: {}
f:fieldRef: {}
k:{"name":"PORT"}:
.: {}
f:name: {}
f:value: {}
f:image: {}
f:imagePullPolicy: {}
f:livenessProbe:
.: {}
f:failureThreshold: {}
f:httpGet:
.: {}
f:path: {}
f:port: {}
f:scheme: {}
f:initialDelaySeconds: {}
f:periodSeconds: {}
f:successThreshold: {}
f:timeoutSeconds: {}
f:name: {}
f:ports:
.: {}
k:{"containerPort":8080,"protocol":"TCP"}:
.: {}
f:containerPort: {}
f:name: {}
f:protocol: {}
f:readinessProbe:
.: {}
f:failureThreshold: {}
f:httpGet:
.: {}
f:path: {}
f:port: {}
f:scheme: {}
f:initialDelaySeconds: {}
f:periodSeconds: {}
f:successThreshold: {}
f:timeoutSeconds: {}
f:resources:
.: {}
f:limits:
.: {}
f:memory: {}
f:requests:
.: {}
f:cpu: {}
f:memory: {}
f:securityContext:
.: {}
f:allowPrivilegeEscalation: {}
f:readOnlyRootFilesystem: {}
f:terminationMessagePath: {}
f:terminationMessagePolicy: {}
f:dnsPolicy: {}
f:enableServiceLinks: {}
f:nodeSelector: {}
f:restartPolicy: {}
f:schedulerName: {}
f:securityContext:
.: {}
f:fsGroup: {}
f:runAsNonRoot: {}
f:runAsUser: {}
f:serviceAccount: {}
f:serviceAccountName: {}
f:terminationGracePeriodSeconds: {}
manager: __debug_bin
operation: Update
time: "2022-03-28T20:44:30Z"
- apiVersion: v1
fieldsType: FieldsV1
fieldsV1:
f:status:
f:conditions:
k:{"type":"ContainersReady"}:
.: {}
f:lastProbeTime: {}
f:lastTransitionTime: {}
f:message: {}
f:reason: {}
f:status: {}
f:type: {}
k:{"type":"Initialized"}:
.: {}
f:lastProbeTime: {}
f:lastTransitionTime: {}
f:status: {}
f:type: {}
k:{"type":"Ready"}:
.: {}
f:lastProbeTime: {}
f:lastTransitionTime: {}
f:message: {}
f:reason: {}
f:status: {}
f:type: {}
f:containerStatuses: {}
f:hostIP: {}
f:phase: {}
f:podIP: {}
f:podIPs:
.: {}
k:{"ip":"172.16.1.4"}:
.: {}
f:ip: {}
f:startTime: {}
manager: kubelet
operation: Update
subresource: status
time: "2022-03-28T20:44:44Z"
name: k8s-podpool-25044293-0-pod-0
namespace: default
resourceVersion: "910397"
uid: afd422d4-b742-48c9-887e-0321c421dc78
spec:
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- podAffinityTerm:
labelSelector:
matchExpressions:
- key: app
operator: In
values:
- goldpinger
topologyKey: kubernetes.io/hostname
weight: 100
containers:
- env:
- name: HOST
value: 0.0.0.0
- name: PORT
value: "8080"
- name: HOSTNAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: spec.nodeName
- name: POD_IP
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: status.podIP
- name: HOSTS_TO_RESOLVE
value: 1.1.1.1 8.8.8.8 www.bing.com
image: docker.io/bloomberg/goldpinger:v3.0.0
imagePullPolicy: IfNotPresent
livenessProbe:
failureThreshold: 3
httpGet:
path: /healthz
port: 8080
scheme: HTTP
initialDelaySeconds: 5
periodSeconds: 5
successThreshold: 1
timeoutSeconds: 1
name: goldpinger
ports:
- containerPort: 8080
name: http
protocol: TCP
readinessProbe:
failureThreshold: 3
httpGet:
path: /healthz
port: 8080
scheme: HTTP
initialDelaySeconds: 5
periodSeconds: 5
successThreshold: 1
timeoutSeconds: 1
resources:
limits:
memory: 80Mi
requests:
cpu: 1m
memory: 40Mi
securityContext:
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: kube-api-access-cd5nc
readOnly: true
dnsPolicy: ClusterFirst
enableServiceLinks: true
nodeName: k8s-podpool-25044293-0
nodeSelector:
dncnode: k8s-podpool-25044293-0
preemptionPolicy: PreemptLowerPriority
priority: 0
restartPolicy: Always
schedulerName: default-scheduler
securityContext:
fsGroup: 2000
runAsNonRoot: true
runAsUser: 1000
serviceAccount: goldpinger-serviceaccount
serviceAccountName: goldpinger-serviceaccount
terminationGracePeriodSeconds: 30
tolerations:
- effect: NoExecute
key: node.kubernetes.io/not-ready
operator: Exists
tolerationSeconds: 300
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
tolerationSeconds: 300
volumes:
- name: kube-api-access-cd5nc
projected:
defaultMode: 420
sources:
- serviceAccountToken:
expirationSeconds: 3607
path: token
- configMap:
items:
- key: ca.crt
path: ca.crt
name: kube-root-ca.crt
- downwardAPI:
items:
- fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
path: namespace
status:
conditions:
- lastProbeTime: null
lastTransitionTime: "2022-03-28T20:44:30Z"
status: "True"
type: Initialized
- lastProbeTime: null
lastTransitionTime: "2022-03-28T20:44:30Z"
message: 'containers with unready status: [goldpinger]'
reason: ContainersNotReady
status: "False"
type: Ready
- lastProbeTime: null
lastTransitionTime: "2022-03-28T20:44:30Z"
message: 'containers with unready status: [goldpinger]'
reason: ContainersNotReady
status: "False"
type: ContainersReady
- lastProbeTime: null
lastTransitionTime: "2022-03-28T20:44:30Z"
status: "True"
type: PodScheduled
containerStatuses:
- containerID: containerd://4e4a3883c9b0b7e472341069ae896b320bafb7c11410b03cbed8b8ecfb7660db
image: docker.io/bloomberg/goldpinger:v3.0.0
imageID: docker.io/bloomberg/goldpinger@sha256:6314bae32e0ec5e993369d6114cf4ade3eb055214ba80d0463d79435c7b2c7eb
lastState:
terminated:
containerID: containerd://4e4a3883c9b0b7e472341069ae896b320bafb7c11410b03cbed8b8ecfb7660db
exitCode: 2
finishedAt: "2022-03-28T22:01:26Z"
reason: Error
startedAt: "2022-03-28T22:01:09Z"
name: goldpinger
ready: false
restartCount: 31
started: false
state:
waiting:
message: back-off 5m0s restarting failed container=goldpinger pod=k8s-podpool-25044293-0-pod-0_default(afd422d4-b742-48c9-887e-0321c421dc78)
reason: CrashLoopBackOff
hostIP: 10.240.0.4
phase: Running
podIP: 172.16.1.4
podIPs:
- ip: 172.16.1.4
qosClass: Burstable
startTime: "2022-03-28T20:44:30Z"
Toolbox pod describe:
k describe po toolbox2
Name: toolbox2
Namespace: default
Priority: 0
Node: k8s-podpool-25044293-1/10.240.0.64
Start Time: Mon, 28 Mar 2022 14:46:31 -0700
Labels: app=toolbox
Annotations: kubernetes.io/psp: privileged
Status: Running
IP: 172.16.1.7
IPs:
IP: 172.16.1.7
Containers:
toolbox:
Container ID: containerd://257f4c29c8f8f559762ae81bd4babea0a4f6bfd06cc0125dc147d6596b60cd9f
Image: matmerr/toolbox:lite
Image ID: docker.io/matmerr/toolbox@sha256:e47201c473eb0cfdeccd381a696492ac2f8f2bae62b2d92dc8832bffa341502c
Port: <none>
Host Port: <none>
State: Running
Started: Mon, 28 Mar 2022 14:46:32 -0700
Ready: True
Restart Count: 0
Environment:
TCP_PORT: 8085
UDP_PORT: 8086
HTTP_PORT: 8087
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-4tcqp (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
kube-api-access-4tcqp:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: dncnode=k8s-podpool-25044293-1
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> Successfully assigned default/toolbox2 to k8s-podpool-25044293-1
Normal Pulled 13m kubelet, k8s-podpool-25044293-1 Container image "matmerr/toolbox:lite" already present on machine
Normal Created 13m kubelet, k8s-podpool-25044293-1 Created container toolbox
Normal Started 13m kubelet, k8s-podpool-25044293-1 Started container toolbox
Toolbox logs:
k logs toolbox2
[HTTP] Listening on 8087
[UDP] Listening on [::]:8086
[TCP] Listening on [::]:8085
[TCP] Received Connection from 172.16.1.6:49240
[TCP] Received Connection from 172.16.1.6:49616
kubectl get po -owide
azure-cni-manager-snrc2 1/1 Running 0 78m 10.240.0.64 k8s-podpool-25044293-1 <none> <none>
azure-cni-manager-zsxv6 1/1 Running 0 78m 10.240.0.4 k8s-podpool-25044293-0 <none> <none>
azure-cns-765kd 1/1 Running 0 78m 10.240.0.4 k8s-podpool-25044293-0 <none> <none>
azure-cns-twhhv 1/1 Running 0 78m 10.240.0.64 k8s-podpool-25044293-1 <none> <none>
dnc-788c667694-x2bwj 1/1 Running 0 76m 10.240.0.94 k8s-dncpool-25044293-0 <none> <none>
k8s-podpool-25044293-0-pod-0 0/1 CrashLoopBackOff 29 (4m46s ago) 76m 172.16.1.4 k8s-podpool-25044293-0 <none> <none>
k8s-podpool-25044293-1-pod-0 0/1 CrashLoopBackOff 29 (4m21s ago) 76m 172.16.1.5 k8s-podpool-25044293-1 <none> <none>
toolbox1 1/1 Running 0 65m 172.16.1.6 k8s-podpool-25044293-0 <none> <none>
toolbox2 1/1 Running 0 14m 172.16.1.7 k8s-podpool-25044293-1 <none> <none>
Trying to connect to the goldpinger readiness probe from the toolbox1 pod:
curl -v "http://172.16.1.4:8080/healthz"
* Trying 172.16.1.4:8080...
* TCP_NODELAY set
* connect to 172.16.1.4 port 8080 failed: Connection refused
* Failed to connect to 172.16.1.4 port 8080: Connection refused
* Closing connection 0
curl: (7) Failed to connect to 172.16.1.4 port 8080: Connection refused
I should mention that I am able to ping the goldpinger pods just fine:
toolbox1:/app# ping 172.16.1.4
PING 172.16.1.4 (172.16.1.4) 56(84) bytes of data.
64 bytes from 172.16.1.4: icmp_seq=1 ttl=64 time=2.15 ms
64 bytes from 172.16.1.4: icmp_seq=2 ttl=64 time=0.820 ms
64 bytes from 172.16.1.4: icmp_seq=3 ttl=64 time=1.63 ms
This looks like a pretty standard Goldpinger yaml, including the HOST envvar set to 0.0.0.0, so I don't see anything obvious that should prevent the connectivity.
I noticed from your logs that you're running with azure-cni, so I'm wondering if perhaps you have a default-deny kind of network policy in place? Just guessing here.
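Something quick to check, if you haven't already (just a standard kubectl query, no assumptions beyond cluster access):
# list any NetworkPolicy objects that could be restricting pod traffic
kubectl get networkpolicy --all-namespaces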
Perhaps you could also try to set a healthcheck on your toolbox pod to confirm that it works.
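For example, something along these lines on the toolbox container; this is only a sketch, assuming its HTTP listener on 8087 answers / with a 2xx:
readinessProbe:
  httpGet:
    path: /      # assuming the toolbox HTTP server responds on the root path
    port: 8087   # the HTTP_PORT the toolbox logs show it listening on
  initialDelaySeconds: 5
  periodSeconds: 5
  timeoutSeconds: 1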
Is there a way to exec into the goldpinger pod to look around? I'm just wondering if it's listening on 8080 as expected.
For toolbox pods, I'm able to exec into them. I can try a health check too.
Okay, I confirmed with an nginx pod; the nginx pod replies OK on port 80:
kubectl get po -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
azure-cni-manager-snrc2 1/1 Running 0 21h 10.240.0.64 k8s-podpool-25044293-1 <none> <none>
azure-cni-manager-zsxv6 1/1 Running 0 21h 10.240.0.4 k8s-podpool-25044293-0 <none> <none>
azure-cns-765kd 1/1 Running 0 21h 10.240.0.4 k8s-podpool-25044293-0 <none> <none>
azure-cns-twhhv 1/1 Running 0 21h 10.240.0.64 k8s-podpool-25044293-1 <none> <none>
dnc-788c667694-x2bwj 1/1 Running 0 21h 10.240.0.94 k8s-dncpool-25044293-0 <none> <none>
k8s-podpool-25044293-0-pod-0 0/1 CrashLoopBackOff 444 (31s ago) 21h 172.16.1.4 k8s-podpool-25044293-0 <none> <none>
k8s-podpool-25044293-1-pod-0 0/1 CrashLoopBackOff 442 (3m16s ago) 21h 172.16.1.5 k8s-podpool-25044293-1 <none> <none>
nginx1 1/1 Running 0 112s 172.16.1.8 k8s-podpool-25044293-0 <none> <none>
toolbox1 1/1 Running 0 20h 172.16.1.6 k8s-podpool-25044293-0 <none> <none>
toolbox2 1/1 Running 0 20h 172.16.1.7 k8s-podpool-25044293-1 <none> <none>
toolbox1:/app# curl "http://172.16.1.8:80"
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
html { color-scheme: light dark; }
body { width: 35em; margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif; }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>
<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>
<p><em>Thank you for using nginx.</em></p>
</body>
</html>
Is there a way to exec into the goldpinger pod to look around? I'm just wondering if it's listening on 8080 as expected. For toolbox pods, I'm able to exec into them. I can try a health check too
You can exec into it, but other than the goldpinger binary, there isn't anything there to execute. If you have access to the underlying host, you can investigate the process from there.
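Roughly like this from the node, assuming containerd with crictl (and ss) available on the host; the container ID and PID are placeholders:
# find the goldpinger container and its host PID
crictl ps | grep goldpinger
crictl inspect <container-id> | grep -i '"pid"'
# enter the pod's network namespace and list listening sockets
nsenter -t <pid> -n ss -lntp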
If you remove the readiness/health probes temporarily, will you be able to reach the pods from your toolbox, like you did with the nginx? You were getting Connection refused, but I suspect it's because Goldpinger was already in CLBO.
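Since probe fields on a running pod can't be edited in place, one way to do that (hypothetical file name; strip the probes plus status/managedFields from the dumped YAML by hand):
kubectl get pod k8s-podpool-25044293-0-pod-0 -o yaml > goldpinger-noprobes.yaml
# edit goldpinger-noprobes.yaml: remove the livenessProbe and readinessProbe blocks
kubectl delete pod k8s-podpool-25044293-0-pod-0
kubectl apply -f goldpinger-noprobes.yaml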
Hmm yeah @seeker89, now, when removing the readiness/health probes from the goldpinger pod, I am able to curl the /healthz endpoint o_0
Ah, it's because the goldpinger pod is on a different network than the kubelet.
So the kubelet (or whatever performs the readiness checks) can't talk to the goldpinger pod, but since the toolbox pods are on the same network as the goldpinger pods, the toolbox pods can talk to the goldpinger pods.
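A quick way to see the split, assuming you can run commands on the node itself (the kubelet's probe originates from the node's host network namespace):
# from the node - same vantage point as the kubelet's probe - this times out
curl -m 2 "http://172.16.1.4:8080/healthz"
# from a pod on the pod network (toolbox1) - this succeeds once the probes are removed
kubectl exec toolbox1 -- curl -s -m 2 "http://172.16.1.4:8080/healthz"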
Thanks @seeker89 !