coroot/coroot-node-agent

kube-state-metrics is missing despite being deployed, running, and visible in Prometheus

yoyoraso opened this issue · 10 comments

Hi, I have the main Coroot instance deployed in one cluster and am working on adding other clusters to it. Those clusters already have Prometheus and kube-state-metrics deployed, so I only deployed coroot-node-agent on them, but I can't see kube-state-metrics or the service map.
So I started investigating and found this failure in the coroot-node-agent pods: "failed to get container metadata for pid 16843 -> /kubepods/burstable/pod6f222fb5-3d0e-425e-899c-e5495124a057/ea64d45c2a6338bb0f9aae2f05ec4a77e323915d25ed11b19cb2504cbf2113d0: failed to interact with dockerd (%!s()) or with containerd (%!s())"

kubernetes version : v1.25.16+vmware.1
OS: Ubuntu 22.04.4 LTS
kernel : 6.5.0-21-generic
container runtime : containerd://1.6.28
coroot node agent tag : 1.18.9

@yoyoraso, we need to examine the node-agent's log. Could you please restart it, wait a minute, and then provide the entire log here?

@apetruhin, here it is
I0510 14:40:50.531724 606823 net.go:30] ephemeral-port-range: 32768-60999
I0510 14:40:50.540212 606823 cilium.go:30] Unable to get object /proc/1/root/sys/fs/bpf/tc/globals/cilium_ct4_global: no such file or directory
I0510 14:40:50.540261 606823 cilium.go:36] Unable to get object /proc/1/root/sys/fs/bpf/tc/globals/cilium_ct6_global: no such file or directory
I0510 14:40:50.540272 606823 cilium.go:43] Unable to get object /proc/1/root/sys/fs/bpf/tc/globals/cilium_lb4_backends_v2: no such file or directory
I0510 14:40:50.540280 606823 cilium.go:43] Unable to get object /proc/1/root/sys/fs/bpf/tc/globals/cilium_lb4_backends_v3: no such file or directory
I0510 14:40:50.540290 606823 cilium.go:52] Unable to get object /proc/1/root/sys/fs/bpf/tc/globals/cilium_lb6_backends_v2: no such file or directory
I0510 14:40:50.540300 606823 cilium.go:52] Unable to get object /proc/1/root/sys/fs/bpf/tc/globals/cilium_lb6_backends_v3: no such file or directory
I0510 14:40:50.540313 606823 main.go:102] agent version: 1.18.9
I0510 14:40:50.540380 606823 main.go:108] hostname: ******
I0510 14:40:50.540389 606823 main.go:109] kernel version: 6.5.0-21-generic
I0510 14:40:50.541001 606823 main.go:75] machine-id: ******
I0510 14:40:50.541035 606823 tracing.go:34] no OpenTelemetry traces collector endpoint configured
I0510 14:40:50.541048 606823 otel.go:26] no OpenTelemetry logs collector endpoint configured
I0510 14:40:50.541180 606823 metadata.go:67] cloud provider:
I0510 14:40:50.541193 606823 collector.go:157] instance metadata:
I0510 14:40:50.541282 606823 profiling.go:49] no profiles endpoint configured
W0510 14:40:50.541721 606823 registry.go:75] Cannot connect to the Docker daemon at unix:///proc/1/root/run/docker.sock. Is the docker daemon running?
W0510 14:40:54.544388 606823 registry.go:78] couldn't connect to containerd through the following UNIX sockets [/var/snap/microk8s/common/run/containerd.sock,/run/k0s/containerd.sock,/run/k3s/containerd/containerd.sock,/run/containerd/containerd.sock]: failed to dial "/proc/1/root/run/containerd/containerd.sock": context deadline exceeded
W0510 14:40:54.544482 606823 registry.go:81] stat /proc/1/root/var/run/crio/crio.sock: no such file or directory
I0510 14:40:54.878632 606823 registry.go:281] calculated container id 1 -> / ->
I0510 14:40:54.878729 606823 registry.go:286] "ignoring" cg="/" pid=1
I0510 14:40:54.878791 606823 registry.go:281] calculated container id 2 -> / ->
I0510 14:40:54.878805 606823 registry.go:286] "ignoring" cg="/" pid=2
I0510 14:40:54.878844 606823 registry.go:281] calculated container id 3 -> / ->
I0510 14:40:54.878856 606823 registry.go:286] "ignoring" cg="/" pid=3
I0510 14:40:54.878893 606823 registry.go:281] calculated container id 4 -> / ->
I0510 14:40:54.878901 606823 registry.go:286] "ignoring" cg="/" pid=4
I0510 14:40:54.878936 606823 registry.go:281] calculated container id 5 -> / ->
I0510 14:40:54.878947 606823 registry.go:286] "ignoring" cg="/" pid=5
I0510 14:40:54.878982 606823 registry.go:281] calculated container id 6 -> / ->
I0510 14:40:54.878994 606823 registry.go:286] "ignoring" cg="/" pid=6
I0510 14:40:54.879027 606823 registry.go:281] calculated container id 8 -> / ->
I0510 14:40:54.879038 606823 registry.go:286] "ignoring" cg="/" pid=8
I0510 14:40:54.879073 606823 registry.go:281] calculated container id 11 -> / ->
I0510 14:40:54.879081 606823 registry.go:286] "ignoring" cg="/" pid=11
I0510 14:40:54.879113 606823 registry.go:281] calculated container id 12 -> / ->
I0510 14:40:54.879121 606823 registry.go:286] "ignoring" cg="/" pid=12
I0510 14:40:54.879153 606823 registry.go:281] calculated container id 13 -> / ->
I0510 14:40:54.879166 606823 registry.go:286] "ignoring" cg="/" pid=13
I0510 14:40:54.879200 606823 registry.go:281] calculated container id 14 -> / ->
I0510 14:40:54.879211 606823 registry.go:286] "ignoring" cg="/" pid=14
I0510 14:40:54.879244 606823 registry.go:281] calculated container id 15 -> / ->
I0510 14:40:54.879251 606823 registry.go:286] "ignoring" cg="/" pid=15
I0510 14:40:54.879283 606823 registry.go:281] calculated container id 16 -> / ->
I0510 14:40:54.879291 606823 registry.go:286] "ignoring" cg="/" pid=16
I0510 14:40:54.879325 606823 registry.go:281] calculated container id 17 -> / ->
I0510 14:40:54.879332 606823 registry.go:286] "ignoring" cg="/" pid=17
I0510 14:40:54.879366 606823 registry.go:281] calculated container id 18 -> / ->
I0510 14:40:54.879377 606823 registry.go:286] "ignoring" cg="/" pid=18
I0510 14:40:54.879410 606823 registry.go:281] calculated container id 19 -> / ->
I0510 14:40:54.879419 606823 registry.go:286] "ignoring" cg="/" pid=19
I0510 14:40:54.879452 606823 registry.go:281] calculated container id 20 -> / ->
I0510 14:40:54.879466 606823 registry.go:286] "ignoring" cg="/" pid=20
I0510 14:40:54.879500 606823 registry.go:281] calculated container id 21 -> / ->
I0510 14:40:54.879511 606823 registry.go:286] "ignoring" cg="/" pid=21
I0510 14:40:54.879544 606823 registry.go:281] calculated container id 22 -> / ->
I0510 14:40:54.879556 606823 registry.go:286] "ignoring" cg="/" pid=22
I0510 14:40:54.879588 606823 registry.go:281] calculated container id 23 -> / ->
I0510 14:40:54.879600 606823 registry.go:286] "ignoring" cg="/" pid=23
I0510 14:40:54.879633 606823 registry.go:281] calculated container id 25 -> / ->
I0510 14:40:54.879640 606823 registry.go:286] "ignoring" cg="/" pid=25
I0510 14:40:54.879674 606823 registry.go:281] calculated container id 26 -> / ->
I0510 14:40:54.879685 606823 registry.go:286] "ignoring" cg="/" pid=26
I0510 14:40:54.879718 606823 registry.go:281] calculated container id 27 -> / ->
I0510 14:40:54.879729 606823 registry.go:286] "ignoring" cg="/" pid=27
I0510 14:40:54.879768 606823 registry.go:281] calculated container id 28 -> / ->
I0510 14:40:54.879781 606823 registry.go:286] "ignoring" cg="/" pid=28
I0510 14:40:54.879816 606823 registry.go:281] calculated container id 29 -> / ->
I0510 14:40:54.879823 606823 registry.go:286] "ignoring" cg="/" pid=29
I0510 14:40:54.879858 606823 registry.go:281] calculated container id 31 -> / ->
I0510 14:40:54.879865 606823 registry.go:286] "ignoring" cg="/" pid=31
I0510 14:40:54.879897 606823 registry.go:281] calculated container id 32 -> / ->
I0510 14:40:54.879904 606823 registry.go:286] "ignoring" cg="/" pid=32
I0510 14:40:54.879936 606823 registry.go:281] calculated container id 33 -> / ->
I0510 14:40:54.879949 606823 registry.go:286] "ignoring" cg="/" pid=33
I0510 14:40:54.879985 606823 registry.go:281] calculated container id 34 -> / ->
I0510 14:40:54.879992 606823 registry.go:286] "ignoring" cg="/" pid=34
I0510 14:40:54.880055 606823 registry.go:281] calculated container id 35 -> / ->
I0510 14:40:54.880063 606823 registry.go:286] "ignoring" cg="/" pid=35
I0510 14:40:54.880098 606823 registry.go:281] calculated container id 37 -> / ->
I0510 14:40:54.880107 606823 registry.go:286] "ignoring" cg="/" pid=37
I0510 14:40:54.880140 606823 registry.go:281] calculated container id 38 -> / ->
I0510 14:40:54.880154 606823 registry.go:286] "ignoring" cg="/" pid=38
I0510 14:40:54.880189 606823 registry.go:281] calculated container id 39 -> / ->
I0510 14:40:54.880202 606823 registry.go:286] "ignoring" cg="/" pid=39
W0510 14:40:54.880228 606823 init.go:35] open /proc/1/net/tcp6: no such file or directory
I0510 14:40:54.880236 606823 registry.go:281] calculated container id 40 -> / ->
I0510 14:40:54.880290 606823 registry.go:286] "ignoring" cg="/" pid=40
I0510 14:40:54.880340 606823 registry.go:281] calculated container id 41 -> / ->
I0510 14:40:54.880353 606823 registry.go:286] "ignoring" cg="/" pid=41
I0510 14:40:54.880391 606823 registry.go:281] calculated container id 43 -> / ->
I0510 14:40:54.880399 606823 registry.go:286] "ignoring" cg="/" pid=43
I0510 14:40:54.880433 606823 registry.go:281] calculated container id 44 -> / ->
I0510 14:40:54.880439 606823 registry.go:286] "ignoring" cg="/" pid=44
I0510 14:40:54.880472 606823 registry.go:281] calculated container id 45 -> / ->
I0510 14:40:54.880480 606823 registry.go:286] "ignoring" cg="/" pid=45
I0510 14:40:54.880511 606823 registry.go:281] calculated container id 46 -> / ->
I0510 14:40:54.880518 606823 registry.go:286] "ignoring" cg="/" pid=46
I0510 14:40:54.880549 606823 registry.go:281] calculated container id 47 -> / ->
I0510 14:40:54.880556 606823 registry.go:286] "ignoring" cg="/" pid=47
I0510 14:40:54.880591 606823 registry.go:281] calculated container id 50 -> / ->
I0510 14:40:54.880598 606823 registry.go:286] "ignoring" cg="/" pid=50
I0510 14:40:54.880631 606823 registry.go:281] calculated container id 51 -> / ->
I0510 14:40:54.880638 606823 registry.go:286] "ignoring" cg="/" pid=51
I0510 14:40:54.880671 606823 registry.go:281] calculated container id 52 -> / ->
I0510 14:40:54.880678 606823 registry.go:286] "ignoring" cg="/" pid=52
I0510 14:40:54.880711 606823 registry.go:281] calculated container id 53 -> / ->
I0510 14:40:54.880718 606823 registry.go:286] "ignoring" cg="/" pid=53
I0510 14:40:54.880750 606823 registry.go:281] calculated container id 55 -> / ->
I0510 14:40:54.880757 606823 registry.go:286] "ignoring" cg="/" pid=55
I0510 14:40:54.880790 606823 registry.go:281] calculated container id 56 -> / ->
I0510 14:40:54.880797 606823 registry.go:286] "ignoring" cg="/" pid=56
I0510 14:40:54.880835 606823 registry.go:281] calculated container id 57 -> / ->
I0510 14:40:54.880843 606823 registry.go:286] "ignoring" cg="/" pid=57
I0510 14:40:54.880877 606823 registry.go:281] calculated container id 58 -> / ->
I0510 14:40:54.880884 606823 registry.go:286] "ignoring" cg="/" pid=58
I0510 14:40:54.880918 606823 registry.go:281] calculated container id 59 -> / ->
I0510 14:40:54.969239 606823 registry.go:213] TCP connection from unknown container {connection-open none 9196 11.0.101.3:33262 11.33.38.9:9093 34 622082154767560 }
W0510 14:40:55.888703 606823 registry.go:277] failed to get container metadata for pid 14343 -> /kubepods/besteffort/poda7143171-67b8-4c99-b7b5-3b850b41d2e5/65e74674bb94880aa9b1b8d913b1f37d6ac36613be45fb3e2ca13837db45c1fb: failed to interact with dockerd (%!s()) or with containerd (%!s())
I0510 14:40:55.888742 606823 registry.go:213] TCP connection from unknown container {connection-open none 14343 11.32.115.7:51738 10.100.192.1:443 111 622083074116078 }
W0510 14:40:55.929900 606823 registry.go:277] failed to get container metadata for pid 20694 -> /kubepods/burstable/pod671ca5e2-5ce3-46a0-b10f-f5e4f8098e33/4bda181fc1ca52ebe65399cb8e11649c0e133cd6a85f959f0f6a3d370478f2cb: failed to interact with dockerd (%!s()) or with containerd (%!s())
I0510 14:40:55.929929 606823 registry.go:213] TCP connection from unknown container {connection-open none 20694 127.0.0.1:44250 127.0.0.1:8080 14 622083115307215 }
W0510 14:40:55.946816 606823 registry.go:277] failed to get container metadata for pid 14343 -> /kubepods/besteffort/poda7143171-67b8-4c99-b7b5-3b850b41d2e5/65e74674bb94880aa9b1b8d913b1f37d6ac36613be45fb3e2ca13837db45c1fb: failed to interact with dockerd (%!s()) or with containerd (%!s())
I0510 14:40:55.946857 606823 registry.go:213] TCP connection from unknown container {connection-open none 14343 11.32.115.7:51752 10.100.192.1:443 107 622083132268869 }
W0510 14:40:55.946943 606823 registry.go:277] failed to get container metadata for pid 14343 -> /kubepods/besteffort/poda7143171-67b8-4c99-b7b5-3b850b41d2e5/65e74674bb94880aa9b1b8d913b1f37d6ac36613be45fb3e2ca13837db45c1fb: failed to interact with dockerd (%!s()) or with containerd (%!s())
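
The cgroup paths in the warnings above appear to follow the layout /kubepods/<QoS class>/pod<pod UID>/<container ID>. As a rough sketch (the function name `parse_cgroup` is made up for illustration, and the layout is assumed from the log lines only), the path can be split with plain shell parameter expansion to get the pod UID and container ID that the agent could not resolve:

```shell
#!/bin/sh
# Hypothetical helper: split a kubepods cgroup path into its parts.
# Assumes exactly the layout seen in the log:
#   /kubepods/<qos>/pod<uid>/<container-id>
# Paths with a different depth are not handled.
parse_cgroup() {
    path=$1
    container_id=${path##*/}   # last component
    rest=${path%/*}            # drop the container ID
    pod=${rest##*/}            # "pod<uid>"
    pod_uid=${pod#pod}         # strip the literal "pod" prefix
    rest=${rest%/*}            # drop the pod component
    qos=${rest##*/}            # e.g. besteffort or burstable
    printf 'qos=%s pod_uid=%s container_id=%s\n' "$qos" "$pod_uid" "$container_id"
}

parse_cgroup /kubepods/besteffort/poda7143171-67b8-4c99-b7b5-3b850b41d2e5/65e74674bb94880aa9b1b8d913b1f37d6ac36613be45fb3e2ca13837db45c1fb
```

The pod UID can then be compared with `metadata.uid` of the pods on that node to see which workloads are affected.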

Could you please ssh to the node and check for containerd.sock:

# ls -l /run/containerd/containerd.sock
srw-rw---- 1 root root 0 Jan  4 09:04 /run/containerd/containerd.sock

@apetruhin I can't access the cluster nodes sadly :(

The agent failed to locate containerd.sock.

Please exec into the node-agent pod and try to find the containerd.sock file:

kubectl -n coroot exec -ti coroot-node-agent-dwwrf -- bash

root@coroot-node-agent-dwwrf:/# ls -l /proc/1/root/run/containerd/containerd.sock

The root filesystem should be accessible from a node-agent pod under /proc/1/root/.
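
To check all of the containerd socket locations listed in the agent's warning at once, something like the following could be run inside the node-agent pod. This is only a sketch: the four paths are copied from the log above, while the `HOST_ROOT` variable and the `describe` function are names invented here for illustration.

```shell
#!/bin/sh
# Host root as seen from the node-agent pod; overridable for testing.
HOST_ROOT=${HOST_ROOT:-/proc/1/root}

# Hypothetical helper: report what a host path looks like from inside the pod.
describe() {
    p="$HOST_ROOT$1"
    if [ -L "$p" ]; then
        # Checked first so that dangling symlinks are still reported.
        echo "$1: symlink -> $(readlink "$p")"
    elif [ -S "$p" ]; then
        echo "$1: socket"
    elif [ -e "$p" ]; then
        echo "$1: exists (not a socket)"
    else
        echo "$1: missing"
    fi
}

# Candidate paths taken from the "couldn't connect to containerd" warning.
for s in \
    /var/snap/microk8s/common/run/containerd.sock \
    /run/k0s/containerd.sock \
    /run/k3s/containerd/containerd.sock \
    /run/containerd/containerd.sock
do
    describe "$s"
done
```

A line reporting a symlink with an absolute target deserves a closer look, because that target has to be checked again under /proc/1/root/ rather than in the pod's own filesystem.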

@apetruhin
root@node-agent-ntkdw:/# ls -l /proc/1/root/run/containerd/containerd.sock
lrwxrwxrwx 1 root root 44 May 3 10:14 /proc/1/root/run/containerd/containerd.sock -> /var/vcap/sys/run/containerd/containerd.sock

@yoyoraso, could you please check whether /proc/1/root/var/vcap/sys/run/containerd/containerd.sock is itself a symlink to another location?

root@node-agent-ntkdw:/# ls -l /proc/1/root/var/vcap/sys/run/containerd/containerd.sock

Hi @apetruhin
root@node-agent-ntkdw:/# ls -l /proc/1/root/var/vcap/sys/run/containerd/containerd.sock
ls: cannot access '/proc/1/root/var/vcap/sys/run/containerd/containerd.sock': No such file or directory
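
The manual check above, re-prefixing an absolute symlink target with /proc/1/root, can be repeated hop by hop with a small loop. This is a sketch under assumptions, not what the agent does internally: `HOST_ROOT` and `follow` are illustrative names, and only a symlink in the final path component is handled. A symlinked parent directory with an absolute target would still be resolved against the pod's own root and could show up as "missing" here.

```shell
#!/bin/sh
# Host root as seen from the node-agent pod; overridable for testing.
HOST_ROOT=${HOST_ROOT:-/proc/1/root}

# Hypothetical helper: follow a symlink chain for a host path,
# re-rooting every hop under HOST_ROOT.
follow() {
    path=$1
    hops=0
    while [ -L "$HOST_ROOT$path" ] && [ "$hops" -lt 16 ]; do
        target=$(readlink "$HOST_ROOT$path")
        case $target in
            /*) path=$target ;;                      # absolute target: host path
            *)  path=$(dirname "$path")/$target ;;   # relative to the link's directory
        esac
        echo "-> $path"
        hops=$((hops + 1))
    done
    if [ -e "$HOST_ROOT$path" ]; then
        echo "found: $path"
    else
        echo "missing: $path"
    fi
}

follow /run/containerd/containerd.sock
```

If the last line reports "missing", checking each parent directory of the final path under /proc/1/root/ with `ls -ld` may show where the chain breaks.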

@yoyoraso, please provide details about your setup and instructions on how to run this type of Kubernetes environment to reproduce the issue.

@apetruhin it is a basic k8s cluster created with VMware Tanzu
kubernetes version : v1.25.16+vmware.1
Server Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.16+vmware.1", GitCommit:"84fd181a4243c4354b9208f4292f1b6cd82726b1", GitTreeState:"clean", BuildDate:"2023-11-21T10:59:59Z", GoVersion:"go1.20.10", Compiler:"gc", Platform:"linux/amd64"}
OS: Ubuntu 22.04.4 LTS
kernel : 6.5.0-21-generic
container runtime : containerd://1.6.28
coroot node agent tag : 1.18.9

Let me know if you need more information.