coroot/coroot-node-agent

Only containerd is shown in applications

sumeet-zuora opened this issue · 20 comments

As per the documentation and installation steps, after installing Coroot and the agent, Prometheus was attached properly, but the only visible application was containerd. Any help appreciated.

@Schaudhari7565, please attach the agent's logs.

corootnodeagent-n4cbb.txt
Attached are the logs from one of the agents.

Manifest for the agent:

---
# Source: corootnodeagent/templates/daemonset.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  namespace: coroot
  labels:
    chart: "corootnodeagent-1.0.0"
    release: "corootnodeagent"
    heritage: "Helm"
  name: corootnodeagent
spec:
  selector:
    matchLabels:
      app: corootnodeagent
      group: observability
      provider: tools
  template:
    metadata:
      annotations:
        prometheus.io/port: "80"
        prometheus.io/scrape: "true"
      labels:
        app: corootnodeagent
        group: observability
        provider: tools
    spec:
      imagePullSecrets:
        - name: regcred
      tolerations:
        - operator: Exists
      hostPID: true
      containers:
        - name: corootnodeagent
          image: "ghcr.io/coroot/coroot-node-agent:latest"
          imagePullPolicy: "IfNotPresent"
          args: ["--cgroupfs-root", "/host/sys/fs/cgroup"]
          ports:
            - name: http
              containerPort: 80
          securityContext:
            privileged: true
          volumeMounts:
            - mountPath: /host/sys/fs/cgroup
              name: cgroupfs
              readOnly: true
            - mountPath: /sys/kernel/debug
              name: debugfs
              readOnly: false
      volumes:
        - hostPath:
            path: /sys/fs/cgroup
          name: cgroupfs
        - hostPath:
            path: /sys/kernel/debug
          name: debugfs

Also, I am using VictoriaMetrics instead of Prometheus. Not sure if this breaks anything, but the connection did work as expected.
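For reference, VictoriaMetrics exposes a Prometheus-compatible query API, so the datasource can be sanity-checked directly. A sketch, assuming single-node VictoriaMetrics on its default port 8428 (use your vmselect URL for the cluster version):

# Hypothetical address; replace with your VictoriaMetrics endpoint.
curl -s -G 'http://victoriametrics:8428/api/v1/query' --data-urlencode 'query=up'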

At first glance, nothing unusual.
Please show me how it looks in Coroot: the main page and the settings page of the project.

Also, Coroot logs would help.

Ahh, I was missing kube-state-metrics. Seems like progress; no more logs other than compaction. Does it take some time for the UI to show services?
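For anyone hitting the same gap: a minimal install sketch, assuming the prometheus-community chart (the helm_sh_chart label later in this thread suggests that chart); adjust the namespace and values to your setup.

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
# Installs kube-state-metrics into the monitoring namespace (assumed here).
helm install kube-state-metrics prometheus-community/kube-state-metrics \
  -n monitoring --create-namespace

Coroot logs at this point: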

W0929 19:03:37.575715       1 containers.go:65] unknown pod: kube-system/cilium-bs7md, seems like no kube-state-metrics installed
W0929 19:03:37.575736       1 containers.go:65] unknown pod: coroot/corootnodeagent-cwk82, seems like no kube-state-metrics installed
W0929 19:03:37.576552       1 containers.go:65] unknown pod: pomerium/pomerium-proxy-587b77dd7c-zj899, seems like no kube-state-metrics installed
W0929 19:03:37.576582       1 containers.go:65] unknown pod: pomerium/pomerium-authenticate-6f5c68ff6b-p4vzb, seems like no kube-state-metrics installed
W0929 19:03:37.576603       1 containers.go:65] unknown pod: vertical-pod-autoscaler-ecc/vertical-pod-autoscaler-updater-f6c6c88d6-tq648, seems like no kube-state-metrics installed
W0929 19:03:37.577216       1 containers.go:65] unknown pod: kong-internal/kong-kong-internal-948b64c4b-26zzp, seems like no kube-state-metrics installed
W0929 19:03:37.577245       1 containers.go:65] unknown pod: vertical-pod-autoscaler/vertical-pod-autoscaler-recommender-577b8847df-nc84r, seems like no kube-state-metrics installed
W0929 19:03:37.577644       1 containers.go:65] unknown pod: kube-system/cilium-j8stt, seems like no kube-state-metrics installed
W0929 19:03:37.577675       1 containers.go:65] unknown pod: logging/elasticsearch-es-client-1, seems like no kube-state-metrics installed
W0929 19:03:37.577696       1 containers.go:65] unknown pod: zodiac/zookeeper-0, seems like no kube-state-metrics installed
W0929 19:03:37.578267       1 containers.go:65] unknown pod: logging/elasticsearch-es-client-1, seems like no kube-state-metrics installed
W0929 19:03:37.578308       1 containers.go:65] unknown pod: kube-system/cilium-ldzf5, seems like no kube-state-metrics installed
W0929 19:03:37.578330       1 containers.go:65] unknown pod: zodiac/elastic-master-2, seems like no kube-state-metrics installed
W0929 19:03:37.578518       1 containers.go:65] unknown pod: kube-system/cilium-bs7md, seems like no kube-state-metrics installed
I0929 19:03:37.584400       1 constructor.go:64] got 13 nodes, 1500 services, 1390 applications
I0929 19:03:39.063600       1 compaction.go:92] compaction iteration started
I0929 19:03:49.064250       1 compaction.go:92] compaction iteration started
I0929 19:03:57.011050       1 updater.go:53] worker iteration for 2tt6kt9l
I0929 19:03:59.158375       1 compaction.go:92] compaction iteration started
I0929 19:04:09.064052       1 compaction.go:92] compaction iteration started
I0929 19:04:19.064213       1 compaction.go:92] compaction iteration started

Still the same after almost 15 minutes; only containerd is visible.

(screenshot attached)

It can take some time (depending on the cluster size) for the cache-updater to download the kube-state-metrics metrics for the first time.
Do you have more lines like this in the Coroot logs?

I0929 19:03:57.011050       1 updater.go:53] worker iteration for 2tt6kt9l

Or maybe some errors?
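One way to watch the cache fill up is to list the project's cache directory. A sketch; both the deployment name and the /data/cache path (seen in the compaction logs below) are assumptions:

# Assumes Coroot runs as a Deployment named "coroot" in the "coroot" namespace.
kubectl -n coroot exec deploy/coroot -- ls -lh /data/cache/2tt6kt9l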

Still nothing; no errors during startup. The only messages I see are:

I0930 07:03:55.449972       1 main.go:29] version: 0.4.0
I0930 07:03:55.450088       1 db.go:39] using sqlite database
I0930 07:03:55.795158       1 cache.go:130] cache loaded from disk in 339.678568ms
I0930 07:03:55.795491       1 compaction.go:81] compaction worker started
I0930 07:03:55.795534       1 main.go:77] listening on 0.0.0.0:8080
I0930 07:03:56.796094       1 updater.go:53] worker iteration for 2tt6kt9l
I0930 07:04:05.795815       1 compaction.go:92] compaction iteration started
I0930 08:15:05.809959       1 compaction.go:155] compaction task 3c4b3c56d9bf3ed9c6fb8ca80b6e51d3 [1664511240,1664514840,1664518440,1664522040]:3600 -> 1664511240:14400 done in 12.387516ms
I0930 08:15:05.811276       1 compaction.go:144] deleting chunk after compaction: /data/cache/2tt6kt9l/2tt6kt9l-ad52fcad143b8b1451800115bbe853fe-1664511240-120-30.db
I0930 08:15:05.811322       1 compaction.go:144] deleting chunk after compaction: /data/cache/2tt6kt9l/2tt6kt9l-ad52fcad143b8b1451800115bbe853fe-1664514840-120-30.db
I0930 08:15:05.811344       1 compaction.go:144] deleting chunk after compaction: /data/cache/2tt6kt9l/2tt6kt9l-ad52fcad143b8b1451800115bbe853fe-1664518440-120-30.db
I0930 08:15:05.811370       1 compaction.go:144] deleting chunk after compaction: /data/cache/2tt6kt9l/2tt6kt9l-ad52fcad143b8b1451800115bbe853fe-1664522040-120-30.db
I0930 08:15:05.811410       1 compaction.go:155] compaction task ad52fcad143b8b1451800115bbe853fe [1664511240,1664514840,1664518440,1664522040]:3600 -> 1664511240:14400 done in 1.410773ms
I0930 08:15:15.795574       1 compaction.go:92] compaction iteration started
I0930 08:15:25.796272       1 compaction.go:92] compaction iteration started
I0930 08:15:26.200839       1 updater.go:53] worker iteration for 2tt6kt9l

  • Can you show a screenshot of the settings page (/p/2tt6kt9l/settings)?
  • Execute the kube_pod_info query in your VictoriaMetrics and show the output (for example via curl, as sketched below).
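A sketch of the query, assuming a hypothetical single-node VictoriaMetrics address:

curl -s -G 'http://victoriametrics:8428/api/v1/query' --data-urlencode 'query=kube_pod_info'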

(screenshot attached)

So, I found that I was not scraping the kube-state-metrics metrics from the cluster where Coroot was running, but after adding annotations I got the metrics:
(screenshot attached)
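The exact annotations aren't shown here; a sketch of one way to add pod-level scrape annotations to kube-state-metrics, assuming the same prometheus.io/scrape and prometheus.io/port keys the agent manifest above uses, and the default kube-state-metrics port 8080:

# The monitoring namespace matches the kube_pod_info output below; adjust as needed.
kubectl -n monitoring patch deployment kube-state-metrics --type merge -p \
  '{"spec":{"template":{"metadata":{"annotations":{"prometheus.io/scrape":"true","prometheus.io/port":"8080"}}}}}'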

kube_pod_info{app_kubernetes_io_component="metrics", app_kubernetes_io_instance="kube-state-metrics", app_kubernetes_io_managed_by="Helm", app_kubernetes_io_name="kube-state-metrics", app_kubernetes_io_part_of="kube-state-metrics", app_kubernetes_io_version="2.6.0", container="kube-state-metrics", created_by_kind="DaemonSet", created_by_name="aws-node-termination-handler", datacenter="eks-12-ecc-xxxx-xxxxx", exported_namespace="aws-node-termination-handler", exported_node="ip-10-124-128-97.us-west-2.compute.internal", exported_pod="aws-node-termination-handler-4tzqm", helm_sh_chart="kube-state-metrics-4.20.1", host_ip="10.124.128.97", host_network="true", instance="10.8.30.247:8080", job="1", namespace="monitoring", node="ip-10-124-130-55.us-west-2.compute.internal", pod="kube-state-metrics-c6678766c-cbprt", pod_ip="10.124.128.97", pod_template_hash="c6678766c", priority_class="system-node-critical", uid="1385a8d7-9a21-4674-8dfa-b0cb50fe6b54"}

Something new showed up, and it keeps changing; different applications appear automatically under monitoring.

(screenshot attached)

Does it take time to build the cache or something?

Coroot uses metrics gathered by kube-state-metrics to join containers into applications, so this should probably fix the issue.
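If pods keep showing up as unknown, it may be worth checking that they are actually present in kube_pod_info. A sketch using one of the pods from the warnings above (hypothetical endpoint, as before):

curl -s -G 'http://victoriametrics:8428/api/v1/query' \
  --data-urlencode 'query=kube_pod_info{namespace="kube-system",pod=~"cilium-.*"}'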

So, after adding the annotations I can see the metrics in VM, but it still complains about some pods missing and then suddenly detects them. It seems like it is losing the connection.

W0930 18:55:57.155919       1 containers.go:65] unknown pod: logging/elasticsearch-es-client-1, seems like no kube-state-metrics installed
W0930 18:55:57.155951       1 containers.go:65] unknown pod: keda/keda-operator-675b587d7b-xcls7, seems like no kube-state-metrics installed
W0930 18:55:57.156000       1 containers.go:65] unknown pod: kube-system/cilium-7lfxm, seems like no kube-state-metrics installed
W0930 18:55:57.156039       1 containers.go:65] unknown pod: kube-system/cilium-ns58n, seems like no kube-state-metrics installed
W0930 18:55:57.156068       1 containers.go:65] unknown pod: logging/elasticsearch-es-warm-0, seems like no kube-state-metrics installed
W0930 18:55:57.156106       1 containers.go:65] unknown pod: kube-system/cilium-q5gqr, seems like no kube-state-metrics installed
W0930 18:55:57.156140       1 containers.go:65] unknown pod: kube-system/cilium-c6kzq, seems like no kube-state-metrics installed
W0930 18:55:57.156176       1 containers.go:65] unknown pod: kube-system/cilium-operator-69c65bf5c6-mrz6b, seems like no kube-state-metrics installed
W0930 18:55:57.156257       1 containers.go:65] unknown pod: elastic-operator/elastic-operator-1, seems like no kube-state-metrics installed
W0930 18:55:57.156292       1 containers.go:65] unknown pod: kube-system/cilium-c6kzq, seems like no kube-state-metrics installed
W0930 18:55:57.156336       1 containers.go:65] unknown pod: kube-system/cilium-c6kzq, seems like no kube-state-metrics installed
W0930 18:55:57.156374       1 containers.go:65] unknown pod: kube-system/kube-proxy-dxkzf, seems like no kube-state-metrics installed
W0930 18:55:57.156413       1 containers.go:65] unknown pod: kube-system/cilium-68x2b, seems like no kube-state-metrics installed
I0930 18:55:57.163314       1 constructor.go:64] got 18 nodes, 1656 services, 1450 applications
2022/09/30 18:55:57 http: panic serving 127.0.0.1:50250: runtime error: invalid memory address or nil pointer dereference
goroutine 5986 [running]:
net/http.(*conn).serve.func1()
	/usr/local/go/src/net/http/server.go:1825 +0xbf
panic({0xac2f00, 0x1203280})
	/usr/local/go/src/runtime/panic.go:844 +0x258
github.com/coroot/coroot/api/views/overview.Render(0xc00029cbd0)
	/go/src/api/views/overview/overview.go:107 +0xb07
github.com/coroot/coroot/api/views.Overview(...)
	/go/src/api/views/views.go:20
github.com/coroot/coroot/api.(*Api).Overview(0xa92fa0?, {0xc5ccf0, 0xc006c16380}, 0xc000241e00?)
	/go/src/api/api.go:193 +0x91
net/http.HandlerFunc.ServeHTTP(0xc006c0d000?, {0xc5ccf0?, 0xc006c16380?}, 0x0?)
	/usr/local/go/src/net/http/server.go:2084 +0x2f
github.com/gorilla/mux.(*Router).ServeHTTP(0xc0001c2240, {0xc5ccf0, 0xc006c16380}, 0xc006c0cc00)
	/go/pkg/mod/github.com/gorilla/mux@v1.8.0/mux.go:210 +0x1cf
net/http.serverHandler.ServeHTTP({0xc000775860?}, {0xc5ccf0, 0xc006c16380}, 0xc006c0cc00)
	/usr/local/go/src/net/http/server.go:2916 +0x43b
net/http.(*conn).serve(0xc000fbe460, {0xc5d398, 0xc0003dcd80})
	/usr/local/go/src/net/http/server.go:1966 +0x5d7
created by net/http.(*Server).Serve
	/usr/local/go/src/net/http/server.go:3071 +0x4db
I0930 18:55:57.325649       1 compaction.go:92] compaction iteration started
I0930 18:55:58.325732       1 updater.go:53] worker iteration for 2tt6kt9l

In the drop-down I can see applications:
(screenshot attached)

but after selecting one, nothing is there:
(screenshot attached)

Also, the UI is flaky; the applications keep changing.

@Schaudhari7565, apologies for the delayed response. We have fixed the panic. Please update Coroot.

I did update to the latest 0.5.0 and still got a panic. Is this due to the large number of applications? I wanted to know if we can restrict the applications or filter based on some labels, like datacenter=eks16, to avoid reading all the metrics at the same time.
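No label-based filter is mentioned in this thread, but VictoriaMetrics does support an extra_label query arg that enforces a label filter on every query; whether Coroot preserves query args in the datasource URL is untested. A sketch, with datacenter=eks16 as the hypothetical label:

curl -s -G 'http://victoriametrics:8428/api/v1/query' \
  --data-urlencode 'query=kube_pod_info' \
  --data-urlencode 'extra_label=datacenter=eks16'

The panic on 0.5.0: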

I1011 17:28:51.890568       1 constructor.go:68] got 46 nodes, 1557 services, 1484 applications
2022/10/11 17:28:52 http: panic serving 127.0.0.1:56478: runtime error: invalid memory address or nil pointer dereference
goroutine 20983 [running]:
net/http.(*conn).serve.func1()
	/usr/local/go/src/net/http/server.go:1825 +0xbf
panic({0xaf80e0, 0x1260280})
	/usr/local/go/src/runtime/panic.go:844 +0x258
github.com/coroot/coroot/auditor.(*appAuditor).cpu(0xc01207ebb8)
	/go/src/auditor/cpu.go:39 +0x4a2
github.com/coroot/coroot/auditor.Audit(0xc079e1e000)
	/go/src/auditor/auditor.go:26 +0x10a
github.com/coroot/coroot/api/views/overview.Render(0xc079e1e000)
	/go/src/api/views/overview/overview.go:40 +0x9d
github.com/coroot/coroot/api/views.Overview(...)
	/go/src/api/views/views.go:20
github.com/coroot/coroot/api.(*Api).Overview(0xc079e00120?, {0xc9f470, 0xc079e0c000}, 0xc0002c3680?)
	/go/src/api/api.go:194 +0x91
net/http.HandlerFunc.ServeHTTP(0xc079e1a000?, {0xc9f470?, 0xc079e0c000?}, 0xc0c526a9c0?)
	/usr/local/go/src/net/http/server.go:2084 +0x2f
github.com/gorilla/mux.(*Router).ServeHTTP(0xc000242000, {0xc9f470, 0xc079e0c000}, 0xc06c530000)
	/go/pkg/mod/github.com/gorilla/mux@v1.8.0/mux.go:210 +0x1cf
net/http.serverHandler.ServeHTTP({0xc06c512ea0?}, {0xc9f470, 0xc079e0c000}, 0xc06c530000)
	/usr/local/go/src/net/http/server.go:2916 +0x43b
net/http.(*conn).serve(0xc06c528000, {0xc9fb18, 0xc00013d9b0})
	/usr/local/go/src/net/http/server.go:1966 +0x5d7
created by net/http.(*Server).Serve
	/usr/local/go/src/net/http/server.go:3071 +0x4db
I1011 17:28:55.345169       1 compaction.go:92] compaction iteration started
I1011 17:29:05.344864       1 compaction.go:92] compaction iteration started
I1011 17:29:15.345428       1 compaction.go:92] compaction iteration started
I1011 17:29:25.345135       1 compaction.go:92] compaction iteration started
^C

It is a new bug. We will fix it soon. Meanwhile, please install version 0.4.1.
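A sketch of pinning the version, assuming Coroot runs as a Deployment named "coroot" with a container of the same name and the ghcr.io/coroot/coroot image; adjust to your install:

kubectl -n coroot set image deployment/coroot coroot=ghcr.io/coroot/coroot:0.4.1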

Downgraded to 0.4.1; will monitor the logs.

@Schaudhari7565, we've fixed the panic bug. Please upgrade Coroot to version >=0.5.1