Only containerd is shown in applications
sumeet-zuora opened this issue · 20 comments
As per the documentation and installation steps, after installing Coroot and the agent, Prometheus was attached properly, but the only visible application was containerd. Any help appreciated.
@Schaudhari7565, please attach the agent logs.
corootnodeagent-n4cbb.txt
Attached are the logs from one of the agents.
Manifest for the agent:
---
# Source: corootnodeagent/templates/daemonset.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  namespace: coroot
  labels:
    chart: "corootnodeagent-1.0.0"
    release: "corootnodeagent"
    heritage: "Helm"
  name: corootnodeagent
spec:
  selector:
    matchLabels:
      app: corootnodeagent
      group: observability
      provider: tools
  template:
    metadata:
      annotations:
        prometheus.io/port: "80"
        prometheus.io/scrape: "true"
      labels:
        app: corootnodeagent
        group: observability
        provider: tools
    spec:
      imagePullSecrets:
        - name: regcred
      tolerations:
        - operator: Exists
      hostPID: true
      containers:
        - name: corootnodeagent
          image: "ghcr.io/coroot/coroot-node-agent:latest"
          imagePullPolicy: "IfNotPresent"
          args: ["--cgroupfs-root", "/host/sys/fs/cgroup"]
          ports:
            - name: http
              containerPort: 80
          securityContext:
            privileged: true
          volumeMounts:
            - mountPath: /host/sys/fs/cgroup
              name: cgroupfs
              readOnly: true
            - mountPath: /sys/kernel/debug
              name: debugfs
              readOnly: false
      volumes:
        - hostPath:
            path: /sys/fs/cgroup
          name: cgroupfs
        - hostPath:
            path: /sys/kernel/debug
          name: debugfs
Also, I am using VictoriaMetrics instead of Prometheus. Not sure if this breaks anything, but the connection did work as expected.
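(For reference, VictoriaMetrics exposes a Prometheus-compatible query API, so pointing Coroot at it should generally work; a typical in-cluster URL would look like http://victoria-metrics.monitoring.svc:8428, with the service name and namespace here being assumptions.)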
At first glance, nothing unusual.
Please show me how it looks in Coroot: the main page and the settings page of the project.
Also, Coroot logs would help.
Ahh, I was missing kube-state-metrics. Seems like progress; there are no more logs other than compaction. Does it take some time for the UI to show services?
W0929 19:03:37.575715 1 containers.go:65] unknown pod: kube-system/cilium-bs7md, seems like no kube-state-metrics installed
W0929 19:03:37.575736 1 containers.go:65] unknown pod: coroot/corootnodeagent-cwk82, seems like no kube-state-metrics installed
W0929 19:03:37.576552 1 containers.go:65] unknown pod: pomerium/pomerium-proxy-587b77dd7c-zj899, seems like no kube-state-metrics installed
W0929 19:03:37.576582 1 containers.go:65] unknown pod: pomerium/pomerium-authenticate-6f5c68ff6b-p4vzb, seems like no kube-state-metrics installed
W0929 19:03:37.576603 1 containers.go:65] unknown pod: vertical-pod-autoscaler-ecc/vertical-pod-autoscaler-updater-f6c6c88d6-tq648, seems like no kube-state-metrics installed
W0929 19:03:37.577216 1 containers.go:65] unknown pod: kong-internal/kong-kong-internal-948b64c4b-26zzp, seems like no kube-state-metrics installed
W0929 19:03:37.577245 1 containers.go:65] unknown pod: vertical-pod-autoscaler/vertical-pod-autoscaler-recommender-577b8847df-nc84r, seems like no kube-state-metrics installed
W0929 19:03:37.577644 1 containers.go:65] unknown pod: kube-system/cilium-j8stt, seems like no kube-state-metrics installed
W0929 19:03:37.577675 1 containers.go:65] unknown pod: logging/elasticsearch-es-client-1, seems like no kube-state-metrics installed
W0929 19:03:37.577696 1 containers.go:65] unknown pod: zodiac/zookeeper-0, seems like no kube-state-metrics installed
W0929 19:03:37.578267 1 containers.go:65] unknown pod: logging/elasticsearch-es-client-1, seems like no kube-state-metrics installed
W0929 19:03:37.578308 1 containers.go:65] unknown pod: kube-system/cilium-ldzf5, seems like no kube-state-metrics installed
W0929 19:03:37.578330 1 containers.go:65] unknown pod: zodiac/elastic-master-2, seems like no kube-state-metrics installed
W0929 19:03:37.578518 1 containers.go:65] unknown pod: kube-system/cilium-bs7md, seems like no kube-state-metrics installed
I0929 19:03:37.584400 1 constructor.go:64] got 13 nodes, 1500 services, 1390 applications
I0929 19:03:39.063600 1 compaction.go:92] compaction iteration started
I0929 19:03:49.064250 1 compaction.go:92] compaction iteration started
I0929 19:03:57.011050 1 updater.go:53] worker iteration for 2tt6kt9l
I0929 19:03:59.158375 1 compaction.go:92] compaction iteration started
I0929 19:04:09.064052 1 compaction.go:92] compaction iteration started
I0929 19:04:19.064213 1 compaction.go:92] compaction iteration started
It can take some time (depending on the cluster size) for the cache updater to download the kube-state-metrics metrics for the first time.
Do you have more lines like this in the Coroot logs?
I0929 19:03:57.011050 1 updater.go:53] worker iteration for 2tt6kt9l
Or maybe some errors?
Still nothing; no errors during startup. The only messages I see are:
I0930 07:03:55.449972 1 main.go:29] version: 0.4.0
I0930 07:03:55.450088 1 db.go:39] using sqlite database
I0930 07:03:55.795158 1 cache.go:130] cache loaded from disk in 339.678568ms
I0930 07:03:55.795491 1 compaction.go:81] compaction worker started
I0930 07:03:55.795534 1 main.go:77] listening on 0.0.0.0:8080
I0930 07:03:56.796094 1 updater.go:53] worker iteration for 2tt6kt9l
I0930 07:04:05.795815 1 compaction.go:92] compaction iteration started
I0930 08:15:05.809959 1 compaction.go:155] compaction task 3c4b3c56d9bf3ed9c6fb8ca80b6e51d3 [1664511240,1664514840,1664518440,1664522040]:3600 -> 1664511240:14400 done in 12.387516ms
I0930 08:15:05.811276 1 compaction.go:144] deleting chunk after compaction: /data/cache/2tt6kt9l/2tt6kt9l-ad52fcad143b8b1451800115bbe853fe-1664511240-120-30.db
I0930 08:15:05.811322 1 compaction.go:144] deleting chunk after compaction: /data/cache/2tt6kt9l/2tt6kt9l-ad52fcad143b8b1451800115bbe853fe-1664514840-120-30.db
I0930 08:15:05.811344 1 compaction.go:144] deleting chunk after compaction: /data/cache/2tt6kt9l/2tt6kt9l-ad52fcad143b8b1451800115bbe853fe-1664518440-120-30.db
I0930 08:15:05.811370 1 compaction.go:144] deleting chunk after compaction: /data/cache/2tt6kt9l/2tt6kt9l-ad52fcad143b8b1451800115bbe853fe-1664522040-120-30.db
I0930 08:15:05.811410 1 compaction.go:155] compaction task ad52fcad143b8b1451800115bbe853fe [1664511240,1664514840,1664518440,1664522040]:3600 -> 1664511240:14400 done in 1.410773ms
I0930 08:15:15.795574 1 compaction.go:92] compaction iteration started
I0930 08:15:25.796272 1 compaction.go:92] compaction iteration started
I0930 08:15:26.200839 1 updater.go:53] worker iteration for 2tt6kt9l
- Can you show a screenshot of the settings page (/p/2tt6kt9l/settings)?
- Execute the kube_pod_info query in your VictoriaMetrics and show the output.
So, I found that I was not scraping the kube-state-metrics metrics from the cluster where Coroot was running, but after adding the annotations I got the metrics:
kube_pod_info{app_kubernetes_io_component="metrics", app_kubernetes_io_instance="kube-state-metrics", app_kubernetes_io_managed_by="Helm", app_kubernetes_io_name="kube-state-metrics", app_kubernetes_io_part_of="kube-state-metrics", app_kubernetes_io_version="2.6.0", container="kube-state-metrics", created_by_kind="DaemonSet", created_by_name="aws-node-termination-handler", datacenter="eks-12-ecc-xxxx-xxxxx", exported_namespace="aws-node-termination-handler", exported_node="ip-10-124-128-97.us-west-2.compute.internal", exported_pod="aws-node-termination-handler-4tzqm", helm_sh_chart="kube-state-metrics-4.20.1", host_ip="10.124.128.97", host_network="true", instance="10.8.30.247:8080", job="1", namespace="monitoring", node="ip-10-124-130-55.us-west-2.compute.internal", pod="kube-state-metrics-c6678766c-cbprt", pod_ip="10.124.128.97", pod_template_hash="c6678766c", priority_class="system-node-critical", uid="1385a8d7-9a21-4674-8dfa-b0cb50fe6b54"}
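The annotations are the same style as on the agent DaemonSet above, added to the kube-state-metrics pod template so the scraper picks it up. A minimal sketch (the port 8080 is taken from the instance label in the output above):

# Hypothetical fragment of the kube-state-metrics pod template
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "8080"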
Coroot uses metrics gathered by kube-state-metrics to join containers into applications, so this should probably fix the issue.
So, after adding the annotations I can see the metrics in VM, but it still complains about some pods missing and then suddenly detects them. It seems like it is losing connections.
W0930 18:55:57.155919 1 containers.go:65] unknown pod: logging/elasticsearch-es-client-1, seems like no kube-state-metrics installed
W0930 18:55:57.155951 1 containers.go:65] unknown pod: keda/keda-operator-675b587d7b-xcls7, seems like no kube-state-metrics installed
W0930 18:55:57.156000 1 containers.go:65] unknown pod: kube-system/cilium-7lfxm, seems like no kube-state-metrics installed
W0930 18:55:57.156039 1 containers.go:65] unknown pod: kube-system/cilium-ns58n, seems like no kube-state-metrics installed
W0930 18:55:57.156068 1 containers.go:65] unknown pod: logging/elasticsearch-es-warm-0, seems like no kube-state-metrics installed
W0930 18:55:57.156106 1 containers.go:65] unknown pod: kube-system/cilium-q5gqr, seems like no kube-state-metrics installed
W0930 18:55:57.156140 1 containers.go:65] unknown pod: kube-system/cilium-c6kzq, seems like no kube-state-metrics installed
W0930 18:55:57.156176 1 containers.go:65] unknown pod: kube-system/cilium-operator-69c65bf5c6-mrz6b, seems like no kube-state-metrics installed
W0930 18:55:57.156257 1 containers.go:65] unknown pod: elastic-operator/elastic-operator-1, seems like no kube-state-metrics installed
W0930 18:55:57.156292 1 containers.go:65] unknown pod: kube-system/cilium-c6kzq, seems like no kube-state-metrics installed
W0930 18:55:57.156336 1 containers.go:65] unknown pod: kube-system/cilium-c6kzq, seems like no kube-state-metrics installed
W0930 18:55:57.156374 1 containers.go:65] unknown pod: kube-system/kube-proxy-dxkzf, seems like no kube-state-metrics installed
W0930 18:55:57.156413 1 containers.go:65] unknown pod: kube-system/cilium-68x2b, seems like no kube-state-metrics installed
I0930 18:55:57.163314 1 constructor.go:64] got 18 nodes, 1656 services, 1450 applications
2022/09/30 18:55:57 http: panic serving 127.0.0.1:50250: runtime error: invalid memory address or nil pointer dereference
goroutine 5986 [running]:
net/http.(*conn).serve.func1()
/usr/local/go/src/net/http/server.go:1825 +0xbf
panic({0xac2f00, 0x1203280})
/usr/local/go/src/runtime/panic.go:844 +0x258
github.com/coroot/coroot/api/views/overview.Render(0xc00029cbd0)
/go/src/api/views/overview/overview.go:107 +0xb07
github.com/coroot/coroot/api/views.Overview(...)
/go/src/api/views/views.go:20
github.com/coroot/coroot/api.(*Api).Overview(0xa92fa0?, {0xc5ccf0, 0xc006c16380}, 0xc000241e00?)
/go/src/api/api.go:193 +0x91
net/http.HandlerFunc.ServeHTTP(0xc006c0d000?, {0xc5ccf0?, 0xc006c16380?}, 0x0?)
/usr/local/go/src/net/http/server.go:2084 +0x2f
github.com/gorilla/mux.(*Router).ServeHTTP(0xc0001c2240, {0xc5ccf0, 0xc006c16380}, 0xc006c0cc00)
/go/pkg/mod/github.com/gorilla/mux@v1.8.0/mux.go:210 +0x1cf
net/http.serverHandler.ServeHTTP({0xc000775860?}, {0xc5ccf0, 0xc006c16380}, 0xc006c0cc00)
/usr/local/go/src/net/http/server.go:2916 +0x43b
net/http.(*conn).serve(0xc000fbe460, {0xc5d398, 0xc0003dcd80})
/usr/local/go/src/net/http/server.go:1966 +0x5d7
created by net/http.(*Server).Serve
/usr/local/go/src/net/http/server.go:3071 +0x4db
I0930 18:55:57.325649 1 compaction.go:92] compaction iteration started
I0930 18:55:58.325732 1 updater.go:53] worker iteration for 2tt6kt9l
In the drop-down I can see applications, but after selecting one, nothing is there.
Also, the UI is flaky; the applications keep changing.
@Schaudhari7565, apologies for the delayed response. We have fixed the panic. Please update Coroot.
I did update to the latest 0.5.0 and still got a panic.
Is this due to the large number of applications? I wanted to know if we can restrict the applications, or filter based on some labels like datacenter=eks16, to avoid reading all the metrics at the same time.
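Something at the scrape layer is what I have in mind, e.g. keeping only one datacenter's series before Coroot reads them. A sketch, assuming Prometheus/vmagent-style relabeling and the datacenter label from my metrics:

# Hypothetical per-job scrape config fragment:
# keep only series whose datacenter label matches eks16
metric_relabel_configs:
  - action: keep
    source_labels: [datacenter]
    regex: eks16.*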
I1011 17:28:51.890568 1 constructor.go:68] got 46 nodes, 1557 services, 1484 applications
2022/10/11 17:28:52 http: panic serving 127.0.0.1:56478: runtime error: invalid memory address or nil pointer dereference
goroutine 20983 [running]:
net/http.(*conn).serve.func1()
/usr/local/go/src/net/http/server.go:1825 +0xbf
panic({0xaf80e0, 0x1260280})
/usr/local/go/src/runtime/panic.go:844 +0x258
github.com/coroot/coroot/auditor.(*appAuditor).cpu(0xc01207ebb8)
/go/src/auditor/cpu.go:39 +0x4a2
github.com/coroot/coroot/auditor.Audit(0xc079e1e000)
/go/src/auditor/auditor.go:26 +0x10a
github.com/coroot/coroot/api/views/overview.Render(0xc079e1e000)
/go/src/api/views/overview/overview.go:40 +0x9d
github.com/coroot/coroot/api/views.Overview(...)
/go/src/api/views/views.go:20
github.com/coroot/coroot/api.(*Api).Overview(0xc079e00120?, {0xc9f470, 0xc079e0c000}, 0xc0002c3680?)
/go/src/api/api.go:194 +0x91
net/http.HandlerFunc.ServeHTTP(0xc079e1a000?, {0xc9f470?, 0xc079e0c000?}, 0xc0c526a9c0?)
/usr/local/go/src/net/http/server.go:2084 +0x2f
github.com/gorilla/mux.(*Router).ServeHTTP(0xc000242000, {0xc9f470, 0xc079e0c000}, 0xc06c530000)
/go/pkg/mod/github.com/gorilla/mux@v1.8.0/mux.go:210 +0x1cf
net/http.serverHandler.ServeHTTP({0xc06c512ea0?}, {0xc9f470, 0xc079e0c000}, 0xc06c530000)
/usr/local/go/src/net/http/server.go:2916 +0x43b
net/http.(*conn).serve(0xc06c528000, {0xc9fb18, 0xc00013d9b0})
/usr/local/go/src/net/http/server.go:1966 +0x5d7
created by net/http.(*Server).Serve
/usr/local/go/src/net/http/server.go:3071 +0x4db
I1011 17:28:55.345169 1 compaction.go:92] compaction iteration started
I1011 17:29:05.344864 1 compaction.go:92] compaction iteration started
I1011 17:29:15.345428 1 compaction.go:92] compaction iteration started
I1011 17:29:25.345135 1 compaction.go:92] compaction iteration started
^C
It is a new bug. We will fix it soon. Meanwhile, please install version 0.4.1.
Scaled down to 0.4.1, will monitor the logs
@Schaudhari7565, we've fixed the panic bug. Please upgrade Coroot to version >=0.5.1.
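Upgrading amounts to bumping the server image tag to the fixed release instead of a floating :latest. A sketch of the relevant Deployment fragment (the container layout here is an assumption; ghcr.io/coroot/coroot is the published image):

# Hypothetical fragment of the Coroot server Deployment
containers:
  - name: coroot
    image: "ghcr.io/coroot/coroot:0.5.1"  # >=0.5.1 contains the panic fix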