grafana/beyla

Beyla traces not showing up

revathyrams opened this issue · 15 comments

Hey! I'm running Beyla on a local kubernetes (colima) cluster on an M2 mac.

I'm following the example here with just a few tweaks to the beyla-config to enable debug logs and discover k8s namespaces:

apiVersion: v1
kind: ConfigMap
metadata:
  namespace: beyla
  name: beyla-config
data:
  beyla-config.yml: |
    # this is required to enable kubernetes discovery and metadata
    attributes:
      kubernetes:
        enable: true
    print_traces: true
    log_level: DEBUG
    # this will provide automatic routes report while minimizing cardinality
    routes:
      unmatched: heuristic
    discovery:
      services:
        - k8s_namespace: beyla
        - k8s_namespace: default
        - k8s_namespace: kube-system

From the logs I think beyla is able to watch other pods in different namespaces.

level=DEBUG msg="inserting pod" component=kube.Metadata informer=Pod name=beyla-lzgxh namespace=beyla uid=a4df9638-84ce-4b32-a3d8-b6ad2148f375 owner=k8s.daemonset.name:beyla node=colima startTime="2024-07-26 00:05:29 +0000 UTC" containerIDs=[5c9dbe4c8a9efd132c26dfee176765a6df8bfb6bf8fc4a258a49ebcbe55aeb17]

But I do not see any traces in Grafana Cloud. How should I debug further?
I do see these logs about traces:

time=2024-07-26T00:19:52.441Z level=DEBUG msg="new process" component=discover.WatcherKubeEnricher pid=1244
time=2024-07-26T00:19:52.441Z level=DEBUG msg="Found namespace" component=httpfltr.Tracer nsPid=pid:[4026531836]
time=2024-07-26T00:19:52.441Z level=DEBUG msg="new process" component=discover.WatcherKubeEnricher pid=2968
time=2024-07-26T00:19:52.441Z level=DEBUG msg="Found namespace" component=httpfltr.Tracer nsPid=pid:[4026531836]
time=2024-07-26T00:19:52.441Z level=DEBUG msg="can't get container info for PID" component=discover.WatcherKubeEnricher pid=2968 error="/proc/2968/cgroup: couldn't find any docker entry for process with PID 2968"

Do these logs mean Beyla is able to receive the traces but unable to send them to Grafana Cloud?

It's important to check whether any errors or warnings are reported in the Beyla logs; they might be hidden among the debug messages. Not seeing data in Grafana Cloud typically means Beyla is unable to send it, usually because of wrong credentials or a wrong OTLP endpoint.

I couldn't find any error/warning logs, just this debug-level log:

level=DEBUG msg="can't get container info for PID" component=discover.watcherKubeEnricher pid=141004 error="/proc/141004/cgroup: couldn't find any docker entry for process with PID 141004"

I double checked the credentials and OTLP endpoint too, and they look right.

Do you have any other pointers to debug this? Are there any specific logs to expect when Beyla creates a trace or sends it to the OTLP endpoint?

I don't see other logs besides these:

time=2024-07-26T18:06:24.808Z level=DEBUG msg="process stopped" component=discover.watcherKubeEnricher pid=134837
time=2024-07-26T18:06:24.808Z level=DEBUG msg="filtering processes" component=discover.CriteriaMatcher len=1
time=2024-07-26T18:06:24.808Z level=DEBUG msg="deleted untracked process. Ignoring" component=discover.CriteriaMatcher pid=134837
time=2024-07-26T18:06:24.808Z level=DEBUG msg="processes matching selection criteria" component=discover.CriteriaMatcher len=0

Thanks for the info. Let's first see whether Beyla is capturing anything. Can you please set BEYLA_PRINT_TRACES=1? This will print to standard output the requests that Beyla is capturing. If we see output there, then it's likely a problem with the connection to the cloud; otherwise it's something to do with the configuration of Beyla.

Attaching a full log will also help; it doesn't have to be with BEYLA_LOG_LEVEL=debug.
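For reference, a minimal sketch of how that variable could be set on the Beyla container in Kubernetes. It assumes a DaemonSet laid out like the Beyla Kubernetes example; the container name `beyla` is an assumption, and all other fields are omitted:

```yaml
# Fragment of the DaemonSet pod template (spec.template.spec); other fields omitted.
containers:
  - name: beyla                  # assumed container name
    env:
      - name: BEYLA_PRINT_TRACES
        value: "1"               # env values must be strings in Kubernetes manifests
```

With this set, any captured requests should appear in the container's standard output, which `kubectl logs` shows.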

I already have print_traces: true in my beyla config.

Attaching the complete logs here, in debug level:
logs.txt

It seems to be a permissions issue: Beyla doesn't find any services it can instrument. Can you please share the full DaemonSet YAML file (with any secrets removed)?

Sure, here's the full beyla-config file:

apiVersion: v1
kind: ConfigMap
metadata:
  namespace: beyla
  name: beyla-config
data:
  beyla-config.yml: |
    attributes:
      kubernetes:
        enable: true
    print_traces: true
    log_level: INFO
    routes:
      unmatched: heuristic
    discovery:
      services:
        - k8s_namespace: beyla
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  namespace: beyla
  name: beyla
spec:
  selector:
    matchLabels:
      instrumentation: beyla
  template:
    metadata:
      labels:
        instrumentation: beyla
    spec:
      serviceAccountName: beyla
      hostPID: true # mandatory!
      containers:
        - name: beyla
          image: grafana/beyla:latest
          imagePullPolicy: IfNotPresent
          securityContext:
            privileged: true # mandatory!
            readOnlyRootFilesystem: true
          volumeMounts:
            - mountPath: /config
              name: beyla-config
            - mountPath: /var/run/beyla
              name: var-run-beyla
          env:
            - name: BEYLA_CONFIG_PATH
              value: "/config/beyla-config.yml"
            - name: OTEL_EXPORTER_OTLP_ENDPOINT
              value: "redacted"
            - name: OTEL_EXPORTER_OTLP_HEADERS
              value: "Authorization=Basic redacted"
      volumes:
        - name: beyla-config
          configMap:
            name: beyla-config
        - name: var-run-beyla
          emptyDir: {}

And this is the ServiceAccount:

apiVersion: v1
kind: ServiceAccount
metadata:
  namespace: beyla
  name: beyla
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: beyla
rules:
  - apiGroups: ["apps"]
    resources: ["replicasets"]
    verbs: ["list", "watch"]
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: beyla
subjects:
  - kind: ServiceAccount
    name: beyla
    namespace: beyla
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: beyla

I think I might have to give additional cluster role permissions for services and deployments:

- apiGroups: [""]
  resources: ["services"]
  verbs: ["list"]
- apiGroups: ["*"]
  resources: ["deployments"]
  verbs: ["get", "list", "watch"]

I still don't see any traces in the logs. Do any permissions besides the ones mentioned here need to be set?
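Merged into the ClusterRole posted earlier, the extra rules would look roughly like this. It is only a sketch: the thread does not establish that Beyla needs these permissions, and narrowing the `deployments` rule from `"*"` to the `apps` API group is an assumption made here to avoid a wildcard.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: beyla
rules:
  # Rules from the original ClusterRole
  - apiGroups: ["apps"]
    resources: ["replicasets"]
    verbs: ["list", "watch"]
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["list", "watch"]
  # Additional rules tried in this thread
  - apiGroups: [""]
    resources: ["services"]
    verbs: ["list"]
  - apiGroups: ["apps"]          # Deployments live in the "apps" group (assumed narrowing of "*")
    resources: ["deployments"]
    verbs: ["get", "list", "watch"]
```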

Not sure if this is relevant, but I noticed this change during an upgrade, and there is no mention of it:

print_traces: true - removed
trace_printer: text - new

https://github.com/grafana/beyla/pull/1035/files

but the corresponding docs/notes haven't been updated.

@tbgbeansbot I don't believe the print_traces changes are related. As a matter of fact, print_traces should still work: if it's set, it has the same effect as trace_printer=text. We just deprecated it, and you should now get a warning when using it.

By the way, we've updated these docs - if I missed anything on that front, could you point me to it so that I can update it?
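As a sketch of the migration described in that reply, the config file would change as follows. Both keys come from this thread; their equivalence is taken from the maintainer's comment rather than verified here.

```yaml
# Deprecated form (per the reply above, still accepted but logs a warning):
# print_traces: true

# Current form with the same effect:
trace_printer: text
```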

Some points:

  • I recently had problems with freshly created Colima instances. It seems something has changed in the way Colima (or its Lima backend) creates the VMs. Could you give it a try with Rancher Desktop + Kind? I also verified that Docker Desktop currently works well.
  • Can you share the debug logs? The deployment files for your applications would also help.

We had a meeting and managed to get Beyla working in your environment, so I'm closing the task now. @revathyrams feel free to reopen it if you think that this task should not be closed.

thanks @mariomac! We tried yesterday in GKE and that works perfectly well!

But in my local setup (Colima on Mac), the error is:

level=ERROR msg="Unable to load eBPF watcher for process events" component=discover.ProcessWatcher interval=5s error="instrumenting function \"sys_bind\": setting kprobe: creating perf_kprobe PMU (arch-specific fallback for \"sys_bind\"): token sys_bind: opening perf event: permission denied"

I'm not focussed on setting it up locally right now, as it works fine in Google Cloud, so let the issue be closed 👍

Hi @revathyrams, it looks like your local Linux requires the SYS_ADMIN capability. Under certain Linux kernel configurations, you would need it anyway.

However, we recently realized that the latest versions of Colima don't work well with the pattern we use to run unprivileged containers (a pod mounting the BPF filesystem and another using it).

I verified that Rancher Desktop and Docker Desktop would work. You can follow the instructions here: https://github.com/mariomac/local-beyla-demo/
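For a container that does not run with `privileged: true`, the SYS_ADMIN capability mentioned above can be granted through the container's security context. This is a minimal sketch, assuming an unprivileged Beyla container. The other capabilities Beyla needs in unprivileged mode are not listed in this thread and are deliberately left out:

```yaml
# Fragment of the Beyla container spec; other fields omitted.
securityContext:
  privileged: false
  capabilities:
    add:
      - SYS_ADMIN   # capability named in the maintainer's reply above
      # further capabilities required by Beyla would go here (not specified in this thread)
```

A container with `privileged: true`, like the DaemonSet posted earlier, already holds all capabilities, so this fragment only matters for the unprivileged pattern.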