robusta-dev/robusta

OOMKilled alert on Teams does not attach the logs of the OOMKilled container; instead, the attached logs are from the first container

antikilahdjs opened this issue · 4 comments

Describe the bug

Hello, I have a pod with many containers. When an OOM kill occurs and the alert is sent to Teams, the attached log is not from the container that was OOMKilled; instead, it is from the first container. For example, I have a pod with the following containers:

Pod: pod-test-logs-oomkill
  - container-test-01
  - containerd-massive-text-01
  - containerd-massive-text-02

If the OOM kill occurs in containerd-massive-text-02, the attached logs are from container-test-01, because that container is first in the list.

The screenshot below demonstrates this.

[Screenshot]

The container that was OOMKilled was recorder-1026, but the attached logs were from the container rtsp-proxy.

[Screenshot]

To Reproduce
Steps to reproduce the behavior:
1 - Install Robusta using the official Helm chart
2 - Create a pod with two or more containers
3 - Configure Prometheus with Alertmanager
4 - Configure a Microsoft Teams sink
5 - Use the following trigger in the Helm values:

- triggers:
  - on_pod_oom_killed:
      rate_limit: 3600
  actions:
  - pod_oom_killer_enricher: {}
  - logs_enricher: {}
  - pod_node_graph_enricher:
      resource_type: Memory
      display_limits: true
  - oomkilled_container_graph_enricher:
      resource_type: Memory
      display_limits: true
  stop: true

Expected behavior

The attached logs should be from the container that was OOMKilled, not from the first container in the list.

Screenshots
See the screenshots above.

Desktop:

  • OS: Red Hat 8.5 and Ubuntu 20.04 LTS
  • Browser: Chrome
  • Version: 119



@antikilahdjs thanks for reporting.

Some thoughts on a fix: we should probably fetch the logs inside pod_oom_killer_enricher, since that is where we have context on which container was OOMKilled. Architecturally, logs_enricher just doesn't have that information.
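To make the proposed approach concrete, here is a minimal sketch in Python. It does not use Robusta's actual enricher API; it assumes a pod object shaped like the Kubernetes API's JSON (`status.containerStatuses[].state` / `lastState.terminated.reason`), and the helper names `oomkilled_containers` and `logs_command` are hypothetical. The idea is to select the container by its termination reason (`OOMKilled`) rather than by its position in the list, and then request logs for that specific container.

```python
from typing import Any


def oomkilled_containers(pod: dict[str, Any]) -> list[str]:
    """Hypothetical helper: names of containers whose termination reason is OOMKilled."""
    names = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        # After a restart the OOM kill is recorded in lastState; before a
        # restart it may still be visible in the current state.
        for key in ("state", "lastState"):
            terminated = status.get(key, {}).get("terminated") or {}
            if terminated.get("reason") == "OOMKilled":
                names.append(status["name"])
                break
    return names


def logs_command(pod: dict[str, Any], container: str) -> list[str]:
    """Hypothetical helper: a kubectl command targeting one specific container.

    -c selects the container explicitly instead of relying on a default.
    --previous asks for the terminated instance's logs, which assumes the
    container has already been restarted after the OOM kill.
    """
    meta = pod["metadata"]
    return [
        "kubectl", "logs", meta["name"],
        "-n", meta.get("namespace", "default"),
        "-c", container,
        "--previous",
    ]


# Example pod matching the report: only the third container was OOMKilled.
pod = {
    "metadata": {"name": "pod-test-logs-oomkill", "namespace": "default"},
    "status": {
        "containerStatuses": [
            {"name": "container-test-01", "state": {"running": {}}, "lastState": {}},
            {"name": "containerd-massive-text-01", "state": {"running": {}}, "lastState": {}},
            {
                "name": "containerd-massive-text-02",
                "state": {"running": {}},
                "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}},
            },
        ]
    },
}

for name in oomkilled_containers(pod):
    print(" ".join(logs_command(pod, name)))
# kubectl logs pod-test-logs-oomkill -n default -c containerd-massive-text-02 --previous
```

The key point is that the container name comes from the pod's termination status, which is exactly the context that pod_oom_killer_enricher has and logs_enricher lacks.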

@antikilahdjs would you be interested in contributing a fix for this? Also very curious where those screenshots are from. Looks like an interesting frontend for Kubernetes.

Hello @aantn, thanks for your comments. I'm really sorry, but I am not a developer and cannot help with the codebase. Is there another way to fix this, or does it really need a code refactor?

If you'd like to know more about my frontend, you can find it here: https://github.com/kubesphere/kubesphere

The fix is included in 0.10.26, which was released today.