flant/shell-operator

Gradual Increase in Memory Consumption

J0ram opened this issue · 8 comments

Hi,

Since 1.4.8 we have observed our shell-operator pods slowly consuming more memory over time.

I made a local branch with pprof installed, and it appears to be logrus that is not releasing its memory.
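For anyone who wants to collect a comparable profile, here is a minimal sketch of wiring pprof into a Go binary. It is not the branch used above: the `PPROF_ADDR` variable and the `heapProfileBytes` helper are names made up for this example, and shell-operator itself may expose profiling differently.

```go
package main

import (
	"bytes"
	"fmt"
	"log"
	"net/http"
	_ "net/http/pprof" // side effect: registers /debug/pprof/* on http.DefaultServeMux
	"os"
	"runtime"
	"runtime/pprof"
)

// heapProfileBytes forces a GC so the profile reflects live objects,
// then returns a heap profile in the format `go tool pprof` reads.
func heapProfileBytes() ([]byte, error) {
	runtime.GC()
	var buf bytes.Buffer
	if err := pprof.WriteHeapProfile(&buf); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	// PPROF_ADDR is a hypothetical variable name used only in this sketch.
	if addr := os.Getenv("PPROF_ADDR"); addr != "" {
		// Blocks while serving; fetch a profile with:
		//   go tool pprof http://<addr>/debug/pprof/heap
		log.Fatal(http.ListenAndServe(addr, nil))
	}

	// Without PPROF_ADDR, take a single in-process snapshot instead.
	profile, err := heapProfileBytes()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("heap profile: %d bytes\n", len(profile))
}
```

Comparing two heap profiles taken a few hours apart (`go tool pprof -base old.pb.gz new.pb.gz`) usually makes a slow leak easier to spot than a single snapshot.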

Environment:

  • Shell-operator version: 1.4.8 - 1.4.11 (we've tested each version on release)
  • Kubernetes version: AKS 1.29.4
  • Installation type: Helm

Worth noting that 1.4.7 behaves as expected on the same cluster.

Anything else we should know?:
I find it odd that nobody else is reporting this issue - I can only assume it's some oddity in our environment but I'm pretty much out of ideas.

From what I can see, the version of the logrus package hasn't changed between versions of this application (particularly 1.4.7 - 1.4.8). If you have any ideas on how we could debug further, that would be appreciated.

I've attached the heap dump in case that's of any help.

Thanks

heap.zip

Hit by this issue too. Tried setting GOMEMLIMIT with no luck (then checked the Go version: 1.19, which does not support a soft memory limit).

Shell Operator: 1.4.12
K8s: 1.30.3
Linux Kernel: 6.6.52 with THP enabled in madvise mode (it is relevant for Go > 1.20 I think)

Reproducer project: https://github.com/cit-consulting/hetzner-failoverip-controller
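A quick way to check whether a given binary honors GOMEMLIMIT is to print the toolchain version and the effective soft limit from inside the process. This is a rough sketch, assuming a toolchain new enough to provide `runtime/debug.SetMemoryLimit`; `effectiveMemoryLimit` is a helper name invented for this example.

```go
package main

import (
	"fmt"
	"math"
	"runtime"
	"runtime/debug"
)

// effectiveMemoryLimit returns the soft memory limit currently in force.
// A negative argument makes SetMemoryLimit report the limit without changing it.
func effectiveMemoryLimit() int64 {
	return debug.SetMemoryLimit(-1)
}

func main() {
	fmt.Println("built with:", runtime.Version())

	limit := effectiveMemoryLimit()
	if limit == math.MaxInt64 {
		// math.MaxInt64 is the runtime default, meaning no limit is applied.
		fmt.Println("no soft memory limit in effect")
		return
	}
	fmt.Printf("soft memory limit: %d bytes\n", limit)
}
```

Running it with and without `GOMEMLIMIT=256MiB` in the environment shows whether the runtime picked the value up.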

Also hitting this.

Shell Operator: 1.4.10
K8s: 1.29.8

Same here, with multiple operators running on different clusters using 1.4.10. The pod crashes and restarts when it hits its memory limit.

Checked 1.4.14: classic memory leak.

Because of Go 1.22 and GOMEMLIMIT, the operator spends a lot of CPU on GC before being killed by the kubelet.

Hello. Thank you for the report. We also ran into the logrus leak some time ago. We're currently working on replacing the logger.

We have a quick fix in v1.4.15. Could you try it, please?

The memory profile looks better, but I was hit by log duplication (#675).

I'll keep monitoring.

We've been running 1.4.15 overnight - memory usage is completely flat :)

Thanks all