Gradual Increase in Memory Consumption

Question

J0ram opened this issue 4 months ago · 8 comments

J0ram commented 4 months ago

Hi,

Since 1.4.8 we have observed our shell-operator pods slowly consuming memory over time:

I made a local branch with pprof installed and it appears to be logrus that is not releasing its memory:

Environment:

1.4.8 - 1.4.11 (we've tested each version on release)
Kubernetes version: AKS 1.29.4
Installation type: Helm

Worth noting that 1.4.7 behaves as expected on the same cluster.

Anything else we should know?:
I find it odd that nobody else is reporting this issue - I can only assume it's some oddity in our environment but I'm pretty much out of ideas.

From what I can see, the version of the logrus package hasn't changed between versions of this application (particularly 1.4.7 - 1.4.8). If you have any ideas on how we could debug this further, that would be appreciated.

I've attached the heap dump if that's of any help

Thanks

heap.zip
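
For anyone who wants to capture a comparable heap dump from their own build, here is a minimal sketch using the standard `runtime/pprof` package. It is not the exact patch used in the local branch mentioned above; the helper name `heapProfile` and the output file `heap.pprof` are assumptions made for illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"runtime"
	"runtime/pprof"
)

// heapProfile (hypothetical helper) runs a GC cycle so the profile reflects
// up-to-date allocation statistics, then returns the heap profile in the
// format that `go tool pprof` reads.
func heapProfile() ([]byte, error) {
	runtime.GC()
	var buf bytes.Buffer
	if err := pprof.WriteHeapProfile(&buf); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	data, err := heapProfile()
	if err != nil {
		fmt.Fprintln(os.Stderr, "heap profile failed:", err)
		os.Exit(1)
	}
	// heap.pprof is an assumed file name; any writable path works.
	if err := os.WriteFile("heap.pprof", data, 0o644); err != nil {
		fmt.Fprintln(os.Stderr, "write failed:", err)
		os.Exit(1)
	}
	fmt.Printf("wrote %d bytes to heap.pprof\n", len(data))
}
```

The resulting file can be inspected with `go tool pprof heap.pprof`. Taking two dumps some hours apart and comparing them is usually more telling than a single snapshot when the growth is slow.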

Answer 1 · 2024-09-27T08:16:18.000Z

Hit by this issue. Tried to set GOMEMLIMIT with no luck (then checked Go version = 1.19 which does not support a soft memory limit).

Shell Operator: 1.4.12
K8s: 1.30.3
Linux Kernel: 6.6.52 with THP enabled in madvise mode (it is relevant for Go > 1.20 I think)

Reproducer project: https://github.com/cit-consulting/hetzner-failoverip-controller
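
A quick way to check whether a given binary actually honours GOMEMLIMIT is to print the Go version it was built with and the limit the runtime reports. The sketch below relies on `runtime/debug.SetMemoryLimit`, which according to the Go release notes is available from Go 1.19 onward, so it will not compile with older toolchains. The helper names `effectiveMemoryLimit` and `describeLimit` are made up for this example.

```go
package main

import (
	"fmt"
	"math"
	"os"
	"runtime"
	"runtime/debug"
)

// effectiveMemoryLimit (hypothetical helper) returns the soft memory limit
// currently in force. A negative argument leaves the limit unchanged and
// only reports the current value.
func effectiveMemoryLimit() int64 {
	return debug.SetMemoryLimit(-1)
}

// describeLimit (hypothetical helper) renders the limit for humans.
// math.MaxInt64 is the runtime's default and means no limit is applied.
func describeLimit(limit int64) string {
	if limit == math.MaxInt64 {
		return "unlimited"
	}
	return fmt.Sprintf("%d MiB", limit>>20)
}

func main() {
	fmt.Println("go version:", runtime.Version())
	fmt.Println("GOMEMLIMIT env:", os.Getenv("GOMEMLIMIT"))
	fmt.Println("effective limit:", describeLimit(effectiveMemoryLimit()))
}
```

If the environment variable is set but the effective limit still prints as unlimited, the variable is probably not reaching the process. Note that the limit is soft: it makes the GC work harder as the heap approaches it, but it cannot stop a genuine leak from growing past it.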

Answer 2 · 2024-10-02T09:12:13.000Z

Also hitting this.

Shell Operator: 1.4.10
K8s: 1.29.8

Answer 3 · 2024-10-18T12:52:42.000Z

Same here with multiple operators running on different clusters using 1.4.10. The pod crashes and restarts when it hits the memory limit.

Answer 4 · 2024-10-20T13:42:05.000Z

Checked 1.4.14 - classic memory leak:

Because of Go 1.22 and GOMEMLIMIT, the operator burns a lot of CPU on GC before being killed by the kubelet.

Answer 5 · 2024-10-23T08:04:38.000Z

Hello. Thank you for the report. We also ran into the logrus leak some time ago. We're currently working on replacing the logger.

Answer 6 · 2024-10-23T12:06:59.000Z

We have a quick fix in v1.4.15. Could you try it, please?

Answer 7 · 2024-10-23T16:04:06.000Z

> We have a quick fix in v1.4.15. Could you try it, please.

The memory profile looks better, but I'm now hit by log duplication (#675).

Will keep monitoring.

Answer 8 · 2024-10-24T08:14:18.000Z

We've been running 1.4.15 overnight - memory usage is completely flat :)

Thanks all