Dieterbe/profiletrigger

heap profile rework


e8a1450 was not a good change.
Virtual memory size is not a good representation of actual memory usage (e.g. RSS), and is often much larger than it.

I think 2 modes are useful for memory profiles:

  1. HeapAlloc: it represents heap allocations specifically, which is what a heap profile helps with; however, a process's RSS can be much more than just HeapAlloc.
  2. RSS: this is an accurate representation of how much RAM we actually use, and it is essentially what determines whether a process gets killed. (I'm not taking swap space into account here; see https://unix.stackexchange.com/questions/153585/how-does-the-oom-killer-decide-which-process-to-kill-first.) This seems to hold for standard processes on a host, as well as for Kubernetes, which runs on Docker and uses cgroup limits. A sketch of reading both metrics follows this list.
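To make the two candidates concrete, here is a minimal Go sketch of how each could be read: HeapAlloc from runtime.MemStats, and RSS by parsing /proc/self/statm (whose second field is resident pages). The helper names are hypothetical and the RSS part is Linux-specific; this is not profiletrigger code.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
)

// heapAlloc returns bytes of allocated heap objects, as reported by the Go runtime.
func heapAlloc() uint64 {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.HeapAlloc
}

// rss returns the resident set size in bytes by parsing /proc/self/statm;
// the second field is the number of resident pages. Linux only.
func rss() (uint64, error) {
	data, err := os.ReadFile("/proc/self/statm")
	if err != nil {
		return 0, err
	}
	var size, resident uint64
	if _, err := fmt.Sscanf(string(data), "%d %d", &size, &resident); err != nil {
		return 0, err
	}
	return resident * uint64(os.Getpagesize()), nil
}

func main() {
	r, _ := rss()
	fmt.Printf("HeapAlloc=%d RSS=%d\n", heapAlloc(), r)
}
```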

The question is how to implement this. Some ideas:

  1. only one of them (but which one?)
  2. a switch so you can choose between them
  3. both, as separate triggers, so you pick the one you want
  4. both in the same trigger, with a configurable threshold for each (and maybe -1 to disable one)

I think 4 sounds best.
(Note that RSS requires reading from /proc; see the sketch below.)
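
A minimal sketch of what option 4 could look like: one trigger with a threshold per metric, where a negative value disables that check. The type and function names here are hypothetical and not the actual profiletrigger API.

```go
package main

import "fmt"

// HeapTriggerConfig is a hypothetical config for option 4: one trigger,
// one threshold per metric, where -1 disables that check.
type HeapTriggerConfig struct {
	HeapAllocThreshold int64 // bytes of runtime.MemStats.HeapAlloc; -1 disables
	RSSThreshold       int64 // bytes of resident memory; -1 disables
}

// shouldTrigger reports whether a profile should be written,
// given current readings of both metrics.
func shouldTrigger(cfg HeapTriggerConfig, heapAlloc, rss int64) bool {
	if cfg.HeapAllocThreshold >= 0 && heapAlloc >= cfg.HeapAllocThreshold {
		return true
	}
	if cfg.RSSThreshold >= 0 && rss >= cfg.RSSThreshold {
		return true
	}
	return false
}

func main() {
	// Only trigger on RSS; the HeapAlloc check is disabled with -1.
	cfg := HeapTriggerConfig{HeapAllocThreshold: -1, RSSThreshold: 1 << 30}
	fmt.Println(shouldTrigger(cfg, 900<<20, 1200<<20)) // true: RSS exceeds 1 GiB
}
```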

Hmm, I'm going to lean towards looking at just HeapAlloc, as reading /proc is sometimes protected and needs to be evaluated carefully. Furthermore, we are looking to trigger on ballpark numbers rather than exact ones, and HeapAlloc is a good enough estimate for most use cases, I guess.

> reading /proc is sometimes protected

Are you talking about insufficient read permissions, or /proc sometimes not being available in container environments? (I suppose both could happen.)

> needs to be evaluated carefully

What do you mean? The numbers are pretty well defined.

> needs to be evaluated carefully

I meant including it :) Having said that, Prometheus does it by reading /proc too:
https://github.com/prometheus/procfs/blob/HEAD/proc_stat.go#L171-L174
It exposes the process_resident_memory_bytes metric using that code.
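
For reference, a small sketch of reading RSS via the prometheus/procfs package linked above (assuming a recent version of the library); this is the value that backs process_resident_memory_bytes.

```go
package main

import (
	"fmt"
	"log"

	"github.com/prometheus/procfs"
)

func main() {
	p, err := procfs.Self() // the /proc entry of the current process
	if err != nil {
		log.Fatal(err)
	}
	stat, err := p.Stat() // parses /proc/self/stat
	if err != nil {
		log.Fatal(err)
	}
	// ResidentMemory is RSS pages multiplied by the system page size.
	fmt.Printf("process_resident_memory_bytes=%d\n", stat.ResidentMemory())
}
```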

Given that this metric is included by default in the prometheus-go-client, I guess my resistance to using it here is now very low :)

fixed by #3