runtime/metrics: add /gc/heap/live:bytes
felixge opened this issue ยท 22 comments
There appears to be no way to monitor what is called the Live Heap
in the GC Guide.
The closest runtime/metric seems to be /memory/classes/heap/objects:bytes
which is Live Heap
+ New Heap
(i.e. includes potentially dead but unswept allocations).
Monitoring Live Heap
seems to be useful when running with GOGC=off
and a GOMEMLIMIT
to understand if the live heap of a program is approaching the GOMEMLIMIT
which could lead to high CPU usage or OOM kills.
Therefor I'm proposing to add /memory/classes/heap/live_objects:bytes
and /gc/heap/live_objects:objects
to the runtime/metrics package. They should return similar values as what is currently returned by the InUse*
values of the runtime.MemProfile()
profile. It's probably worth to discuss whether the term InUse
is preferable over Live
.
I created a very hacky prototype of this here, but there is probably a much better way to do this.
cc @mknyszek
A similar proposal was rejected previously (#51966). Part of the problem was inclusion in MemStats
, but runtime/metrics
also existed and I made some other arguments against it. Mainly, I think relying on it solely as an indicator can be somewhat fragile if you put it in context of the Go 1.18 GOGC changes. I wasn't ever fundamentally opposed to this, but at the time for the use-case presented, it just didn't seem worth it. I also wanted to think more carefully about what implications it could have (and why we didn't do it already).
However, I do think your argument that it's useful for identifying issues with a memory limit is "new information" (now that the memory limit has landed) as far as the proposal processes goes. I think that might be enough to push this over the edge into more serious consideration for me personally.
I think to start with I'd be inclined at this point to call this /gc/heap/live:bytes
. This way it lives next to /gc/heap/goal:bytes
. /memory/classes
to me is more about a breakdown of memory at a higher level and all the metrics inside there are supposed to be orthogonal. We'd have to break up some of the other metrics and I don't think it's worth it for this.
If we do this, we should probably also export scannable stack bytes and scannable globals bytes, at least as far as they go into the GOGC calculation. We should also really export GOGC and the memory limit as well, since they can change during application execution.
I don't really see enough utility from exporting the count of live objects though. I think a memory profile is better for that sort of thing anyway. (Also, we don't currently track that (mod memory profiling) and it's more work.)
I created a very hacky prototype of this here, but there is probably a much better way to do this.
Yeah, there's a simpler way for bytes. :) You want gcController.heapMarked
. You can just read that directly in the metrics callback without any synchronization since it only gets updated during a STW.
Sounds great. I'll update my patch based on your suggestions.
I also wanted to think more carefully about what implications it could have (and why we didn't do it already).
Yeah, that makes sense. Seems like this is missing the code freeze anyway.
I updated the CL to use gcController.heapMarked
, but found one odd thing.
If we do this, we should probably also export scannable stack bytes and scannable globals bytes, at least as far as they go into the GOGC calculation. We should also really export GOGC and the memory limit as well, since they can change during application execution.
I can prepare CLs for those as well. Should I submit them separately?
I updated the CL to use
gcController.heapMarked
, but found one odd thing.
It's an accounting artifact currently required for good performance. It would be nice to fix, but I'm not yet sure how without either (1) impacting the allocation fast path or (2) slowing down metrics.Read
to ReadMemStats
levels (this path would avoid the global latency impact that ReadMemStats
has because it's not necessary to actually, fully stop the world, but the worst-case latency of metrics.Read
becomes effectively the same as stopping the world).
For users that really need this consistency (like the testing
package), ReadMemStats
provides it, so there is a workaround. Most of the time though, the skew doesn't make a very meaningful difference, and the memory stats (in /memory/classes
) are all at least guaranteed to be consistent with each other. It might be worth mentioning in the docs that this stat specifically isn't necessarily consistent with the /memory/classes
stats. We can lift this restriction in the future if we can figure out a way to flush cached efficiently (I'll keep thinking about it).
If we do this, we should probably also export scannable stack bytes and scannable globals bytes, at least as far as they go into the GOGC calculation. We should also really export GOGC and the memory limit as well, since they can change during application execution.
I can prepare CLs for those as well. Should I submit them separately?
Either or. Your current CL is not very big.
This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
โ rsc for the proposal review group
Based on the discussion above, this proposal seems like a likely accept.
โ rsc for the proposal review group
No change in consensus, so accepted. ๐
This issue now tracks the work of implementing the proposal.
โ rsc for the proposal review group
@felixge were you planning to implement this?
Yes, I'll try to submit a good patch for /gc/heap/live:byte
๐.
If we do this, we should probably also export scannable stack bytes and scannable globals bytes, at least as far as they go into the GOGC calculation. We should also really export GOGC and the memory limit as well, since they can change during application execution.
If time allows I could also try to look into these, but I'm not sure if they'd be covered by the accepted proposal or would need separate proposals?
That's a good question. I personally feel we should just include those in the proposal.
Change https://go.dev/cl/497315 mentions this issue: runtime/metrics: add /gc/heap/live:bytes
Change https://go.dev/cl/497317 mentions this issue: runtime/metrics: add /gc/heap/goal:percent
Change https://go.dev/cl/497316 mentions this issue: runtime/metrics: add /gc/heap/limit:bytes
Change https://go.dev/cl/497320 mentions this issue: runtime/metrics: add /gc/global/scannable:bytes
Change https://go.dev/cl/497319 mentions this issue: runtime/metrics: add /gc/stack/scannable:bytes
Change https://go.dev/cl/497575 mentions this issue: runtime/metrics: add /gc/scan/heap:bytes
Change https://go.dev/cl/497576 mentions this issue: runtime/metrics: add /gc/scan/total:bytes
All of the metrics discussed in this proposal have landed, see patches above. Closing this ๐ฅณ
Thanks @mknyszek et al. for reviews and discussion ๐
Change https://go.dev/cl/497880 mentions this issue: runtime: ensure consistency of /gc/scan/*
Change https://go.dev/cl/498075 mentions this issue: doc: add release note for runtime/metrics additions