lastquestion/explain-pause-mode

Provide memory profiling

yantar92 opened this issue · 6 comments

Some hangs in Emacs can be caused by garbage collections. explain-pause-profiles is not exatly useful in such cases:

Slow profile report
  Time: Wed May 27 03:37:35 2020
  Command: gcmh-idle-garbage-collect, idle-timer
  Duration: 1231 ms

[ View profile ]

Slow profile report
  Time: Wed May 27 03:19:07 2020
  Command: gcmh-idle-garbage-collect, idle-timer
  Duration: 4305 ms

[ View profile ]

- ...                                                             408 100%
   Automatic GC                                                   408 100%

It would be helpful if memory profiling was also done (with configurable threshold). similar to time profiling.

Well at least it told you what actually called the GC ;) I didn't know about https://gitlab.com/koral/gcmh, I get the idea of it now... however, the GC isn't interruptible which means that once you hit idle-timer, you can't actually type while it does a GC. Which you discovered. I don't think it's very unconventional, it's faking what most modern GC's try to do :-)

I guess an good fundamental root work in emacs would be non-blocking GC, right?

Anyway, back to your feature request. Yes, it makes sense. I'm wary of the cost of also doing memory profiling automatically, and I'd need to understand how the profiler actually does it, and what kind of report it generates. But in general, obviously this should be a configurable option.

Takeaway:
✅ yes it should be taking memory profiles if configured at some threshold. This means sometimes it'll take both memory and CPU profiles.
✅ this should mean that the UI should expose both CPU and memory profile actions
❓ I don't know yet what that default threshold ought to be

Also, I've been thinking about dealing with gc in a specific way, basically expanding explain-pause to start being cleverer. There is a gc-hook, post-gc-hook, that runs after gc is done. I'll write a issue for ideas around that.

Well at least it told you what actually called the GC ;)

That is not surprising indeed. I knew that GC it taking too much time, which is the reason I am using gcmh. If I don't use it, garbage collections just slows down Emacs proportionally, which is much more unpleasant. However, it is hard to know what actually causes the frequent garbage collection, because the effect of excessive memory usage is postponed until the next GC.

I don't know yet what that default threshold ought to be

It should probably be a fraction of gc-cons-threshold. Some reasonable estimation might be explain-pause-blocking-too-long-ms/average time needed to run garbage-collect*gc-cons-threshold.

Hmm, so upon reading your reply, and thinking more, I just want to check that our understanding is the same. Actually, you probably meant this and I misunderstood you until I read your reply. LOL.

But I think I got it now.

So what we want to do is actually measure memory usage of actual commands, e.g. what runs when you are using emacs, and we can do that via reading the GC variables.

We can ask emacs for any of the values in https://www.gnu.org/software/emacs/manual/html_node/elisp/Garbage-Collection.html, e.g. for example we can see how many objects were allocated, whether a GC occurred during the command, what the memory-limit now is, etc. and subtract from "before" the command. This is identical to the bookkeeping that is happening for execution time. Then, in a similar way to execution time, if commands consistently use up lots of memory, then we can run the memory profiler on the next invocation of that command.

So it's not actually important that the GC happens, or that we want to measure the memory profile over GC - in fact I don't even know if the memory profiler built into profiler.el measures what the GC does... but anyway, that doesn't matter. Because measuring the GC is not helpful to the user, it just tells you what you already know, like you said: Something allocated lots of memory but I don't know what.

Actually measuring memory usage over commands would be much more helpful.

There are some things to consider which are similar for measuring execution time, like that processes and timers can execute during commands, and so we need to subtract out their allocations correctly. Unfortunately we don't get super granular info in elisp land for GC data, but I think memory-use-counts should work.

Further thought: this should be a new screen in explain-mode-top, similar to the CPU times that it now generates. Or maybe just a new row? I'm glad I implemented my own table now...

But I think I got it now.

Yep. Your understanding is correct. Sorry for being not clear in my first message.

Further thought: this should be a new screen in explain-mode-top, similar to the CPU times that it now generates.

This sounds reasonable.

This is a bigger feature, but really makes sense and also mirrors the CPU side of things. I will work on it, after fixing bugs and finishing up explain-pause-top & cleaning up alerting / logging logic. I'll move this into the board when I get ready to start working on it, but I definitely think the feature is good and it will happen!