sublimehq/sublime_text

High memory in plugin_host-3.8.exe causes periodic front-end hangs on build 4180

Opened this issue · 11 comments

Description of the bug

Since switching to v4180 I periodically get hangs (all Sublime windows completely unresponsive) of around 30s when editing code. Rolling back to build 4169 makes the problem go away.

Unfortunately, that's not a lot to go on, however this is a pretty major issue so I'm happy to provide any information or do any diagnosis requested the next time the hang happens. If other people experiencing this can chime in to help narrow down what may be the cause, that'd be helpful as well, I'd suspect!

The potentially relevant details as I've encountered this:

  • Happens only on build 4180, not the older 4169
  • Happens a few times a day, over the course of hours of coding in .ts and .js files (possibly only .ts files, I'll keep an eye on this going forward, but I'm in .ts files about 95% of the time so it's hard to say for sure)
  • Hangs feel like a minute (so they're probably about 30s :)
  • Always happen while I'm in the middle of writing a line of code, so my suspicion is it's related to code completion or similar
  • I have the following potentially relevant packages installed: LSP, LSP-typescript, SublimeLinter, SublimeLinter-eslint
    • Unfortunately these are too tied to my productivity to be able to run without them for a day, but if there's any profiling or developer tools that can identify if it's related to one of them, I'd be happy to help narrow things down
    • Note that even if it does only happen when one of these packages are installed, it still seems like a major bug in v4180 as everything's been working flawlessly on v4169

Steps to reproduce

Code all day.

Expected behavior

Smooth sailing.

Actual behavior

Editor hangs.

Sublime Text build number

4180

Operating system & version

Windows 11 23H2

(Linux) Desktop environment and/or window manager

No response

Additional information

No response

OpenGL context information

No response

Next time it happens please run Plugin Development: Profile Events from the command palette.

After re-updating and using v4180 it took a day or two before the stalls got noticeable. I was watching closely when a stall happened today. I timed the stall at around 10 seconds. Process Explorer showed plugin_host-3.8.exe using all the CPU for the duration (slightly unusual compared to the usual where it's a node.exe sub-process of plugin_host that's consuming the CPU).

Plugin Development: Profile Events shows nothing of note - I had done one in the hour or two before the stall happened, and then immediately after the stall happened, and nothing shows more than a 500ms delta from the previous report.

 This list shows how much time each plugin has taken to respond to each event:
 
 on_activated:
-    Default.history_list: 0.545s total, mean: 0.000s, max: 0.003s
+    Default.history_list: 0.558s total, mean: 0.000s, max: 0.004s
-    EditorConfig.EditorConfig: 0.342s total, mean: 0.000s, max: 0.024s
+    EditorConfig.EditorConfig: 0.351s total, mean: 0.000s, max: 0.024s
-    SublimeLinter.active_linters_view: 0.053s total, mean: 0.000s, max: 0.008s
+    SublimeLinter.active_linters_view: 0.055s total, mean: 0.000s, max: 0.008s
-    SublimeLinter.busy_indicator_view: 0.225s total, mean: 0.000s, max: 0.004s
+    SublimeLinter.busy_indicator_view: 0.228s total, mean: 0.000s, max: 0.004s
 
 on_close:
     Default.settings: 0.000s total, mean: 0.000s, max: 0.000s
-    LSP.plugin.documents: 0.008s total, mean: 0.000s, max: 0.001s
+    LSP.plugin.documents: 0.009s total, mean: 0.000s, max: 0.001s
     SublimeLinter.active_linters_view: 0.002s total, mean: 0.000s, max: 0.001s
     SublimeLinter.highlight_view: 0.005s total, mean: 0.000s, max: 0.001s
-    SublimeLinter.sublime_linter: 0.350s total, mean: 0.000s, max: 0.007s
+    SublimeLinter.sublime_linter: 0.359s total, mean: 0.000s, max: 0.007s
 
 on_hover:
-    Default.symbol: 0.011s total, mean: 0.000s, max: 0.001s
+    Default.symbol: 0.013s total, mean: 0.000s, max: 0.001s
-    LSP.plugin.documents: 0.077s total, mean: 0.000s, max: 0.008s
+    LSP.plugin.documents: 0.079s total, mean: 0.000s, max: 0.008s
     SublimeLinter.highlight_view: 0.013s total, mean: 0.000s, max: 0.001s
 
 on_load:
-    Default.exec: 0.253s total, mean: 0.000s, max: 0.017s
+    Default.exec: 0.259s total, mean: 0.000s, max: 0.017s
-    EditorConfig.EditorConfig: 0.555s total, mean: 0.001s, max: 0.004s
+    EditorConfig.EditorConfig: 0.564s total, mean: 0.001s, max: 0.004s
     LSP.boot: 0.007s total, mean: 0.000s, max: 0.001s
 
 on_modified:
-    Default.history_list: 1.746s total, mean: 0.000s, max: 0.056s
+    Default.history_list: 1.817s total, mean: 0.000s, max: 0.056s
     Default.settings: 0.000s total
-    SublimeClipboardHistory.jimblys_clipboard_history: 0.091s total, mean: 0.000s, max: 0.002s
+    SublimeClipboardHistory.jimblys_clipboard_history: 0.097s total, mean: 0.000s, max: 0.002s
-    SublimeLinter.highlight_view: 6.254s total, mean: 0.000s, max: 0.491s
+    SublimeLinter.highlight_view: 6.449s total, mean: 0.000s, max: 0.491s
 
 on_post_save:
-    EditorConfig.EditorConfig: 0.057s total, mean: 0.000s, max: 0.010s
+    EditorConfig.EditorConfig: 0.058s total, mean: 0.000s, max: 0.010s
-    Pretty JSON.PrettyJsonListeners: 0.009s total, mean: 0.000s, max: 0.001s
+    Pretty JSON.PrettyJsonListeners: 0.010s total, mean: 0.000s, max: 0.001s
 
 on_post_text_command:
-    Default.history_list: 0.766s total, mean: 0.000s, max: 0.008s
+    Default.history_list: 0.814s total, mean: 0.000s, max: 0.008s
-    Default.paste_from_history: 0.385s total, mean: 0.000s, max: 0.006s
+    Default.paste_from_history: 0.404s total, mean: 0.000s, max: 0.006s
-    LSP.plugin.documents: 0.406s total, mean: 0.000s, max: 0.002s
+    LSP.plugin.documents: 0.433s total, mean: 0.000s, max: 0.002s
 
 on_post_window_command:
-    LSP.boot: 0.100s total, mean: 0.000s, max: 0.002s
+    LSP.boot: 0.103s total, mean: 0.000s, max: 0.002s
-    SublimeLinter.panel_view: 0.063s total, mean: 0.000s, max: 0.002s
+    SublimeLinter.panel_view: 0.064s total, mean: 0.000s, max: 0.002s
 
 on_pre_close:
-    Default.history_list: 0.119s total, mean: 0.000s, max: 0.003s
+    Default.history_list: 0.122s total, mean: 0.000s, max: 0.003s
     Default.settings: 0.003s total, mean: 0.001s, max: 0.002s
-    LSP.boot: 0.031s total, mean: 0.000s, max: 0.003s
+    LSP.boot: 0.044s total, mean: 0.000s, max: 0.010s
     SublimeLinter.highlight_view: 0.006s total, mean: 0.000s, max: 0.001s
-    SublimeLinter.panel_view: 0.017s total, mean: 0.000s, max: 0.002s
+    SublimeLinter.panel_view: 0.018s total, mean: 0.000s, max: 0.002s
 
 on_pre_close_window:
     LSP.boot: 0.001s total, mean: 0.000s, max: 0.001s
 
 on_pre_move:
     LSP.boot: 0.000s total
 
 on_pre_save:
-    EditorConfig.EditorConfig: 0.234s total, mean: 0.001s, max: 0.003s
+    EditorConfig.EditorConfig: 0.243s total, mean: 0.001s, max: 0.003s
-    JsFormat.js_formatter: 0.039s total, mean: 0.000s, max: 0.002s
+    JsFormat.js_formatter: 0.040s total, mean: 0.000s, max: 0.002s
     Pretty JSON.PrettyJsonListeners: 0.010s total, mean: 0.000s, max: 0.001s
 
 on_query_completions:
-    All Autocomplete.all_views_completions: 10.517s total, mean: 0.002s, max: 0.061s
+    All Autocomplete.all_views_completions: 11.039s total, mean: 0.002s, max: 0.061s
-    CSS.css_completions: 0.233s total, mean: 0.000s, max: 0.006s
+    CSS.css_completions: 0.247s total, mean: 0.000s, max: 0.006s
-    HTML.html_completions: 0.095s total, mean: 0.000s, max: 0.002s
+    HTML.html_completions: 0.103s total, mean: 0.000s, max: 0.002s
     LSP.plugin.documents: 0.036s total, mean: 0.000s, max: 0.002s
 
 on_query_context:
-    Default.block: 0.158s total, mean: 0.000s, max: 0.015s
+    Default.block: 0.175s total, mean: 0.000s, max: 0.015s
-    LSP.plugin.documents: 0.069s total, mean: 0.000s, max: 0.002s
+    LSP.plugin.documents: 0.072s total, mean: 0.000s, max: 0.002s
 
 on_selection_modified:
-    Default.history_list: 9.490s total, mean: 0.000s, max: 0.133s
+    Default.history_list: 9.941s total, mean: 0.000s, max: 0.133s
 
 on_text_changed:
-    LSP.plugin.documents: 2.272s total, mean: 0.000s, max: 0.035s
+    LSP.plugin.documents: 2.396s total, mean: 0.000s, max: 0.035s
 
 on_text_command:
-    Default.history_list: 2.706s total, mean: 0.000s, max: 0.011s
+    Default.history_list: 2.854s total, mean: 0.000s, max: 0.011s
-    LSP.plugin.documents: 0.112s total, mean: 0.000s, max: 0.002s
+    LSP.plugin.documents: 0.123s total, mean: 0.000s, max: 0.002s
-    SublimeLinter.highlight_view: 0.086s total, mean: 0.000s, max: 0.002s
+    SublimeLinter.highlight_view: 0.093s total, mean: 0.000s, max: 0.002s
 
 on_window_command:
-    Default.history_list: 3.588s total, mean: 0.000s, max: 0.165s
+    Default.history_list: 3.789s total, mean: 0.000s, max: 0.165s

Since it took a while of running Sublime before I noticed a big stall, feels like a garbage collection stall or something like that - I'll report back if the stalls keep getting longer over the next couple days, I guess =).

Looking at the running processes, plugin_host-3.8.exe is taking 8.7GB of memory, which seems odd, so, that also tracks - perhaps a memory leak (and related garbage collection stalls, or even page file swapping).

Another day, another big stall... 22 seconds this time, plugin_host-3.8.exe is up to 10.5GB. Again no significant change in Profile Events (a few seconds total since dumping it at about the same time yesterday, nothing near the 22s long stall). Any thing else I can do to help / profile / provide additional information? If not, I'll probably roll back to v4169 again.

Good thing it's the 3.8 process, since that's got more profiling tools.

Memory usage is easiest, if you open the console and enter import tracemalloc; tracemalloc.start() it should start profiling memory usage. Once it's gone up run the following in the console:

snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics('traceback')

# pick the biggest memory block
for i in range(0, 5):
    stat = top_stats[i]
    print("%s memory blocks: %.1f KiB" % (stat.count, stat.size / 1024))
    for line in stat.traceback.format():
        print(line)

For the CPU time we can try cProfile. Run import cProfile; pr = cProfile.Profile(); pr.enable(), ideally shortly before a stall. Then run pr.print_stats() to get the profiling results.

First stall of the day, 29s, memory usage has gone up from 10.8gb when I started tracing to 11.9gb now. Output of the above script is:

283577 memory blocks: 55290.1 KiB
  File "C:\Apps\Sublime Text 4\Data\Installed Packages\LSP.sublime-package\plugin/completion.py", line 68
378494 memory blocks: 43077.5 KiB
  File "C:\Apps\Sublime Text 4\Data\Installed Packages\LSP.sublime-package\plugin/completion.py", line 105
580137 memory blocks: 36151.2 KiB
  File "./python3.8/json/decoder.py", line 353
378495 memory blocks: 32527.0 KiB
  File "C:\Apps\Sublime Text 4\Data\Installed Packages\LSP.sublime-package\plugin/completion.py", line 101
94917 memory blocks: 25603.5 KiB
  File "C:\Apps\Sublime Text 4\Data\Installed Packages\LSP.sublime-package\plugin/completion.py", line 108
182 memory blocks: 3334.4 KiB
  File "C:\Apps\Sublime Text 4\Lib\python38\sublime_plugin.py", line 1042
    self.completions += [c if isinstance(c, sublime.CompletionItem) else normalise_completion(c)

Nothing too huge, though quite possibly a leak (and I'd be happy to file a bug with LSP about it after we figure out why only the new Sublime is severely impacted by such a leak). Stumbling with Python, I wrote this to try to get the total:

total = 0
for i in range(0, len(top_stats)):
  stat = top_stats[i]
  total += stat.size / 1024
print("%.1f KiB" % total)

Assuming it's correct, prints a total of 198943.7 KiB, or 200MB, which is only about 20% of the 1.1GB that was gained while tracing was active. (No idea if that's just normal for Python's memory tracing, or if that's indicating the leak is in non-traced memory or something like Buffers on Node.js are handled ^_^).

Unfortunately doing a CPU profile seems unlikely to catch it in a timely fashion (happens only once every couple hours, although as the memory usage gets higher it gets more frequent, it seems ^_^) - I'll see if I can capture an hour with a stall and an hour without or something and see if that helps.

If my hunch that it's related to a garbage collection stall is correct (or, any other stall related to the high memory usage), then the big question is why this only affects the new version of Sublime - either the memory isn't leaking in the old version (should be easy to prove/disprove by rolling back and checking memory usage after a while), or the plugin_host is not stalling during garbage collects on the old version, or the old Sublime front-end was not hanging when the plugin_host was stalling.

Another morning, another stall... stall was around 52 seconds. I had pr.enable() running and did a pr.print_stats() after the stall, as well as another one an hour later after non-stalling normal use. Full data from these print_stats here.

The summary is:
After about an hour of use and a 52 second stall, it displayed:

12252672 function calls (12193578 primitive calls) in 23.574 seconds

After about an hour of (probably heavier) use and no stall, it displayed:

15817738 function calls (15737039 primitive calls) in 32.549 seconds

Definitely seems like the 52 seconds of max'd out CPU usage is not included in the profiler results (which, just making guesses from my experience with other garbage collected language's profilers, as they appear to include only user code and not system things like the garbage collector, seems to again point towards a large GC event/stall) - is there anything like the --trace-gc on Node.js/V8 that profiles just how long it spends in garbage collection? If that even makes sense for Python, apologies as I might be making totally off-base assumptions =).

typescript language server is known to provide insane amount of completions which LSP package doesn't cut or limit by any means. Each completion needs to be wrapped into a CompletionItem object and maybe multiple pending completion queries are collected by python API functions and transfered to ST core. UI being blocked is not ideal, but even if not, it would still cause heavy CPU/RAM usage and significant delays for completions to be displayed. That's primarily caused by pythons poor processing power.

With the information provided this issue is basically a duplicate of #6249.

Though that might be true, this does not cause Sublime v4169 to stall the UI, and it does cause the new Sublime v4180 to stall the UI, that seems like a major regression/issue. Having completions delayed is expected / the usual / what I expect from LSP, though mildly inconvenient, having my keystrokes blocked for nearly a minute is not expected.

I took a look at #6249 and tried it here, and the behavior seems to distinctly different - it's slow on both v4169 and v4180 equally. Though definitely could be related, they appear to be different bugs. #6249 seems caused by number of completions active in the moment of doing completions, this issue I'm seeing appears to rear its head at any time (the latest stall I was simply navigating the cursor around, no (visible) completions going on) and seems correlated to high memory use in the plugin_host process.

I've performed some more experiments to help narrow this down, all seem to confirm it's a change in how how v4180 behaves when under high-memory situations (presumably now garbage collection stalls are happening where they impact the front-end, or something similar).

I quickly threw together a plugin which simply leaks a bunch of memory. If I start up a new instance of Sublime v4180, run this to leak ~16Gb of memory (about where I was seeing these stalls before), the long (30+ second) stalls start happening "immediately" (though still just once every couple hours or so) instead of only starting after running Sublime for 3-4 days. This seems to confirm it's maybe not directly related to large number of completions (as maybe does the earlier results about the stall happening on cursor movement and not completion listing).

If I completely remove LSP, and leak lots of memory, still on v4180, I did not get any stalls during a few days. This seems likely because I'm left with basically no plugins doing any memory traffic. This seems to support the theory that the stalls happen only when a plugin is generating a bunch of memory traffic.

If I roll back to v4169, I confirmed that LSP is still using just as much memory (plugin_host went up by ~8GB in a few days), so the bug in v4180 is not that memory is being leaked, but a change in how Sublime is being impacted by a high-memory situation in plugin_host (presumably during a GC stall).

If I roll back to v4169, and run my same memory-leak plugin to leak ~16GB of memory, with all plugins including LSP enabled, everything has worked smoothly without a single stall for days, just as it has for the past many years of using Sublime.

After testing on v4180 (with exaggerated high memory) for a week, rolling back to v4169 (even with the same exaggerated high memory) just feels like night-and-day difference, it's amazing how incredibly frustrating and detrimental getting minute-long stall when in the middle of writing a line of code can be -_-.

Allocating a huge list in python and running the gc causes a stall on my end regardless of whether I'm using 4169 or 4183, though certainly not minute-long.

We can confirm/rule out the gc by enabling debugging for the python gc. In the console run: import gc; gc.set_debug(gc.DEBUG_STATS).