Excessive memory usage in monorepos with dozens/hundreds of cabal packages or modules

There are several issues about memory consumption but it seems it is exacerbated in those contexts
The large public codebase where hls works reliably is ghc but it has few packages
Test performance of large monorepos is hard:
- They usually are private
- Setup a benchmark here would be pretty difficult
So we need the feedback of users to improve perfomance in those scenarios
Pinging the experts om the subject: @wz1000, @mpickering, @pepeiborra, @fendor
I think @parsonsmatt has experienced this issue
As a first step we should analyze where is the culprit, starting from hie-bios

@jneira thanks for the summary. I don't have time to look into this but can share a few pointers:

we have a benchmark example for this scenario: HLS itself. It's currently disabled due to slowness, can someone look into reenabling it?

haskell-language-server/ghcide/bench/config.yaml

Lines 30 to 40 in a800b9d

    
           # Small but heavily multi-component example 
        
           # Disabled as it is far to slow. hie-bios >0.7.2 should help 
        
           # - name: HLS 
        
           #   path: bench/example/HLS 
        
           #   modules: 
        
           #       - hls-plugin-api/src/Ide/Plugin/Config.hs 
        
           #       - ghcide/src/Development/IDE/Plugin/CodeAction/ExactPrint.hs 
        
           #       - ghcide/bench/hist/Main.hs 
        
           #       - ghcide/bench/lib/Experiments/Types.hs 
        
           #       - ghcide/test/exe/Main.hs 
        
           #       - exe/Plugins.hs

once we have a working benchmark, it should be quite easy to characterise the scenarios that leak memory using the array of experiments available. The benchmark suite produces heap graphs and can also produce heap profiles
the most likely culprit is GHC. The Session code creates a cascade of GHC Sessions to support multi-component scenarios, and it's likely that we are holding onto too much data (and leaking some more). There are two paths forward then:
- thin out the stuff we keep alive to reduce the overall footprint,
- identify leaks in GHC and fix them upstream. Since it is possible they are already fixed in HEAD, this step should be performed using GHC 9.2 or newer

Good luck and happy hacking!

Many thanks for your insights, for completeness the pr about enabling the hls becnhmark example is #1420

I'm having access to one private monorepo with ~250 "packages" and more than 1000 modules which exhibits huge memory usage (we upgraded dev machines to 32 GB of RAM in order to use HLS).

I still have good contact with another company with a similar monorepo.

Both are using bazel as the build system and the flags are generated from bazel. Note that there is one "component" (i.e. on cradle: ... shell: call per module).

I can run every benchmarking you may suggest on this repo.

The Haskell codebase at Mercury is 232kloc in a single cabal package with 1,600+ modules currently. Build system is cabal with nix providing all dependencies.

A large number of packages probably contributes to slowdown, but even with a single package target, HLS is unusably slow for most of our developers, even those with 32GB of RAM.

HLS is unusably slow for most of our developers, even those with 32GB of RAM.

I think some plugins contribute to the slowness, did they try to only use certain (base) plugins?

HLS is unusably slow for most of our developers, even those with 32GB of RAM.

I think some plugins contribute to the slowness, did they try to only use certain (base) plugins?

How do I disable plugins ? I.e. starting with the minimum and progressively increasing them? Is there a runtime option or should I just rebuild HLS without them?

I am not sure whether there exists a run-time option (there definitely should be), I built from source using cabal flags, and modified the constraints in cabal.project

The Haskell codebase at Mercury is 232kloc in a single cabal package with 1,600+ modules currently

in how many components (libs, test suites, benchmarks, execs) are distributed those modules?

To disable plugins at runtime you could use the lsp client provided json config, i will try to post a config example which disables all plugins.

The Haskell codebase at Mercury is 232kloc in a single cabal package with 1,600+ modules currently. Build system is cabal with nix providing all dependencies.

A large number of packages probably contributes to slowdown, but even with a single package target, HLS is unusably slow for most of our developers, even those with 32GB of RAM.

Try this branch - #2060

in how many components (libs, test suites, benchmarks, execs) are distributed those modules?

I cannot comment for the mercury codebase, but for my codebase there is no "composent". I'm using a shell custom command in my hie.yaml file, and the output of the shell command will be different for each of the ~1500 modules of my codebase.

Is reducing this output size AND limiting how they are different a possible way to reduce memory usage / invalidation?

in how many components (libs, test suites, benchmarks, execs) are distributed those modules?

I cannot comment for the mercury codebase, but for my codebase there is no "composent". I'm using a shell custom command in my hie.yaml file, and the output of the shell command will be different for each of the ~1500 modules of my codebase.

This is a worst-case scenario. Best case would be a custom cradle that returns a universal set of flags and then lists all the module paths.

in how many components (libs, test suites, benchmarks, execs) are distributed those modules?

We have a single component that we want to load, test-dev, that has all test modules and library modules. It is only used for ghcid and HLS - normal builds and tests use a standard library and test component.

To disable plugins at runtime you could use the lsp client provided json config, i will try to post a config example which disables all plugins.

ok, if you run haskell-language-server generate-default-config it will output the default json config:

default config

 {
    "haskell": {
        "checkParents": "CheckOnSaveAndClose",
        "hlintOn": true,
        "formattingProvider": "ormolu",
        "diagnosticsOnChange": true,
        "diagnosticsDebounceDuration": 350000,
        "liquidOn": false,
        "plugin": {
            "importLens": {
                "codeActionsOn": true,
                "codeLensOn": true
            },
            "ghcide-hover-and-symbols": {
                "hoverOn": true,
                "symbolsOn": true
            },
            "ghcide-code-actions-bindings": {
                "globalOn": true
            },
            "splice": {
                "globalOn": true
            },
            "retrie": {
                "globalOn": true
            },
            "hlint": {
                "codeActionsOn": true,
                "diagnosticsOn": true,
                "config": {
                    "flags": []
                }
            },
            "haddockComments": {
                "globalOn": true
            },
            "eval": {
                "globalOn": true
            },
            "class": {
                "globalOn": true
            },
            "ghcide-completions": {
                "globalOn": true,
                "config": {
                    "autoExtendOn": true,
                    "snippetsOn": true
                }
            },
            "ghcide-code-actions-fill-holes": {
                "globalOn": true
            },
            "ghcide-type-lenses": {
                "globalOn": true,
                "config": {
                    "mode": "always"
                }
            },
            "tactics": {
                "hoverOn": true,
                "codeActionsOn": true,
                "config": {
                    "max_use_ctor_actions": 5,
                    "proofstate_styling": true,
                    "auto_gas": 4,
                    "timeout_duration": 2,
                    "hole_severity": null
                },
                "codeLensOn": true
            },
            "pragmas": {
                "codeActionsOn": true,
                "completionOn": true
            },
            "ghcide-code-actions-type-signatures": {
                "globalOn": true
            },
            "refineImports": {
                "codeActionsOn": true,
                "codeLensOn": true
            },
            "moduleName": {
                "globalOn": true
            },
            "ghcide-code-actions-imports-exports": {
                "globalOn": true
            }
        },
        "formatOnImportOn": true,
        "checkProject": true,
        "maxCompletions": 40
    }
}

As you can check, inside plugins all flags are true so all plugins are enabled. All of them supports globalOn: false but we only list it if the plugin has one capability.
So to disable all plugins you could use:

 {
    "haskell": {
         "plugin": {
            "importLens": {
                "globalOn": false
            },
            "ghcide-hover-and-symbols": {
                "globalOn": false
            },
            "ghcide-code-actions-bindings": {
                "globalOn": false
            },
            "splice": {
                "globalOn": false
            },
            "retrie": {
                "globalOn": false
            },
            "hlint": {
               "globalOn": false
            },
            "haddockComments": {
                "globalOn": false
            },
            "eval": {
                "globalOn": false
            },
            "class": {
                "globalOn": false
            },
            "ghcide-completions": {
                "globalOn": false
            },
            "ghcide-code-actions-fill-holes": {
                "globalOn": false
            },
            "ghcide-type-lenses": {
                "globalOn": false
            },
            "tactics": {
                "globalOn": false
            },
            "pragmas": {
                "globalOn": false
            },
            "ghcide-code-actions-type-signatures": {
                "globalOn": false
            },
            "refineImports": {
                "globalOn": false
            },
            "moduleName": {
                "globalOn": false
            },
            "ghcide-code-actions-imports-exports": {
                "globalOn": false
            }
        }
}

You can put that json in projectRoot/.vscode/settings.json to disable them for a project if you are using vscode (other editors have their own way to give the config to the server)

CircuitHub uses HLS (well, I do, at least!) over its 1760 modules split over 50 .cabal files. I actually find it usable, but did need to upgrade to 32GB RAM. I'm happy to provide more data if you let me know what would be useful!

Some initial observations on the same codebase @parsonsmatt is talking about. We're working on improving HLS's performance on this codebase.

The (private) codebase has about 1401 modules in one single package.

Configuration:

I have disabled all plugins for HLS via stack build --flag haskell-language-server:-all-formatters --flag haskell-language-server:-all-plugins --flag haskell-language-server:-brittany --flag haskell-language-server:-callhierarchy --flag haskell-language-server:-class --flag haskell-language-server:-eval --flag haskell-language-server:-floskell --flag haskell-language-server:-fourmolu --flag haskell-language-server:-haddockcomments --flag haskell-language-server:-hlint --flag haskell-language-server:-importlens --flag haskell-language-server:-modulename --flag haskell-language-server:-ormolu --flag haskell-language-server:-pedantic --flag haskell-language-server:-pragmas --flag haskell-language-server:-refineimports --flag haskell-language-server:-rename --flag haskell-language-server:-retrie --flag haskell-language-server:-splice --flag haskell-language-server:-stylishhaskell --flag haskell-language-server:-tactic --copy-bins
shake built with -O0 (I don't think this should matter much)

HLS uses for stack the stack ghci for generating .hie files with args like "-fwrite-ide-info","-hiedir",".hiefiles....

Here is -s output when I hardcode HLS to self-terminate before it begins loading .hie files, with 2.6GB of total memory use or 678MB of maximum residency (56GB total allocations).

56,391,925,584 bytes allocated in the heap
    4,541,399,808 bytes copied during GC
      678,908,224 bytes maximum residency (7 sample(s))
        7,761,600 bytes maximum slop
             2626 MiB total memory in use (0 MB lost due to fragmentation)
 
                                      Tot time (elapsed)  Avg pause  Max pause
   Gen  0        47 colls,    47 par    9.485s   1.318s     0.0280s    0.0475s
   Gen  1         7 colls,     6 par    4.927s   1.324s     0.1892s    0.7458s
 
   Parallel GC work balance: 93.47% (serial 0%, perfect 100%)
 
   TASKS: 35 (1 bound, 34 peak workers (34 total), using -N8)
 
   SPARKS: 21 (20 converted, 0 overflowed, 0 dud, 0 GC'd, 1 fizzled)
 
   INIT    time    0.002s  (  0.002s elapsed)
   MUT     time   60.577s  ( 58.391s elapsed)
   GC      time   14.412s  (  2.642s elapsed)
   EXIT    time    0.005s  (  0.006s elapsed)
   Total   time   74.996s  ( 61.040s elapsed)
 
   Alloc rate    930,911,213 bytes per MUT second
 
   Productivity  80.8% of total user, 95.7% of total elapsed

Meanwhile, if I allow HLS to proceed with loading all the .hie files, then we have 6GB total memory use with a maximum residency of 2GB (and 251GB total allocations).

251,921,600,112 bytes allocated in the heap
   21,768,183,432 bytes copied during GC
    2,288,714,288 bytes maximum residency (11 sample(s))
       16,694,000 bytes maximum slop
             6218 MiB total memory in use (0 MB lost due to fragmentation)
 
                                      Tot time (elapsed)  Avg pause  Max pause
   Gen  0       296 colls,   296 par   53.105s   6.924s     0.0234s    0.1068s
   Gen  1        11 colls,    10 par   17.618s   4.554s     0.4140s    2.1406s
 
   Parallel GC work balance: 84.73% (serial 0%, perfect 100%)
 
   TASKS: 41 (1 bound, 40 peak workers (40 total), using -N8)
 
   SPARKS: 4711 (4676 converted, 0 overflowed, 0 dud, 0 GC'd, 35 fizzled)
 
2021-09-16 21:34:46.35954889 UTC stderr
   INIT    time    0.002s  (  0.002s elapsed)
   MUT     time  391.014s  (224.797s elapsed)
   GC      time   70.723s  ( 11.478s elapsed)
   EXIT    time    0.007s  (  0.003s elapsed)
   Total   time  461.746s  (236.280s elapsed)
 
   Alloc rate    644,276,909 bytes per MUT second
 
   Productivity  84.7% of total user, 95.1% of total elapsed

That's this many .hie files:

$ find /home/chris/.cache/ghcide/main-14b8ce32153e652c2e5dfdbd46a19ccdbb440779 -name '*.hie' | wc -l
		1392

It's hardly surprising, as the HieFile type has a lot of data in it.

So my initial impressions are:

Simply loading up the codebase files causes a lot of allocations and residency is high, even before we load .hie files or think about enabling plugins. Ideally we could shrink that down and start letting go of unneeded things but it may not be the main burden.
Once we load up .hie files, that takes up 3x more memory than even the first step. The HieFile contents is large, and bringing them all into memory is naturally a blow up area. Avoiding loading them up, or loading them and discarding them for later with an IO action that loads them on demand might be an option.

I'll follow up with numbers when plugins are enabled.

Here are the same numbers with default plugins enabled without any flags passed. The maximum residency doubles to 4GB with total memory use jumping to an almost double 9.7GB.

298,006,080,712 bytes allocated in the heap
   24,459,460,216 bytes copied during GC
    4,421,180,736 bytes maximum residency (11 sample(s))
       24,941,248 bytes maximum slop
             9744 MiB total memory in use (0 MB lost due to fragmentation)
 
                                      Tot time (elapsed)  Avg pause  Max pause
   Gen  0       309 colls,   309 par   66.564s   8.939s     0.0289s    0.1255s
   Gen  1        11 colls,    10 par   18.143s   8.081s     0.7347s    5.2873s
 
   Parallel GC work balance: 86.99% (serial 0%, perfect 100%)
 
   TASKS: 42 (1 bound, 41 peak workers (41 total), using -N8)
 
   SPARKS: 6737 (6649 converted, 0 overflowed, 0 dud, 0 GC'd, 88 fizzled)
 
   INIT    time    0.002s  (  0.002s elapsed)
   MUT     time  421.495s  (242.495s elapsed)
   GC      time   84.707s  ( 17.020s elapsed)
   EXIT    time    5.252s  (  0.003s elapsed)

   Total   time  511.457s  (259.520s elapsed)
 
   Alloc rate    707,021,073 bytes per MUT second
 
   Productivity  82.4% of total user, 93.4% of total elapsed

Therefore the summary is: the initial loading has very high memory use, the loading of the .hie files has very high memory use, and the plugins have very high memory use.

As a small point of interest, the memory use before we consult the cradle in ghcide leads us already to 4GB maximum residency and 1GB total use.

2021-09-16 22:46:58.56733376 UTC stderr> 2021-09-16 23:46:58.567153867 [ThreadId 108] INFO hls:	Consulting the cradle for "app/main.hs"
 Killing self before calling loadCradle.
2021-09-16 22:46:58.574672845 UTC stderr> Received SIGTERM. Killing main thread.
2021-09-16 22:46:58.575577644 UTC stderr> Output from setting up the cradle Cradle {cradleRootDir = "/home/chris/Work/mercury/web-backend", cradleOptsProg = CradleAction: Stack}
2021-09-16 22:46:58.575869799 UTC stderr> haskell-language-server: thread killed
2021-09-16 22:46:58.602032681 UTC stderr>      
      157,480,464 bytes allocated in the heap
        1,445,800 bytes copied during GC
        4,188,152 bytes maximum residency (1 sample(s))
          178,184 bytes maximum slop
             1045 MiB total memory in use (0 MB lost due to fragmentation)
 
                                      Tot time (elapsed)  Avg pause  Max pause
   Gen  0         0 colls,     0 par    0.000s   0.000s     0.0000s    0.0000s
   Gen  1         1 colls,     0 par    0.019s   0.019s     0.0187s    0.0187s
 
   TASKS: 31 (1 bound, 27 peak workers (30 total), using -N8)
 
   SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)
 
   INIT    time    0.003s  (  0.003s elapsed)
   MUT     time    0.208s  (  0.792s elapsed)
   GC      time    0.019s  (  0.019s elapsed)
   EXIT    time    0.005s  (  0.007s elapsed)
   Total   time    0.235s  (  0.820s elapsed)
 
   Alloc rate    756,162,690 bytes per MUT second
 
   Productivity  88.8% of total user, 96.5% of total elapsed

At this point we have some easy areas to investigate.

I'm likely to proceed with heap/allocation profiling tools from here to identify in each stage if there are any low-hanging fruit.

@chrisdone wow, impressive analysis, many thanks

The HieFile contents is large, and bringing them all into memory is naturally a blow up area. Avoiding loading them up, or loading them and discarding them for later with an IO action that loads them on demand might be an option.

That seems to be a good path to get an improvement without too much changes. Ping @wz1000 as they introduced hiedb, which seems the correct tool to accomplish it (or it will be involved for sure)

Here are the same numbers with default plugins enabled without any flags passed. The maximum residency doubles to 4GB with total memory use jumping to an almost double 9.7GB.

That hurts, in that area we would need to analyze plugins separately, to identify what memory usage is shared for all of them and what make they increase it. For example is expected that the eval plugin will take much more than others, as it spans new ghc sessions instead reuse the created by ghcide.

Some initial observations on the same codebase @parsonsmatt is talking about. We're working on improving HLS's performance on this codebase.

Awesome! I am happy to help, ping me on #haskell-language-server at any time.

shake built with -O0 (I don't think this should matter much)

Politely WTF?

Meanwhile, if I allow HLS to proceed with loading all the .hie files, then we have 6GB total memory use with a maximum residency of 2GB (and 251GB total allocations).

#1946 might be relevant

It's hardly surprising, as the HieFile type has a lot of data in it.

So my initial impressions are:

Simply loading up the codebase files causes a lot of allocations and residency is high, even before we load .hie files or think about enabling plugins. Ideally we could shrink that down and start letting go of unneeded things but it may not be the main burden.

Keep in mind that HLS type checks the entire codebase at startup by default. You probably want to disable this for your 1000s files codebase with haskell.checkProject = False and that should prevent it from loading all the .hie files at startup.

Once we load up .hie files, that takes up 3x more memory than even the first step. The HieFile contents is large, and bringing them all into memory is naturally a blow up area. Avoiding loading them up, or loading them and discarding them for later with an IO action that loads them on demand might be an option.

Sadly ghcide does not have garbage collection of build results.

If you plan to do further analysis, I suggest that you use the #2060 branch
Also, you may want to use ghcide-bench to automate the repro: https://github.com/haskell/haskell-language-server/blob/master/ghcide/bench/exe/Main.hs

The Haskell codebase at Mercury is 232kloc in a single cabal package with 1,600+ modules currently. Build system is cabal with nix providing all dependencies.

A large number of packages probably contributes to slowdown, but even with a single package target, HLS is unusably slow for most of our developers, even those with 32GB of RAM.

How much TH is used in this codebase?

Sadly ghcide does not have garbage collection of build results.

We have 2 issues about garbage collector:

#713: clean diganostics when closing a file
#792 less related wih this, as delete modules is less frequent

How much TH is used in this codebase?

The codebase heavily uses TH and simple rg '\$\(' --type haskell -l src | wc -l gives me 815 modules which is over a half of all the modules in the project

You probably want to disable this for your 1000s files codebase with haskell.checkProject = False and that should prevent it from loading all the .hie files at startup.

That important config option was undocumented, i've added it to docs with #2203
It is not included in the actual vscode extension (although you can use the missing options writing a custom json config), issue to track their addition: haskell/vscode-haskell#459

I think .hie files should be retained in heap/shake graph only for "files of interest" (files that are currently open in the editor). They are loaded at startup for indexing, but once they are indexed they should not be in memory anymore.

If this is not the case, it is a bug that shouldn't be very hard to fix.

Also, you may want to use ghcide-bench to automate the repro: https://github.com/haskell/haskell-language-server/blob/master/ghcide/bench/exe/Main.hs

@pepeiborra I've spent some hours trying to make it work but couldn't so I filed #2220 describing the errors I saw.

Did you check the Haddock comment on the linked module header? What command line did you use to run ghcide-bench? I am specifically suggesting to use ghcide-bench to collect profiling data, not the entire benchmark suite.

Example command line to use off the top of my head:

cd ghcide
cabal exec cabal run ghcide-bench -- -- -s edit --example-path <path-to-your-repo> --example-module <path-to-a-module-from-repo-root> --samples 100 --ghcide-options="+RTS -h -RTS"

Thanks, will take a look

shake built with -O0 (I don't think this should matter much)

Politely WTF?

It doesn't make a difference to the memory use, reproducing the same 3GB with -O enabled. Stands to reason, shake itself doesn't do much in the way of allocations. I was just having trouble making shake compile due to tick factors and didn't feel like messing with it, but setting it to 500 works anyway.

If you plan to do further analysis, I suggest that you use the #2060 branch

Thanks, we'll use that one!

I think .hie files should be retained in heap/shake graph only for "files of interest" (files that are currently open in the editor). They are loaded at startup for indexing, but once they are indexed they should not be in memory anymore.

We'll try to fix this too.

We're sure that not type-checking the whole project on startup will help initially, but once you start working with the project, it'll have to type check anyway. So it doesn't sound like a big win. But we'll test it anyway.

A small point of interest, I start adding checkpoints in the code. A snippet:

https://chrisdone.com/checkpoint-2021-09-27-17-19-minir-gc.txt

Eventually the numbers go up to 6GB max_mem_in_use_bytes and 213GB of allocated_bytes.

I confirmed via performMajorGC in each checkpoint that getOptions allocates GB of bytes. But it's not the bulk. The getting of mod summaries accumulates slowly. Then loading hie files accumulates enthusiastically.

You may want to print the NormalizedFilePath part of the key as well, to understand if we are leaking. So far it doesn't look like we are.

HieFile is expensive, carrying:

a ByteString with the full source, so HLS is effectively loading the whole repo in memory
a lot of FastStrings which live in a global table and are never collected
a not-so-compact representation of the parsed AST

It might be worth parsing some of its fields lazily, delaying allocations until actually needed. But that would probably require representational changes in GHC.

A simpler option would be to void some of those fields immediately after parsing, or replace them with a lazy thunk to reparse. In some cases, they could be replaced by an end product, e.g. replace the full source by its hash.

Thanks @pepeiborra.

We also noted that there isn't one specific file that's humongous causing excessive allocations, if you look at the log, the allocations steadily grow with each readHieFileFromDisk. However, that's not quite the place where the problem lies.

I did more narrowing and found that making the HIE files lazy didn't save anything substantial (in this case). Indeed, despite not actually loading the HIE files and deferring them later via a "LazyHie" type, the memory growth was the same. So for loading from disk, the HIE files isn't actually causing problems.

In fact, the culprit appears to be generation of hi files, here:

-      r <- loadInterface (hscEnv session) ms sourceModified linkableType (regenerateHiFile session f ms)
+      r <- loadInterface (hscEnv session) ms sourceModified linkableType (\_ -> pure ([] ,Nothing))

Above, replacing the regenerateHiFile with a no-op leaves us with a big difference at the end:

allocated_bytes               max_live_bytes                max_mem_in_use_bytes          gcdetails_allocated_bytes     
176,208,741,000               1,895,058,048                 6,328,156,160                 38,527,608

vs

allocated_bytes               max_live_bytes                max_mem_in_use_bytes          gcdetails_allocated_bytes     
23,524,853,232                392,746,744                   2,020,605,952                 1,044,424

(6.3GB to 2GB total memory use)

We don't think that it should be regenerating these files. If you have any insights into this, please let us know!

@qrilka has some evidence that the hashes for modules are instable, which may be a culprit. He's going to look more into this.

Interesting. I would look at the recompilation logic and verify that it's working as expected. TH makes a difference here:

haskell-language-server/ghcide/src/Development/IDE/Core/Compile.hs

Lines 907 to 924 in ec53fcb

    
           case res of 
        
                 (UpToDate, Just iface) 
        
                   -- If the module used TH splices when it was last 
        
                   -- compiled, then the recompilation check is not 
        
                   -- accurate enough (https://gitlab.haskell.org/ghc/ghc/-/issues/481) 
        
                   -- and we must ignore 
        
                   -- it.  However, if the module is stable (none of 
        
                   -- the modules it depends on, directly or 
        
                   -- indirectly, changed), then we *can* skip 
        
                   -- recompilation. This is why the SourceModified 
        
                   -- type contains SourceUnmodifiedAndStable, and 
        
                   -- it's pretty important: otherwise ghc --make 
        
                   -- would always recompile TH modules, even if 
        
                   -- nothing at all has changed. Stability is just 
        
                   -- the same check that make is doing for us in 
        
                   -- one-shot mode. 
        
                   | not (mi_used_th iface) || SourceUnmodifiedAndStable == sourceMod 
        
                   -> do

Our bench harness runs:

_ <- try (removeDirectoryRecursive ".stack-work") :: IO (Either IOException ())
_ <- try (removeDirectoryRecursive (home ++ "/.cache/ghcide/")) :: IO (Either IOException ())

In order to clear the cache. Are there any other cache directories I should be aware of?

I'm trying not to hit recompilation case at all and just do a cold run.

(Our clients aren't able to finish the first cold run, thereby making any recompilation logic a secondary concern.)

It looks like "Impure plugin forced recompilation" is the reason regeneration gets triggered in the project we are looking into.

Our bench harness runs:
_ <- try (removeDirectoryRecursive ".stack-work") :: IO (Either IOException ())
_ <- try (removeDirectoryRecursive (home ++ "/.cache/ghcide/")) :: IO (Either IOException ())
In order to clear the cache. Are there any other cache directories I should be aware of?

I'm trying not to hit recompilation case at all and just do a cold run.

(Our clients aren't able to finish the first cold run, thereby making any recompilation logic a secondary concern.)

A cold run is expensive and should be avoided - we haven't done any work to optimise this scenario. For Sigma, I manually populate the ghcide cache by fetching all the hi and hie files from the cloud build system.

It looks like "Impure plugin forced recompilation" is the reason regeneration gets triggered in the project we are looking into.

that probably refers to a GHC plugin. Do you use any of those?

I don't see anything other than what's shipped with GHC itself. I the project cabal file I see

  ghc-options: -Weverything -Wno-missing-exported-signatures -Wno-missing-export-lists -Wno-missing-import-lists -Wno-missed-specialisations -Wno-all-missed-specialisations -Wno-unsafe -Wno-missing-local-signatures -Wno-monomorphism-restriction -Wno-missing-safe-haskell-mode -Wno-prepositive-qualified-module -Wno-unused-packages -fdiagnostics-color=always +RTS -A128m -n2m -RTS -fwrite-ide-info -hiedir .hiefiles

and tracing shows empty properties cachedPlugins and cachedPlugins of GHC session.
The only thing which raises my suspicion is https://hackage.haskell.org/package/persistent-discover used in the process. But does GHC assumes preproccesor to be a plugin?

git clone ghc
cd ghc
grep "impure plugin forced recompilation"

I have implemented a form of garbage collection in #2263. Does it help at all?

I don't see anything other than what's shipped with GHC itself. I the project cabal file I see
  ghc-options: -Weverything -Wno-missing-exported-signatures -Wno-missing-export-lists -Wno-missing-import-lists -Wno-missed-specialisations -Wno-all-missed-specialisations -Wno-unsafe -Wno-missing-local-signatures -Wno-monomorphism-restriction -Wno-missing-safe-haskell-mode -Wno-prepositive-qualified-module -Wno-unused-packages -fdiagnostics-color=always +RTS -A128m -n2m -RTS -fwrite-ide-info -hiedir .hiefiles
and tracing shows empty properties cachedPlugins and cachedPlugins of GHC session. The only thing which raises my suspicion is https://hackage.haskell.org/package/persistent-discover used in the process. But does GHC assumes preproccesor to be a plugin?

It turns out that the Tactics HLS plugin installs a GHC plugin:

haskell-language-server/plugins/hls-tactics-plugin/src/Wingman/StaticPlugin.hs

Lines 17 to 34 in ec53fcb

    
           staticPlugin :: DynFlagsModifications 
        
           staticPlugin = mempty 
        
             { dynFlagsModifyGlobal = 
        
                 \df -> allowEmptyCaseButWithWarning 
        
                      $ flip gopt_unset Opt_SortBySubsumHoleFits 
        
                      $ flip gopt_unset Opt_ShowValidHoleFits 
        
                      $ df 
        
                        { refLevelHoleFits = Just 0 
        
                        , maxRefHoleFits   = Just 0 
        
                        , maxValidHoleFits = Just 0 
        
           #if __GLASGOW_HASKELL__ >= 808 
        
                        , staticPlugins = staticPlugins df <> [metaprogrammingPlugin] 
        
           #endif 
        
                        } 
        
           #if __GLASGOW_HASKELL__ >= 808 
        
             , dynFlagsModifyParser = enableQuasiQuotes 
        
           #endif 
        
             }

Apologies, I wasn't aware of this.

Oh, yes, we've found this too, sorry for not sharing this

@qrilka @chrisdone hi, not sure if you already had the chance of check all the perf improvements included in hls 1.5.0, could you give a try if that is not the case? thanks!

sorry but we would need some feedback to confirm this continue being a problem in your repos: @qrilka @parsonsmatt @guibou, thanks!

I don't have enough information on this currently but probably @parsonsmatt could give an answer.

I did a small session to experiment. 10 minutes of opening files, moving through different files, writing a few functions (and generating errors), applying code actions, moving back to files which are depending on the one I modified, running some code eval, RAM usage of HLS is ~10GiB. (Actually, that's 9GiB in one HLS, and 1 GiB in another, I have no idea why my session generated two instances of HLS...)

Most of the time I see a bit more after a longer editing session.

Sometimes, the memory usage skyrocket. Yesterday I had to kill HLS because it was eating all of my ram and even more, entries in the log:

λ narwal /tmp → cat hls_log.txt | grep '2[0-9]\{4\}\...MB'
2022-01-16 20:23:25.17165688 [ThreadId 5] INFO hls:	Live bytes: 5736.25MB Heap size: 21155.02MB
2022-01-16 21:55:33.00043412 [ThreadId 5] INFO hls:	Live bytes: 24883.55MB Heap size: 26281.51MB
2022-01-17 09:49:39.080170657 [ThreadId 5] INFO hls:	Live bytes: 26870.07MB Heap size: 28263.32MB
2022-01-18 19:06:22.571507896 [ThreadId 5] INFO hls:	Live bytes: 26814.11MB Heap size: 27943.50MB

That's yesterday log when I was doing some serious Haskell work

λ narwal /tmp → cat hls_log.txt | grep '[0-9]\{5\}\...MB' | grep '2022-01-31' 
2022-01-31 11:44:03.190732278 [ThreadId 5] INFO hls:	Live bytes: 5964.91MB Heap size: 10117.71MB
2022-01-31 11:45:03.20162854 [ThreadId 5] INFO hls:	Live bytes: 5160.43MB Heap size: 10117.71MB
2022-01-31 11:46:03.211219812 [ThreadId 5] INFO hls:	Live bytes: 3380.00MB Heap size: 10117.71MB
2022-01-31 11:47:03.255654266 [ThreadId 5] INFO hls:	Live bytes: 3380.00MB Heap size: 10117.71MB
2022-01-31 11:48:03.279647251 [ThreadId 5] INFO hls:	Live bytes: 4671.24MB Heap size: 10117.71MB
2022-01-31 11:49:03.332638151 [ThreadId 5] INFO hls:	Live bytes: 4671.24MB Heap size: 10117.71MB
2022-01-31 11:50:03.392449067 [ThreadId 5] INFO hls:	Live bytes: 4671.24MB Heap size: 10117.71MB
2022-01-31 11:51:03.44704232 [ThreadId 5] INFO hls:	Live bytes: 4671.24MB Heap size: 10117.71MB
2022-01-31 11:52:03.507582 [ThreadId 5] INFO hls:	Live bytes: 4671.24MB Heap size: 10117.71MB
2022-01-31 11:53:03.56818916 [ThreadId 5] INFO hls:	Live bytes: 4671.24MB Heap size: 10117.71MB
2022-01-31 11:54:03.628084608 [ThreadId 5] INFO hls:	Live bytes: 4671.24MB Heap size: 10117.71MB
2022-01-31 11:55:03.688447284 [ThreadId 5] INFO hls:	Live bytes: 4671.24MB Heap size: 10117.71MB
2022-01-31 11:56:03.749154017 [ThreadId 5] INFO hls:	Live bytes: 4671.24MB Heap size: 10117.71MB
2022-01-31 11:57:03.809418273 [ThreadId 5] INFO hls:	Live bytes: 4671.24MB Heap size: 10117.71MB
2022-01-31 11:58:03.847357296 [ThreadId 5] INFO hls:	Live bytes: 4671.24MB Heap size: 10117.71MB
2022-01-31 11:59:03.907536535 [ThreadId 5] INFO hls:	Live bytes: 4671.24MB Heap size: 10117.71MB
2022-01-31 12:00:03.967602383 [ThreadId 5] INFO hls:	Live bytes: 4671.24MB Heap size: 10117.71MB
2022-01-31 12:01:04.027445238 [ThreadId 5] INFO hls:	Live bytes: 4671.24MB Heap size: 10117.71MB
2022-01-31 12:02:04.087602669 [ThreadId 5] INFO hls:	Live bytes: 4671.24MB Heap size: 10117.71MB
2022-01-31 12:03:04.147532977 [ThreadId 5] INFO hls:	Live bytes: 4671.24MB Heap size: 10117.71MB
2022-01-31 12:04:04.207379731 [ThreadId 5] INFO hls:	Live bytes: 4671.24MB Heap size: 10117.71MB
2022-01-31 12:05:04.267437502 [ThreadId 5] INFO hls:	Live bytes: 4671.24MB Heap size: 10117.71MB
2022-01-31 12:06:04.295770902 [ThreadId 5] INFO hls:	Live bytes: 4671.24MB Heap size: 10117.71MB
2022-01-31 12:07:04.355432014 [ThreadId 5] INFO hls:	Live bytes: 4671.24MB Heap size: 10117.71MB
2022-01-31 12:08:04.415494469 [ThreadId 5] INFO hls:	Live bytes: 4671.24MB Heap size: 10117.71MB
2022-01-31 12:09:04.475478804 [ThreadId 5] INFO hls:	Live bytes: 4671.24MB Heap size: 10117.71MB
2022-01-31 12:10:04.500153057 [ThreadId 5] INFO hls:	Live bytes: 4671.24MB Heap size: 10117.71MB
2022-01-31 12:11:04.537940724 [ThreadId 5] INFO hls:	Live bytes: 4671.24MB Heap size: 10117.71MB
2022-01-31 12:12:04.597515693 [ThreadId 5] INFO hls:	Live bytes: 4671.24MB Heap size: 10117.71MB
2022-01-31 12:13:04.627046201 [ThreadId 5] INFO hls:	Live bytes: 4671.24MB Heap size: 10117.71MB
2022-01-31 12:14:04.654622893 [ThreadId 5] INFO hls:	Live bytes: 4671.24MB Heap size: 10117.71MB
2022-01-31 12:15:04.676529878 [ThreadId 5] INFO hls:	Live bytes: 3471.11MB Heap size: 10117.71MB
2022-01-31 12:16:04.677115665 [ThreadId 5] INFO hls:	Live bytes: 2816.22MB Heap size: 10117.71MB
2022-01-31 12:17:04.688766792 [ThreadId 5] INFO hls:	Live bytes: 4269.15MB Heap size: 10117.71MB
2022-01-31 12:18:04.738794819 [ThreadId 5] INFO hls:	Live bytes: 3964.57MB Heap size: 10117.71MB
2022-01-31 12:19:04.749195261 [ThreadId 5] INFO hls:	Live bytes: 5136.09MB Heap size: 10117.71MB
2022-01-31 12:20:04.757342525 [ThreadId 5] INFO hls:	Live bytes: 3430.69MB Heap size: 10252.98MB
2022-01-31 12:21:04.771154358 [ThreadId 5] INFO hls:	Live bytes: 4104.00MB Heap size: 10252.98MB
2022-01-31 12:22:04.774846562 [ThreadId 5] INFO hls:	Live bytes: 3643.31MB Heap size: 10252.98MB
2022-01-31 12:23:04.776276919 [ThreadId 5] INFO hls:	Live bytes: 4848.04MB Heap size: 11213.47MB
2022-01-31 12:24:04.784362273 [ThreadId 5] INFO hls:	Live bytes: 3727.95MB Heap size: 11127.49MB
2022-01-31 12:25:04.795950277 [ThreadId 5] INFO hls:	Live bytes: 3497.16MB Heap size: 11127.49MB
2022-01-31 12:26:04.856477827 [ThreadId 5] INFO hls:	Live bytes: 3497.16MB Heap size: 11127.49MB
2022-01-31 12:27:04.868532119 [ThreadId 5] INFO hls:	Live bytes: 6358.67MB Heap size: 11127.49MB
2022-01-31 12:28:04.901509457 [ThreadId 5] INFO hls:	Live bytes: 6358.67MB Heap size: 11127.49MB
2022-01-31 12:29:04.90941221 [ThreadId 5] INFO hls:	Live bytes: 4031.72MB Heap size: 11127.49MB
2022-01-31 12:30:04.921078785 [ThreadId 5] INFO hls:	Live bytes: 4249.79MB Heap size: 11127.49MB

So looks like around ~11 GiB. I'm surprised by the difference between live bytes and heap.

Note: In this code I'm using a commit from HLS from a few days ago, so it is not 1.6.1, but it is close.

So in summary:

~10 GiB in "normal" sessions. Is that considered "excessive", I don't know. (That's too much for the dev work with only 16 GiB of RAM on their machine, but it is fine for dev with 32 GiB).
There may be still be some memory leak hidden somewhere, because I observed yesterday 27 GiB of RAM usage.

Sorry, that's not a precise answer.

I'll soon switch our work codebase to GHC 9.2, most of the work is done, we are still blocked on a GHC regression related to doctest AND on HLS eval plugin. Once this will be done, I'll run my daily dev using HLS with the GHC 9.2 info table profiling / event log. Hopefully I'll be able to gather details about the origin of the allocations.

@guibou I fixed some issues on the GHC end which will end up in 9.2.2. Otherwise heap size being 2x of live bytes is normal with the copying collector - see https://well-typed.com/blog/2021/03/memory-return/

@mpickering I'm really looking forward for GHC 9.2.2 ;)

I've had another look at your article. I'm surprised by this assertion that 2x is normal, because:

There is a counter example in my message, Live bytes: 26814.11MB Heap size: 27943.50MB. But perhaps that's a special case (for example, a space leak)
I thought that the copying collector was splitting the allocated space in chunks which are garbage collected independently, hence at a point in time, we do not need to allocate twice more memory for the GC (we just need to allocate twice more memory than a chunk at a given time). I thought I remember reading about the block thing in https://simonmar.github.io/bib/parallel-gc-08_abstract.html (But, yes, that's a 12 years old paper, so maybe things changed a bit). Having a 2x memory overhoad looks like an important issue from my point of view, what have I missed?

Really wish I could second this as completed but HLS is still gobbling up all of my laptop's RAM (32gb) and then dying because it needs more when trying to use it on my work codebase. As of now (+ the "load serialized core on boot") it is still unusable for those of us with <64gb of RAM (those with 64GB of ram report it as "slow but usable")

sorry I closed it cause there were no feedback for a while and I was not sure if it was actionable anymore, gonna open it again

@parsonsmatt

Really wish I could second this as completed but HLS is still gobbling up all of my laptop's RAM (32gb) and then dying because it needs more when trying to use it on my work codebase. As of now (+ the "load serialized core on boot") it is still unusable for those of us with <64gb of RAM (those with 64GB of ram report it as "slow but usable")

FWIW, I’ve had good experience with using hiedb.

I'm not sure that having a generic ticket for "big projects use lots of memory" is actionable. Investigating memory usage and finding specific problems is useful, like #2962.

	# Small but heavily multi-component example
	# Disabled as it is far to slow. hie-bios >0.7.2 should help
	# - name: HLS
	# path: bench/example/HLS
	# modules:
	# - hls-plugin-api/src/Ide/Plugin/Config.hs
	# - ghcide/src/Development/IDE/Plugin/CodeAction/ExactPrint.hs
	# - ghcide/bench/hist/Main.hs
	# - ghcide/bench/lib/Experiments/Types.hs
	# - ghcide/test/exe/Main.hs
	# - exe/Plugins.hs

	case res of
	(UpToDate, Just iface)
	-- If the module used TH splices when it was last
	-- compiled, then the recompilation check is not
	-- accurate enough (https://gitlab.haskell.org/ghc/ghc/-/issues/481)
	-- and we must ignore
	-- it. However, if the module is stable (none of
	-- the modules it depends on, directly or
	-- indirectly, changed), then we can skip
	-- recompilation. This is why the SourceModified
	-- type contains SourceUnmodifiedAndStable, and
	-- it's pretty important: otherwise ghc --make
	-- would always recompile TH modules, even if
	-- nothing at all has changed. Stability is just
	-- the same check that make is doing for us in
	-- one-shot mode.
	\| not (mi_used_th iface) \|\| SourceUnmodifiedAndStable == sourceMod
	-> do

	staticPlugin :: DynFlagsModifications
	staticPlugin = mempty
	{ dynFlagsModifyGlobal =
	\df -> allowEmptyCaseButWithWarning
	$ flip gopt_unset Opt_SortBySubsumHoleFits
	$ flip gopt_unset Opt_ShowValidHoleFits
	$ df
	{ refLevelHoleFits = Just 0
	, maxRefHoleFits = Just 0
	, maxValidHoleFits = Just 0
	#if __GLASGOW_HASKELL__ >= 808
	, staticPlugins = staticPlugins df <> [metaprogrammingPlugin]
	#endif
	}
	#if __GLASGOW_HASKELL__ >= 808
	, dynFlagsModifyParser = enableQuasiQuotes
	#endif
	}