elbywan/crystalline

Memory usage

daliborfilus opened this issue · 15 comments

First of all, thank you for this project. It really helps with day-to-day crystalling :-)

However, I experience very high RAM usage from the crystalline binary.
I'm using VS Code with the Crystal extension, with crystalline instead of scry.
I have two projects open, each one very small (< 5k lines of my own code, around 34k lines including everything in lib/).
Every day the memory usage is above 3 GB. Currently it's at 6.9 GB RES.

Isn't that a little too much? Can something be done about it? Can I do something to help?

I'm currently experiencing this with the downloaded binary from release v0.3.0 for Linux, though I've also seen the same issue with custom builds in the past.

Hey @daliborfilus,

Every day the memory usage is above 3 GB. Currently it's at 6.9 GB RES.
Isn't that a little too much? Can something be done about it? Can I do something to help?

This issue bothers me a lot, but I'm afraid there is nothing we can do about it.

Crystalline instantiates and uses the Crystal compiler in a separate thread, which allocates a lot of memory every time it performs a static analysis of the project (more than 1 GB, depending on the code size).

The high memory usage is caused by the fact that the garbage collector used by Crystal struggles to cope with large allocations like this.

That's unfortunate. By the way, I just noticed that VS Code was still highlighting a typo as an undefined method (even though I had fixed it more than 30 minutes earlier), the crystalline process still had 6.9 GB resident usage, and its process time was unchanged - maybe the extension got stuck on something. The memory never seems to go down, even after hours, so perhaps the GC thinks the memory is still referenced from somewhere...? When I killed the process, VS Code relaunched it and the error went away, so I don't know whether it was VS Code that was stuck or crystalline that got overwhelmed.

Anyway, I'll keep watching the process and "refresh" it from time to time :-)

I'll close the issue because you cannot directly control/fix this.

For the record, the GC behaviour can be checked by running the following code, which compiles itself (no crystalline involved).

Heap memory should grow very high (bdwgc allocates large amounts of virtual memory and, depending on the compilation flags, does not return it to the OS), and real memory should stabilize at some point around 6 GB to 8 GB (at least it does on my computer).

require "benchmark"
require "compiler/crystal/**"

def code_analysis(sources : Array(Crystal::Compiler::Source))
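  # Compile with codegen disabled: only the parsing and semantic analysis steps run,
  # which is the kind of static analysis crystalline performs on a project.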
  dev_null = File.open(File::NULL, "w")
  compiler = Crystal::Compiler.new
  compiler.no_codegen = true
  compiler.color = false
  compiler.no_cleanup = true
  compiler.wants_doc = true
  compiler.stdout = dev_null
  compiler.stderr = dev_null
  result = compiler.compile(sources, "")

  raise result if result.is_a? Exception

  result
ensure
  dev_null.try &.close
end

file = File.new(__FILE__)
sources = [
  Crystal::Compiler::Source.new(file.path, file.gets_to_end),
]
file.close

# Compiles itself over and over
i = 0
loop do
  puts "iteration: #{i += 1}"
  puts Benchmark.measure {
    result = code_analysis(sources)
  }
  puts GC.prof_stats
end

And to check whether the memory is freed when the program is not doing anything (hint: it isn't 😉):

# Update the program above with these loops instead of the infinite compilation:
i = 0
result = nil

# Compiles itself 10 times
loop do
  break if i >= 10
  puts "iteration: #{i += 1}"
  puts Benchmark.measure {
    result = code_analysis(sources)
  }
  puts GC.prof_stats
end

# Force GC every 5 seconds while doing nothing
loop do
  sleep 5.seconds
  GC.collect
  puts GC.prof_stats
end

Has this GC behavior been reported to the Crystal devs?

@petr-fischer I don't think that they care about this: it is standard GC behaviour, and they know that the compiler can take multiple GB of memory when compiling programs. In general the Crystal compiler is killed after completion since it is a one-shot process, so this does not really involve the GC.

Recently crystalline started to use more than 8 GB of RAM for me after just a few minutes. Several times I couldn't do anything and had to wait for the OOM killer to kick in after all 32 GB of my RAM were consumed. I tried to endure it and kill crystalline in the background every ten minutes... but I gave up. I had to disable crystalline for my projects because of this. It's very unfortunate.
I don't know whether it became such a problem for me for a specific reason (a Crystal update, an update of the app's libs, heavy use of macros or something similar), or just because I started to use VS Code + crystalline far more often during that period.
The memory usage was always in the gigabytes for me before, though, so it isn't anything new.

@elbywan What about running crystalline "compilations" in temporary (one-shot) external processes, outside of the main crystalline process? Is that possible?

@petr-fischer Not possible: the compiler output (AST/types…) is not serializable.

Have you tried starting crystalline with the env var GC_UNMAP_THRESHOLD=1 set? That will tell Boehm to aggressively release memory: https://github.com/ivmai/bdwgc/blob/master/doc/README.environment#L150-L153
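
For example, a small wrapper script can set the variable before launching the server. This is only a sketch; it assumes crystalline is on your PATH and that your editor lets you point the language server executable at the wrapper instead:

#!/bin/sh
# Hypothetical wrapper: export the GC tuning variable, then replace this shell with crystalline.
export GC_UNMAP_THRESHOLD=1
exec crystalline "$@"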

@carlhoerberg It definitely seems better. Within 5 minutes of quick testing I still get to just under 4 GB, but it looks like RAM usage doesn't go up as aggressively as before. Thanks so much for this tip. Where is the best place to put the variable? Should I make a custom script for crystalline to set it up, or is there a better place?

Maybe @elbywan could incorporate some GC_* env var tuning into the solution? This would certainly be good for all users...

@petr-fischer @carlhoerberg A while ago I already tried many different combinations of GC-related environment variables and flags, including GC_UNMAP_THRESHOLD, and unfortunately did not observe any notable improvements.

@elbywan I’ve been swapping in Hoard for malloc in apps and have noticed a lot of improvements to allocation behavior (mostly lower fragmentation), but have yet to test it on anything with crazy high mem usage or loads. It’s made benchmarks speed up quite a bit for others, but no serious testing yet.

Any idea if it might help with this?

https://github.com/the-business-factory/hirobot.app/blob/the-hoard/Dockerfile.baseimage#L19 is where I brought it into my base Crystal/Lucky images, and setting it (https://github.com/the-business-factory/hirobot.app/blob/the-hoard/Dockerfile#L40) before running apps is all that's been necessary so far.

@robcole Hi, how does that work? Does libhoard implement the Boehm GC calls, so they all work the same as the built-in ones? But my Crystal-compiled binaries don't depend on any GC library; I think the GC is part of the resulting binary? If that's so, does that mean you patched Crystal itself to use Hoard? I think there's something I'm missing. I'd like to try it in crystalline or in another app of ours, where the Boehm GC doesn't free the memory even after calling GC.collect, with GC_UNMAP_THRESHOLD configured.

I also found that the Crystal team will be swapping the Boehm GC for another, custom GC, in this article from the 13th of June - Latest news from the team.

@daliborfilus it's just swapping out the allocator (system level), not the GC. I'm not familiar with the specifics of the high memory usage in this app, but in many Ruby apps I've worked on, swapping in jemalloc for malloc led to better memory performance without any changes to anything else. I've found the same to be true with Hoard (another allocator).
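
Concretely, the swap happens at the dynamic-linker level rather than inside the program itself, e.g. (the library path is just an example; point it at wherever libhoard.so is actually installed):

# Hypothetical path: preload Hoard so malloc/free symbols resolve to it instead of the system allocator.
LD_PRELOAD=/usr/local/lib/libhoard.so crystalline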

https://engineering.linkedin.com/blog/2021/taming-memory-fragmentation-in-venice-with-jemalloc details some of this specifically for jemalloc, but the same ideas apply to Hoard.

https://dev.to/devteam/how-we-decreased-our-memory-usage-with-jemalloc-4d5n talks about it a bit in a Rails app as well.

I've found that it decreased memory usage in my apps, especially the rate at which it grows, but I haven't had enough time to run any proper benchmarks.