ldtteam/Aequivaleo

Memory leak Aequivaleo v0.1.132


After a lot of testing, I found that the cause is Aequivaleo v0.1.132.

Sorry I don't have more information; it was a bit tricky to debug.

I'm not sure whether it's an incompatibility with another mod or not...

Note: this shows up especially on a Forge server (v47.2.31).
The server runs out of memory while idle after about an hour with 12 GB of memory allocated.

This does not happen when the mod is removed: the server is currently still running after 12+ hours, using only 2-3 GB of memory while idle.

The issue can be reproduced with version v0.11 of this pack on both Windows and Ubuntu:
https://legacy.curseforge.com/minecraft/modpacks/lessstress

For a quick check, you can use jmap -J-d64 -histo:live <pid> to get a histogram of all live objects.
Additionally, you can use VisualVM to take a heap snapshot and determine what is taking up memory and what is holding on to it.
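If you would rather log the memory trend than watch it by hand, the JDK's standard java.lang.management API reports heap usage from inside a running JVM. This is a minimal sketch that assumes you can run code in the same JVM as the server (for example from a small debug mod or a test harness); the class name HeapUsageSampler is hypothetical and not part of Aequivaleo or Forge. Keep in mind that the "used" figure includes garbage that has not been collected yet, so a rising number only indicates a problem if it keeps rising across garbage collections; jmap's -histo:live option avoids this by forcing a full GC before counting.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

/** Hypothetical helper: prints heap usage periodically so an upward trend is easy to spot. */
public final class HeapUsageSampler {
    private HeapUsageSampler() {}

    /** Bytes currently in use on the heap (includes not-yet-collected garbage). */
    public static long usedHeapBytes() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();
        return heap.getUsed();
    }

    /** Maximum heap size in bytes, or -1 if the JVM does not define one. */
    public static long maxHeapBytes() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getMax();
    }

    /** Takes `samples` readings, `intervalMillis` apart, and prints them in MiB. */
    public static void sample(int samples, long intervalMillis) throws InterruptedException {
        for (int i = 0; i < samples; i++) {
            long usedMiB = usedHeapBytes() / (1024 * 1024);
            System.out.printf("sample %d: %d MiB used (max %d bytes)%n", i, usedMiB, maxHeapBytes());
            if (i + 1 < samples) {
                Thread.sleep(intervalMillis);
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        sample(3, 1000);
    }
}
```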

Hello,

I've just encountered this issue while testing Replication, and after some investigation I believe I can share some insight into its cause.

With issues like this, there is rarely a good way to demonstrate the problem outside a live environment or a heap dump, since no useful logs are generated. Unfortunately, a heap dump illustrating this issue is on the order of gigabytes (mine is ~11 GB), so I will instead provide the steps needed to demonstrate and investigate the problem locally.

  1. Download and run any "affected" modpack, such as the one linked above.
  2. In the F3 menu in a running world, observe the memory usage. Allow some time for Aequivaleo to begin recipe analysis, and observe the gradual upward trend of memory usage.
  3. Obtain a heap dump by a method of your choice. The easiest way is to use Spark and run the /spark heapdump command in-game. The file will be very large, approaching the size of the total memory allocated.
  4. Analyze the resulting .hprof file with a heap profiling tool of your choice; the Eclipse Memory Analyzer is both free and excellent. Running a Leak Suspects Report or opening the Dominator Tree in that analyzer shows that the Aequivaleo analysis runner thread is using a (subjectively) disproportionate amount of memory.
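Spark is the easiest route in-game, but if you can run code inside the server JVM, the HotSpot-specific com.sun.management.HotSpotDiagnosticMXBean can write the same kind of .hprof file for step 3. This is a minimal sketch assuming a HotSpot-based JVM (such as an OpenJDK build); the HeapDumper class and the file names are hypothetical. With live = true only reachable objects are written, which is what the Leak Suspects Report and Dominator Tree care about, and the file will still be roughly as large as the live heap.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: writes an .hprof heap dump of the current JVM. */
public final class HeapDumper {
    private HeapDumper() {}

    /**
     * Dumps the heap to `target`. With live=true only reachable objects are written.
     * The JDK rejects names without the .hprof suffix; checking up front gives a clearer message.
     */
    public static Path dump(Path target, boolean live) throws IOException {
        String name = target.getFileName().toString();
        if (!name.endsWith(".hprof")) {
            throw new IllegalArgumentException("heap dump file must end in .hprof: " + name);
        }
        HotSpotDiagnosticMXBean diagnostics =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // dumpHeap throws IOException if the file already exists, so always pass a fresh path.
        diagnostics.dumpHeap(target.toAbsolutePath().toString(), live);
        return target;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("heapdump");
        Path file = dump(dir.resolve("server.hprof"), true);
        System.out.println("Wrote " + Files.size(file) + " bytes to " + file);
    }
}
```

Because dumpHeap refuses to overwrite an existing file, the example writes into a fresh temporary directory each time.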

I obtained the following screenshot from the Eclipse Memory Analyzer dominator tree:
[Screenshot: Eclipse Memory Analyzer dominator tree]

As can be seen, the cycle reduction mechanism holds reachable objects occupying 43.3% of the allocated heap; these objects cannot be garbage-collected because the cycle detector is still using them. Reducing cycles in the recipe graph is rather memory-intensive, and memory use scales significantly with the number of recipes (specifically, with the number of recipe cycles) in a modpack. Another important piece of evidence is that an otherwise identical modpack with no recipes (removed by datapack or KubeJS) does not experience this issue.

Therefore, my conclusion is that the issue results from the combinatorial explosion of recipe cycles that naturally occurs in modpacks of sufficient size, or in any modpack whose recipes loop back on each other significantly. I conjecture that this issue doesn't occur in Aequivaleo-only installations (or in small modpacks) because the cycle reduction mechanism simply does its work and is then garbage-collected, whereas larger packs require exponentially more resources to unwind the recipe graph, so the mechanism keeps using more and more memory, giving the appearance of a memory leak.
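To make the "combinatorial explosion" concrete, the sketch below counts every simple directed cycle in a toy graph by brute-force depth-first search. It uses a complete directed graph (every item convertible into every other), which is a deliberately worst-case assumption; real recipe graphs are much sparser, and this is not meant to reproduce Aequivaleo's cycle reducer. The class name CycleExplosion is hypothetical. Each cycle is counted exactly once by rooting it at its lowest-numbered vertex.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical demo: counts simple directed cycles to show how fast they multiply. */
public final class CycleExplosion {
    private CycleExplosion() {}

    /** Complete directed graph on n vertices: every ordered pair u != v has an edge. */
    public static List<List<Integer>> completeDigraph(int n) {
        List<List<Integer>> adj = new ArrayList<>();
        for (int u = 0; u < n; u++) {
            List<Integer> out = new ArrayList<>();
            for (int v = 0; v < n; v++) {
                if (u != v) out.add(v);
            }
            adj.add(out);
        }
        return adj;
    }

    /** Counts simple cycles, each rooted at its smallest vertex so it is counted once. */
    public static long countSimpleCycles(List<List<Integer>> adj) {
        int n = adj.size();
        long total = 0;
        for (int start = 0; start < n; start++) {
            boolean[] onPath = new boolean[n];
            onPath[start] = true;
            total += extend(adj, start, start, onPath);
        }
        return total;
    }

    // Counts simple paths from `current` that close back to `start`,
    // visiting only vertices numbered higher than `start`.
    private static long extend(List<List<Integer>> adj, int start, int current, boolean[] onPath) {
        long found = 0;
        for (int next : adj.get(current)) {
            if (next == start) {
                found++; // an edge back to the root closes one cycle
            } else if (next > start && !onPath[next]) {
                onPath[next] = true;
                found += extend(adj, start, next, onPath);
                onPath[next] = false; // backtrack
            }
        }
        return found;
    }

    public static void main(String[] args) {
        for (int n = 3; n <= 8; n++) {
            System.out.printf("n=%d items -> %d simple cycles%n", n, countSimpleCycles(completeDigraph(n)));
        }
    }
}
```

For n = 3 through 8 it prints 5, 20, 84, 409, 2365 and 16064 cycles: eight fully interconvertible items already yield over sixteen thousand distinct cycles, and any approach that keeps all of them in memory at once grows accordingly.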

I regard this as a rather serious issue, because the "conventional wisdom" of doing a binary search (remove half of the mods, reload, repeat) to find the culprit will not work here: the problem arises from the interaction between many mods in a modpack and is not really the fault of any single one. Modpack makers operating in good faith with this strategy will inevitably misidentify the "culprit", which could cause reputational harm to the misidentified mod.

In conclusion, if there is no feasible way to reduce the overhead of this process, and it is not feasible to implement a "static" mode (in which modpack makers could precompute the recipe graph analysis data and distribute it with their modpack), then I must recommend posting a disclaimer about the potentially very high resource usage of this mod.
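A "static" mode like the one suggested above depends on knowing whether a precomputed result still matches the pack's recipes. One common approach, sketched here purely as an assumption about how such a mode could work, is to fingerprint the recipe set with a hash and reuse a cached result only when the fingerprint matches. RecipeFingerprint, loadOrCompute, and representing recipes and analysis results as plain strings are all hypothetical simplifications; nothing here reflects an existing Aequivaleo feature.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Collection;
import java.util.HexFormat;
import java.util.TreeSet;
import java.util.function.Function;

/** Hypothetical sketch: cache an expensive analysis keyed by a hash of the recipe set. */
public final class RecipeFingerprint {
    private RecipeFingerprint() {}

    /** Order-independent SHA-256 over recipe descriptions (assumed to contain no newlines). */
    public static String of(Collection<String> recipes) {
        try {
            MessageDigest sha = MessageDigest.getInstance("SHA-256");
            for (String recipe : new TreeSet<>(recipes)) { // sorted, so load order does not matter
                sha.update(recipe.getBytes(StandardCharsets.UTF_8));
                sha.update((byte) '\n'); // separator so "ab"+"c" hashes differently from "a"+"bc"
            }
            return HexFormat.of().formatHex(sha.digest());
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 must be available on every Java platform", e);
        }
    }

    /** Returns the cached result for this recipe set if present, otherwise computes and stores it. */
    public static String loadOrCompute(Path cacheDir, Collection<String> recipes,
                                       Function<Collection<String>, String> analyse) throws IOException {
        Path cached = cacheDir.resolve(of(recipes) + ".txt");
        if (Files.exists(cached)) {
            return Files.readString(cached);
        }
        String result = analyse.apply(recipes);
        Files.createDirectories(cacheDir);
        Files.writeString(cached, result);
        return result;
    }
}
```

Under this scheme a pack author would ship the cache directory with the pack, and any recipe change (a new mod, a KubeJS tweak) would produce a different fingerprint and fall back to a fresh analysis.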

Hello, yes, I am aware of this.
We have an internal test version that runs on the ATM9 No Frills pack without this leak (which is a really good benchmark for this kind of test).
But making the fix surfaced an issue in the analysis engine, so that needs fixing first.

That rebuild of the engine passed internal test hurdles today, so let's hope I can get it to work in production test cases.

Thank you for your time, work, and attention!

A couple of issues, one major and several minor, surfaced yesterday.
Most notably, the new cycle reducer (which replaces the code behind the "memory leak"; it is not actually a leak, btw, it just needs a metric ton of memory to find all cycles) is not finding one particular kind of cycle.

Once that is fixed, I think I am pretty close to a release.
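For contrast with "finding all cycles", a much cheaper question is which items lie on some cycle together. Tarjan's strongly connected components (SCC) algorithm answers that in O(V + E) time and memory; for example, a complete directed graph on 8 vertices has 16,064 simple cycles but only one SCC. This is a standard technique swapped in for illustration only: it is not a description of Aequivaleo's new cycle reducer, and whether SCC-level grouping carries enough information for Aequivaleo's equivalency calculations is an open assumption. The class name TarjanScc and the four-item graph in main are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

/** Tarjan's algorithm: groups vertices into strongly connected components in O(V + E). */
public final class TarjanScc {
    private final List<List<Integer>> adj;
    private final int[] index;     // DFS discovery order, -1 = not yet visited
    private final int[] lowLink;   // smallest index reachable while still on the stack
    private final boolean[] onStack;
    private final Deque<Integer> stack = new ArrayDeque<>();
    private final List<List<Integer>> result = new ArrayList<>();
    private int nextIndex = 0;

    private TarjanScc(List<List<Integer>> adj) {
        this.adj = adj;
        int n = adj.size();
        index = new int[n];
        lowLink = new int[n];
        onStack = new boolean[n];
        Arrays.fill(index, -1);
    }

    public static List<List<Integer>> components(List<List<Integer>> adj) {
        TarjanScc t = new TarjanScc(adj);
        for (int v = 0; v < adj.size(); v++) {
            if (t.index[v] == -1) t.visit(v);
        }
        return t.result;
    }

    // Recursive DFS for brevity; very deep graphs would need an explicit stack.
    private void visit(int v) {
        index[v] = lowLink[v] = nextIndex++;
        stack.push(v);
        onStack[v] = true;
        for (int w : adj.get(v)) {
            if (index[w] == -1) {
                visit(w);
                lowLink[v] = Math.min(lowLink[v], lowLink[w]);
            } else if (onStack[w]) {
                lowLink[v] = Math.min(lowLink[v], index[w]);
            }
        }
        if (lowLink[v] == index[v]) { // v is the root of a component: pop it off
            List<Integer> component = new ArrayList<>();
            int w;
            do {
                w = stack.pop();
                onStack[w] = false;
                component.add(w);
            } while (w != v);
            result.add(component);
        }
    }

    public static void main(String[] args) {
        // Hypothetical recipe graph: 0 <-> 1 <-> 2 loop back on each other; 2 -> 3 has no way back.
        List<List<Integer>> adj = List.of(List.of(1), List.of(0, 2), List.of(1, 3), List.of());
        System.out.println(components(adj));
    }
}
```

It prints [[3], [2, 1, 0]]: items 0, 1 and 2 collapse into a single component, while item 3 stays on its own. The memory needed is a few arrays sized to the number of items, regardless of how many individual cycles run through each component.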

Just ended up here after a day and a half of troubleshooting a private pack of mine. I managed to narrow it down to this mod through a ridiculously tedious process of trial and error. When my main pack broke, I made a copy of it, deleted all the mods, and then reinstalled them 10 at a time, testing after each batch, until it broke. For me, it broke on "Joining World..." and would hang there forever. At one point I got it to load BARELY with 48 GB of RAM allocated, just to see what would happen LOL. But with any sane amount of memory, it would hang on joining the world.

When the test pack finally did break, I turned off the last 10 mods I had added and it was still broken, so I tried disabling the core mods for those 10 mods, which fixed it. After that, I re-enabled the three mods one at a time, each along with its core mod, to see which one was broken, and found that Replication, which requires Aequivaleo, caused it to break. But it was not Replication, it was Aequivaleo.

Long story short, it took a LOT of testing everything I could think of, but it brought me to this mod, and thus, this GitHub page. Fortunately I am not the only one with this problem: this thread details exactly the issue I discovered and explains WHY the mod is causing it. I'm going to stick around here for an update to test, because I was really looking forward to testing Replication (which, y'know, needs this mod lol).