westerndigitalcorporation/desmod

Question: any suggestions on keeping memory usage bounded for multi-factor simulation?

taoluo opened this issue · 6 comments

Hi,

My multi-factor simulation involves hundreds of individual simulations, each lasting from a few seconds to a few hours. When running with PyPy, the memory usage of the worker subprocesses increases over time until it eventually exceeds my system's memory capacity (32GB) and the kernel kills some of the workers.

The figure shows the memory usage of the worker subprocesses (child0-7) growing over time:

[chart: memory usage of PyPy worker subprocesses child0-7 over time]

CPython doesn't have this memory issue:

[chart: memory usage of CPython worker subprocesses child0-7 over time]

Interesting result and good question.

PyPy definitely has different memory consumption characteristics than CPython. This page details PyPy's GC system:

https://doc.pypy.org/en/latest/gc_info.html

Also note that there are several environment variables that can be used to change the PyPy GC behavior:

https://doc.pypy.org/en/latest/gc_info.html#environment-variables

From these charts, one thing that's unclear to me is how these 8 simulation subprocesses are exhausting 32GB of memory when the biggest consumer is ~1.5GB. I.e. it looks like these simulations use less than 12GB of memory, which on the surface doesn't seem particularly unreasonable.

Also, the memory consumption seems to flatten out better under PyPy than under CPython. If you hadn't indicated that there was a problem with PyPy, I would not have guessed anything was wrong just from looking at the PyPy chart.

How are you measuring memory consumption in these charts? Is it the resident set size? Or something else?
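In case it helps while you look into that: one way to cross-check whatever memory_profiler reports is to sample the resident set size of the worker subprocesses directly. A minimal sketch, assuming psutil is installed in your PyPy environment (it is not part of desmod):

```python
# Minimal sketch: periodically print the RSS of this process and its children.
# psutil is an assumption here; it is not used by desmod itself.
import os
import time

import psutil


def log_rss(interval=5.0):
    parent = psutil.Process(os.getpid())
    while True:
        for proc in [parent] + parent.children(recursive=True):
            try:
                rss_mib = proc.memory_info().rss / 2**20
                print(f'pid={proc.pid} rss={rss_mib:.1f} MiB')
            except psutil.NoSuchProcess:
                pass  # a worker exited between listing and sampling
        time.sleep(interval)
```

Comparing those numbers against what memory_profiler and htop show would tell us whether they are all reporting the same thing.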

Thanks for the pointer. These graphs are just incomplete snapshots; it takes several hours to run out of memory. What is shown here is the increasing trend of memory consumption under PyPy; it looks like the GC is not working at all.

The chart is made with memory_profiler; I guess it accounts for all allocated memory of each process. htop also shows the same increasing trend.

Do you have any suggestions on setting PyPy's GC environment variables? I do call gc.collect() in my program, though.

Given the different approaches to GC and memory management between CPython and PyPy, I'm not sure that the differences between these charts are concerning on their own. I.e. do we actually expect PyPy to return memory to the operating system such that memory_profiler would report a lower "memory used" value? Also, even after looking at the docs and code for memory_profiler, it remains unclear to me what it is actually measuring.

From the PyPy chart, it looks like each subprocess's memory consumption mostly flattens over time, but perhaps each new simulation instance causes some marginal additional memory consumption. The questions I still have are:

  • Is the incremental memory consumption we see in PyPy due to a problem with PyPy? Or is it possible that the model's code is responsible for memory retention between simulation instances? That we see the floor of the CPython processes grow over time seems like a clue that something in your program may be holding onto memory (referenced objects) across simulations.

  • If the number of CPython processes was doubled, would your system's memory similarly become exhausted? This could be tested by using simulate_factors(..., jobs=16).

  • The corollary question is whether your system's memory is still eventually exhausted if fewer PyPy subprocesses are used, i.e. simulate_factors(..., jobs=4) or jobs=6 (see the sketch after this list).

  • Do the numbers you see in htop for CPython drop between simulations as we see in the CPython chart? Or does the resident size shown in htop ("RES") plateau?
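For the jobs experiments above, the only change should be the jobs argument to simulate_factors. A rough sketch; I'm writing the positional arguments from memory, and config, factors, and Top are placeholders for your model's own configuration dict, factor list, and top-level component, so adjust to match your existing call:

```python
# Hypothetical driver for the experiment suggested above: same factor sweep,
# explicit worker count. Only the `jobs` keyword is the point here; the other
# arguments are placeholders for whatever your existing call already passes.
from desmod.simulation import simulate_factors

from mymodel import Top, config, factors  # hypothetical module with your model

# Half the PyPy workers: is memory still eventually exhausted?
results = simulate_factors(config, factors, Top, jobs=4)
```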

Regarding PyPy GC environment variables, I work around a different apparent bug in PyPy's garbage collector by setting PYPY_GC_NURSERY=125M. This increases the new-object nursery size and transitively increases total memory consumption, with the benefit of decreasing the number of collections and improving runtime performance for my use case. I do not have a suggestion for GC configuration for your case.
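To be concrete about how such a variable gets applied: it has to be in the environment before the PyPy interpreter starts. A sketch of a tiny launcher, where run_sweep.py is a hypothetical script that calls simulate_factors, and 125M is just the value mentioned above, not a recommendation for your case:

```python
# Sketch: set a PyPy GC tuning variable and launch the sweep under PyPy.
# The variable must be present when the PyPy process starts; run_sweep.py is
# a hypothetical script that runs simulate_factors().
import os
import subprocess
import sys

env = dict(os.environ, PYPY_GC_NURSERY='125M')
sys.exit(subprocess.call(['pypy3', 'run_sweep.py'], env=env))
```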

All of the above said, the problem you are experiencing could seemingly be solved if desmod spawned new simulation worker subprocesses for each simulation (or for every N simulations). I consciously chose to use persistent simulation worker subprocesses specifically with PyPy in mind. I wanted PyPy's JIT to remain "hot" between simulations, under the assumption that each simulation instance would largely have the same hot code paths. For long running simulations, optimizing for keeping PyPy's JIT hot is not as big of a concern.
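To illustrate the shape of that alternative (this is not how desmod is structured today): the standard library's multiprocessing.Pool can replace each worker process after it has completed a fixed number of tasks via maxtasksperchild, which bounds per-worker memory growth at the cost of throwing away the replacement worker's warmed-up JIT.

```python
# Standard-library illustration of worker recycling, not desmod's actual
# worker management: each pool worker is replaced after 10 tasks.
import multiprocessing


def run_one_simulation(index):
    # Placeholder for running a single simulation instance.
    return index * index


if __name__ == '__main__':
    with multiprocessing.Pool(processes=8, maxtasksperchild=10) as pool:
        results = pool.map(run_one_simulation, range(200))
    print(len(results))
```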

I came across this in PyPy's docs (https://doc.pypy.org/en/latest/jit-hooks.html#resetting-the-jit)

releaseall()

Marks all current machine code objects as ready to release. They will be released at the next GC (unless they are currently in use in the stack of one of the threads). Doing pypyjit.releaseall(); gc.collect() is a heavy hammer that forces the JIT roughly back to the state of a newly started PyPy.

I wonder if that might have a positive effect for your case?
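If you want to try it, here is a rough sketch of that heavy hammer, guarded so it also runs (as a no-op for the JIT part) under CPython; where to call it between simulations in your run is up to you:

```python
# Rough sketch of the "heavy hammer" from the PyPy docs: drop all JIT machine
# code and force a collection. pypyjit only exists under PyPy, so the import
# is guarded.
import gc

try:
    import pypyjit  # only available under PyPy
except ImportError:
    pypyjit = None


def reset_jit_and_collect():
    if pypyjit is not None:
        pypyjit.releaseall()
    gc.collect()
```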

Hi, sorry for the long silence. I didn't dig deep into this issue, but it was resolved after updating to the latest PyPy (v7.3.5).

Glad the behavior has improved with the latest PyPy. And thanks for updating this issue!