neurolib-dev/neurolib

excessive memory use with evolution?

jshanna100 opened this issue · 2 comments

Hi,

I have recently been trying to run a 10-generation evolutionary process with an initial population size of 64 and a population size of 32, and it runs out of memory at the 8th generation, even with 256GB allotted. I am basically at the limit of what my HPC has available, so I am now wondering whether there is some way to reduce the memory usage here.

Could it be related to the evolutionary process storing the individual models? When these are stored, does this include the simulated data, e.g. the model.exc results? In my case this is over 300 nodes with 75000 samples each. If so, is storing the simulated results actually necessary for the evo algorithm, and if not, could this be turned off, somehow?

Hi @jshanna100, thanks for the issue.

You are free to throw the simulated data away for each of the individuals (which is what I'd recommend since this can get fairly huge). In this example we save the simulated output. To throw it away, you'd write your evaluation function like this:

def evaluateSimulation(traj):
    # rebuild the model for this individual from the trajectory
    model = evolution.getModelFromTraj(traj)
    model.run()
    fitness_tuple = ()
    ... # do something to compute the fitness
    fitness_tuple += (fitness, )
    return fitness_tuple, {} # <-- the empty dictionary is the important part.

However, this mainly saves disk space rather than RAM, because the simulated results are written directly to disk.

Another problem could also be that your individual runs themselves use a lot of RAM. You can estimate it by doing something like:

RAM ~ simulation duration x 1/dt x number of nodes N x number of individuals run in parallel (ncores of Evolution)
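As a rough illustration of the estimate above, here is a minimal sketch that turns those factors into bytes. The function name, the 8 bytes per sample (float64), the single stored variable, and the duration/dt values are all assumptions chosen to match the 300 nodes x 75000 samples mentioned in the question; the real footprint of a model object will be higher because of internal state and additional output variables.

```python
# Back-of-the-envelope estimate of the memory held by stored time series.
# All parameter values below are illustrative assumptions, not neurolib defaults.
def estimate_ram_bytes(duration_ms, dt_ms, n_nodes, n_parallel,
                       n_variables=1, bytes_per_sample=8):
    # number of stored samples per node = simulation duration x 1/dt
    n_samples = int(round(duration_ms / dt_ms))
    return n_samples * n_nodes * n_variables * bytes_per_sample * n_parallel

# Assumed: 7500 ms at dt = 0.1 ms gives 75000 samples per node, 300 nodes.
per_run = estimate_ram_bytes(duration_ms=7500, dt_ms=0.1, n_nodes=300, n_parallel=1)
print(f"{per_run / 1e6:.0f} MB per stored variable and individual")
```

Multiplying this by the number of stored variables and by the number of individuals evaluated in parallel gives a lower bound to compare against the measured size of the model object.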

You can estimate the contribution of the first three factors by running one simulation outside of the evolution (with model.run()) and then measuring the size of the model object in memory. According to this thread, you can use

from pympler import asizeof
asizeof.asizeof(my_object)

You can change any of the factors above to save memory. If you don't need the full time series and the last n seconds are enough, I'd use model.run(chunkwise=True), which throws away everything but the last simulated chunk. Check the chunksize parameter for how long the chunks should be.
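To make the idea behind chunkwise simulation concrete, here is a toy sketch in plain Python. It is not neurolib's implementation: `simulate_chunkwise`, `step`, and the parameter names are hypothetical, and the "model" is just a counter. It only shows why memory stays bounded by the chunk length instead of growing with the total duration.

```python
# Toy illustration of chunkwise integration (hypothetical helper, not neurolib code).
def simulate_chunkwise(step, state, n_steps, chunk_size):
    """Advance `state` for n_steps, keeping only the most recent chunk of outputs."""
    last_chunk = []
    done = 0
    while done < n_steps:
        this_chunk = min(chunk_size, n_steps - done)
        last_chunk = []  # drop the previous chunk before filling a new one
        for _ in range(this_chunk):
            state = step(state)
            last_chunk.append(state)
        done += this_chunk
    return state, last_chunk

# 10 steps in chunks of 4: chunks cover steps 1-4, 5-8, and 9-10.
final_state, tail = simulate_chunkwise(lambda x: x + 1, 0, n_steps=10, chunk_size=4)
print(final_state, tail)
```

Only the final chunk survives, so a fitness function that needs just the end of the time series can still be computed from it.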

Let me know if it helps!

It did indeed fix it. Thanks so much for the quick response!