OpenBioSim/biosimspace

[BUG] Potential memory leak when saving systems

noahharrison64 opened this issue · 5 comments

Hey,

I've been having annoying memory issues with my module that utilises BSS to handle simulation files. After a fair bit of debugging I think I've pinpointed the issue to BSS.IO.saveMolecules(). I'm aware this might be a Sire issue, but thought I'd post here since I don't interact with the Sire objects directly. The issue only became apparent because I was looping over a long list of large top/gro files, reading, manipulating and writing lots of systems in relatively quick succession. This led to the kernel dying due to lack of available memory.

I set up a test system to reproduce. I'm working with large, solvated systems so the memory increase is more pronounced. I tried using garbage collection to force deallocation of memory, but to no avail.

import BioSimSpace as BSS
import psutil
from pathlib import Path
import gc
    
hybrid_dir = Path('top_coord_files')
write_dir = Path('output_dir')

def read_and_write(i: int, hybrid_dir: Path, write_dir: Path):
    stem = f'lig-ejm-42_lig-ejm-43_hybrid_{i}'
    top_file = hybrid_dir / f'{stem}.top'
    coord_file = hybrid_dir / f'{stem}.gro'
    mol = BSS.IO.readMolecules([str(top_file), str(coord_file)])
    BSS.IO.saveMolecules(str(write_dir / stem), mol, ['grotop', 'gro87'])

for i in range(47):
    print(f'Starting iteration {i}')
    print("Memory pre read_and_write: ", psutil.virtual_memory().percent)
    read_and_write(0, hybrid_dir, write_dir)
    print("Memory post read_and_write: ", psutil.virtual_memory().percent)
    gc.collect()
    print("Memory post garbage collection: ", psutil.virtual_memory().percent)
    print('\n')

Starting iteration 0
Memory pre read_and_write: 21.8
Memory post read_and_write: 23.5
Memory post garbage collection: 23.5

Starting iteration 1
Memory pre read_and_write: 23.5
Memory post read_and_write: 24.4
Memory post garbage collection: 24.4

Starting iteration 2
Memory pre read_and_write: 24.4
Memory post read_and_write: 25.3
Memory post garbage collection: 25.3

Starting iteration 3
Memory pre read_and_write: 25.3
Memory post read_and_write: 26.1
Memory post garbage collection: 26.1

Starting iteration 4
Memory pre read_and_write: 26.1
Memory post read_and_write: 26.9
Memory post garbage collection: 26.9

Starting iteration 5
Memory pre read_and_write: 26.9
Memory post read_and_write: 27.8
Memory post garbage collection: 27.8

As you can see, memory utilisation increases by about 6 percentage points (21.8% to 27.8%) over the six iterations shown. I'm working with lots of systems, so this can become unsustainable very quickly.

Interestingly, removing the 'saveMolecules' call leads to a much more modest increase in memory utilisation.

Starting iteration 0
Memory pre read_and_write: 21.8
Memory post read_and_write: 23.3
Memory post garbage collection: 23.3

Starting iteration 1
Memory pre read_and_write: 23.3
Memory post read_and_write: 23.4
Memory post garbage collection: 23.4

Starting iteration 2
Memory pre read_and_write: 23.4
Memory post read_and_write: 23.5
Memory post garbage collection: 23.5

Starting iteration 3
Memory pre read_and_write: 23.5
Memory post read_and_write: 23.5
Memory post garbage collection: 23.5

It's unclear to me whether there's some caching going on behind the scenes that I'm not aware of, or whether the C++ / Python interface is causing this issue.

Input files
Presumably this isn't an issue with my systems in particular, but I've attached them for testing anyway.

(please complete the following information):

  • OS: Ubuntu 20.04.6 LTS
  • Version of Python: 3.11.6
  • Version of BioSimSpace: 2023.5.0
  • Version of Sire: 2023.5.1
  • I confirm that I have checked this bug still exists in the latest released version of BioSimSpace: Haven't checked 5.1 but assumed since Sire is latest version it shouldn't make a difference. Happy to test if suggested though.

BioSimSpace has a file cache for IO operations built in, which avoids writing the same system to file multiple times when the system hasn't changed. I imagine that this is growing large. If it's problematic for your use case I could add an option to disable the cache. As a workaround, you could just periodically clear it via:

BSS.IO._file_cache._cache = BSS.IO._file_cache._FixedSizeOrderedDict()
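A rough sketch of how that reset might be slotted into a long processing loop. The helper `process_with_cache_reset` and its parameters are hypothetical, and the demo uses a plain dict as a stand-in cache so it runs without BioSimSpace. Note that the workaround relies on underscore-prefixed (private) names, so it may not survive a version change.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def process_with_cache_reset(items: Iterable[T],
                             process_one: Callable[[T], None],
                             reset_cache: Callable[[], None],
                             reset_every: int = 10) -> int:
    """Process each item, calling `reset_cache` after every `reset_every` items.

    Returns the number of times the cache was reset.
    """
    if reset_every < 1:
        raise ValueError("reset_every must be at least 1")
    resets = 0
    for count, item in enumerate(items, start=1):
        process_one(item)
        if count % reset_every == 0:
            reset_cache()
            resets += 1
    return resets


# With BioSimSpace, reset_cache could wrap the one-line workaround quoted above:
#
#   def reset_cache():
#       BSS.IO._file_cache._cache = BSS.IO._file_cache._FixedSizeOrderedDict()

# Stand-in demo (hypothetical): a plain dict plays the role of the file cache.
fake_cache = {}


def fake_process(i: int) -> None:
    fake_cache[i] = f"system-{i}"


n_resets = process_with_cache_reset(range(25), fake_process,
                                    fake_cache.clear, reset_every=10)
print(n_resets, len(fake_cache))
```

Resetting every N items rather than on every iteration keeps some benefit of the cache when consecutive systems are identical, at the cost of letting memory climb between resets.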

Ah okay thought it might be something like this, thanks for letting me know!

All the topology files are the same, it's just the coordinates that are changing, so I'm sure the cache is just building up. Looks like resetting it fixes the issue. I'm happy to just use the workaround unless / until you want to incorporate a disable-cache option. Perhaps a good interim option would be to mention the cache in the docs somewhere (unless it's already there and I couldn't spot it!).

Thanks for the help as always,
Noah

No problem. It will be very easy to add a disableCache or clearCache function. Will add to my TODO list. I've only not done it since, to date, there haven't been any issues. Clearly your use case is a good example of where it doesn't work so well. (The main use case was for setting up many simulations using the same input, e.g. windows for an FEP simulation.)
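To illustrate the trade-off, here is a generic sketch of a keyed cache, not BioSimSpace's actual implementation. Repeated identical input hits the cache and costs nothing extra, whereas many distinct inputs each add an entry. The class `BoundedCache`, the function `save`, and the (topology, coordinates) key scheme are all assumptions made up for this example.

```python
from collections import OrderedDict


class BoundedCache(OrderedDict):
    """Dict that evicts its oldest entry once it holds more than `max_entries`."""

    def __init__(self, max_entries: int):
        super().__init__()
        self.max_entries = max_entries

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        while len(self) > self.max_entries:
            self.popitem(last=False)  # remove the oldest inserted entry


def save(cache, topology: str, coordinates: str) -> str:
    """Pretend to write a system, skipping the work if it is already cached."""
    key = (topology, coordinates)
    if key in cache:
        return "hit"
    cache[key] = f"files for {key}"
    return "miss"


# Same input repeated (the FEP-windows style use case): one entry, then hits.
unbounded = {}
same_input = [save(unbounded, "top", "gro_0") for _ in range(3)]

# Same topology, different coordinates each time: every new system adds an entry.
for i in range(5):
    save(unbounded, "top", f"gro_{i}")

# A bounded cache caps the growth by discarding the oldest entries.
bounded = BoundedCache(max_entries=2)
for i in range(5):
    save(bounded, "top", f"gro_{i}")

print(same_input, len(unbounded), list(bounded))
```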

Adding a note to the docs is also a good idea. I always document the API then forget that the autogenerated docs only expose the public API, i.e. internal functionality like this isn't on the website unless mentioned explicitly.

Sounds good! Cheers :)