Update addressing memory problem
Closed this issue · 6 comments
Hi
Is this library still maintained?
It is stated in the Precautions section that an update will be made when numpy.nanmean has axis functionality. I believe this is now the case.
I am very interested in using your library for my research and was very excited when I found it. However, I am currently getting segmentation faults running the -f flag on a desktop, presumably because I run out of memory. My systems contain thousands of atoms, so that could be the problem. Do you have a benchmark or an estimate of runtime versus system size for the code?
Thank you in advance
Kasper
Hello @KasperBuskPedersen, thank you for your interest in our work!
If this is a bug report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we cannot help you.
Hey @KasperBuskPedersen,
thanks for your interest in the project. I apologize for not answering until now. I just checked again, and unfortunately numba still hasn't moved forward with supporting the axis option for numpy's nanmean function.
For a system with 1000 atoms, I would guess that there is indeed not enough memory. You can check beforehand how much RAM is needed with the -m flag. This is actually always recommended if you use flags other than the -t flag.
One more remark about runtime vs. memory usage. I had to take the np.nanmean(axis=...) call out of the loop to be able to jit the function successfully with numba. The dimensions are therefore reduced by the mean function only at the very end, which makes the entire process very memory heavy. Numpy uses highly optimized vectorized array operations, which certainly pay off once the array is in RAM and provide good performance, at the expense of memory usage. As mentioned, this circumstance is born of necessity, not design. However, the usual Lindemann index calculations with the -t flag are still quite reasonable in terms of memory usage.
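To make the scaling concrete, here is a back-of-the-envelope sketch of the intermediate (frames, atoms, atoms) float64 array that the -f path had to hold in RAM. The trajectory size below is made up for illustration, not taken from any real system:

import numpy as np

# hypothetical trajectory: 1,000 frames of a 1,000-atom system
frames, atoms = 1_000, 1_000
itemsize = np.dtype(np.float64).itemsize  # 8 bytes per value

# the -f path keeps a (frames, atoms, atoms) array until the final nanmean
ram_gib = frames * atoms * atoms * itemsize / 1024**3
print(f"~{ram_gib:.0f} GiB")  # ~7 GiB here, growing linearly with frame count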
Best regards
Sebastian
Hey @KasperBuskPedersen,
thanks again for the issue; it has motivated me once again to take on the problem. The issue at numba has been open for a very long time, so it was necessary to find a solution myself. The problem was that the algorithm produces a distance matrix that is mirrored (symmetric) but has zeros on the diagonal. This led to NaN values in the division, and the jit-compiled numba function cannot handle the required nanmean. As a result, an array with the dimensions (frames, atoms, atoms) was needed, which quickly escalated, as you can see in the picture:
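As a toy illustration of the NaN problem (my own example, not the library's actual code): dividing by such a matrix produces 0/0 = NaN on the diagonal, and averaging those away needs exactly the nanmean-with-axis call that numba could not compile:

import numpy as np

# symmetric toy "distance" matrix with the zero diagonal
dist = np.array([[0.0, 1.0],
                 [1.0, 0.0]])

with np.errstate(invalid="ignore"):
    ratio = dist / dist           # 0/0 on the diagonal -> NaN

print(np.nanmean(ratio, axis=1))  # [1. 1.]; nanmean with axis= is what numba lacked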
Now I have completely removed the zero diagonal from the division, and I was able to reduce the memory allocation by one dimension: the frames dimension. The memory allocation is now completely independent of the length of the trajectory. Because the algorithm no longer produces NaNs, the reduction of the dimension can be done inside the loop. np.nanmean is thus no longer necessary, and numpy functions that are supported by numba can be used, so the function can be jit-compiled. You can see the result here:
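For readers curious how an in-loop reduction like this can work, here is a minimal sketch (my own illustration, not the library's actual implementation): pair distances are folded into a running mean frame by frame, so only (atoms, atoms) arrays are ever held, and the zero diagonal is masked out before the division instead of producing NaNs:

import numpy as np

def lindemann_index_sketch(positions):
    """positions: array of shape (frames, atoms, 3).
    Running-mean accumulation keeps memory at O(atoms**2),
    independent of the number of frames."""
    n_frames, n_atoms, _ = positions.shape
    mean_dist = np.zeros((n_atoms, n_atoms))
    mean_sq_dist = np.zeros((n_atoms, n_atoms))

    for f in range(n_frames):
        diff = positions[f, :, None, :] - positions[f, None, :, :]
        dist = np.sqrt((diff * diff).sum(axis=-1))
        # incremental update of the per-pair means over frames
        mean_dist += (dist - mean_dist) / (f + 1)
        mean_sq_dist += (dist * dist - mean_sq_dist) / (f + 1)

    # mask out the zero diagonal instead of dividing into 0/0 = NaN
    mask = ~np.eye(n_atoms, dtype=bool)
    var = np.clip(mean_sq_dist - mean_dist**2, 0.0, None)
    delta = np.sqrt(var[mask]) / mean_dist[mask]
    # average over all pairs j != i and atoms i
    return delta.mean()

The actual jit-compiled implementation in v0.5.0 may differ in its details; the point is only that the per-frame update removes both the frames dimension and the need for nanmean.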
With version v0.5.0 this feature is implemented.
I hope this library benefits your research and that you can now use the -f flag to study the evolution of the Lindemann index.
Best regards
Sebastian
Dear Sebastian
Thank you for looking into it! This looks excellent. I am already using the library; I had figured out a way to use the -t flag instead. I will definitely check out the -f flag now.
I am a GROMACS user, so I wrote a small script that handles the LAMMPS format: it divides a GROMACS trajectory into small blocks and runs the -t flag on them. I have attached it here, as it might come in handy for other people using your library down the road.
Best,
Kasper
Dear @KasperBuskPedersen ,
I am glad that you found a solution in the meantime. I hope the -f option will be of use to you. I have had a look at your attached Python script. I would like to point out that you can use lindemann not only as a CLI tool but also as a Python module. I took the liberty of using your script as a template to show this.
from lindemann.index import per_frames
from lindemann.trajectory import read
import numpy as np
import mdtraj as md
import sys
import os

outfile = sys.argv[1]
xtcfile = sys.argv[2]
grofile = sys.argv[3]

# workaround to use lindemann with GROMACS: convert to a temporary
# LAMMPS trajectory that lindemann's reader understands
temp_file = "temp_{}.lammpstrj".format(outfile)
t = md.load(xtcfile, top=grofile)
t.save_lammpstrj(temp_file)

frames = read.frames(temp_file)
lindemann_index_per_frame = per_frames.calculate(frames)
np.savetxt(outfile, lindemann_index_per_frame)

try:
    os.remove(temp_file)
except OSError as e:
    print(f"Error: {e.filename} - {e.strerror}.")
I took a look at the mdtraj documentation and source code and noticed that the mdtraj attribute t.xyz returns a numpy array with the same shape (frames, atoms, xyz) as the read method of the lindemann module, which uses ovito to read the file. So I think you could get rid of the workaround. It could look something like this:
from lindemann.index import per_frames
import numpy as np
import mdtraj as md
import sys

outfile = sys.argv[1]
xtcfile = sys.argv[2]
grofile = sys.argv[3]

# t.xyz already has the (frames, atoms, xyz) shape lindemann expects,
# so no temporary LAMMPS file is needed
t = md.load(xtcfile, top=grofile)
frames = t.xyz
lindemann_index_per_frame = per_frames.calculate(frames)
np.savetxt(outfile, lindemann_index_per_frame)
Perhaps you could provide me with a sample GROMACS file so I can test this. Ovito also supports a variety of different formats with its import_file method, which I use. Maybe lindemann will work with your GROMACS files out of the box, since Ovito supports them. I've only worked with LAMMPS so far, and since I've only had the chance to test lindemann with LAMMPS trajectories, I haven't included other trajectory file formats that ovito supports. But we could certainly change that for GROMACS.
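If you want to try it yourself, a quick check with ovito directly might look like the sketch below. This assumes ovito's readers accept your GROMACS file; "conf.gro" is a placeholder name, not a file from this thread:

from ovito.io import import_file
import numpy as np

# hypothetical quick test: if ovito's import_file can read the GROMACS
# file, positions come back in the (atoms, 3) layout per frame
pipeline = import_file("conf.gro")   # placeholder file name
frame0 = pipeline.compute(0)
positions = np.asarray(frame0.particles.positions)
print(positions.shape)               # (atoms, 3)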
Since it looks like no further support is needed, I am closing this issue.