OpenBioSim/biosimspace

runtime error

Closed this issue · 16 comments

Hi,

I am running RBFE calculations and for some of the lambda windows I am getting the following error:

raceback (most recent call last): File "/home/administrator/anaconda3/envs/bss/share/Sire/scripts/somd-freenrg.py", line 224, in <module> OpenMMMD.runFreeNrg(params) File "/home/administrator/anaconda3/envs/bss/lib/python3.10/site-packages/sire/legacy/Tools/__init__.py", line 175, in inner retval = func() File "/home/administrator/anaconda3/envs/bss/lib/python3.10/site-packages/sire/legacy/Tools/OpenMMMD.py", line 3008, in runFreeNrg system = moves.move(system, nmoves.val, True) RuntimeError: Particle coordinate is NaN. For more information, see https://github.com/openmm/openmm/wiki/Frequently-Asked-Questions#nan

The version I am using is 2024.2.0 for both sire and biosimspace.

Can you suggest where I am going wrong?

Thank you

This is normally caused by a numerical instability. Without knowing specifics about your system it's hard to say what's gone wrong. Sometimes simply re-running the window that crashed will fix things, in other cases a particular window might repeatedly fail. Sometimes this might also be caused by poor minimisation and equilibration prior to a production run, so make sure that you have done this too. We'd really need to know more about the perturbation in question to be able to suggest anything more.

The last window is failing continuously i.e, 1.0000, Rerunning also does not fixed this situation.

Molecule A
image
Molecule B
image

When the petribution is from Molecule B to A it ran perfectly fine I obtained the good overlap matrix as well.

But in the case of Molecule A to B the last window of both bound and free is giving me the same error.

Are you able to share the inputs so that we can debug further? We do occasionally find similar issues for our own larger test set, so it is useful to find common patterns that trigger lack of reversibility.

what all input files should I send?

It's probably easiest to send the files in the directory that is failing. I'm not sure of your folder structure, but it would be something like output/lambda_1.0000. We can reconstruct everything from the inputs, so that should be enough.

please find the zip file for lambda_1.0000

lambda_1.0000.zip

Many thanks.

Hi I have a different question not related to current topic.

When I run fep calculations after each cycle the calculations falls back on cpu for some time then again it moves to gpu when the 2nd cycle starts, which I guess impacts the performance.

It is supposed to happen like this? or I am missing some thing here?

No, you're not missing anything. SOMD was originally designed as a hybrid MC/MD engine so you could mix move types in cycles. In practice, to get good performance on GPU when running FEP you want to run few long cycles so that the GPU is utilised as much as possible.

The SOMD defaults chosen by BioSimSpace are sub-optimal since it tries to match the standard behaviour of the other supported engines (for interoperability reasons). You'll probably want to greatly increase the report_interval and frame_interval of your protocol when using SOMD.

It can be a naive question but reducing the cycles and increasing the moves gives better performance, but does it compromise the accuracy?

No, there's no compromise to accuracy.

okay thank you

I can reproduce the crash locally. From the log it clearly looks like there must be some steric clashes for this system:

Initial energy: 3.17323e+09 kcal mol-1
###=======================Minimisation========================###
Running minimization.
Tolerance for minimization: 1
Maximum number of minimization iterations: 50000
Energy after the minimization: 8.78872e+10 kcal mol-1
Energy minimization done.

I'll try my own minimisation and equilibration pipeline prior to the production run to see if that fixes things.

Yes, it seems that this perturbation really requires per lambda minimisation and equilibration. SOMD doesn't really have a reliable way to equilibrate the lambda value prior to the production run. It also uses the default OpenMM minimiser, which is a bit of a black box. While this can complete without error, there's no guarantee that the final state will be well minimised (when you set a maximum number iterations). For example, if it fails with 100 steps to go it will reset, then perform a further 100 steps of minimisation, returning a structure to you that might be close to the input conformation, i.e. not well minimised.

In BioSimSpace there are specific free-energy peturbation protocols that can be used to minimise and equilibrate at specific lambda values, i.e. BioSimSpace.Protocol.FreeEnergyMinimisation, and BioSimSpace.Protocol.FreeEnergyEquilibration. However, these don't work with SOMD, so you can only use the protocols on the end states, i.e. lambda = 0 or 1. In you case, you might be able to adjust your pipeline to use AMBER or GROMACS in the setup stage.

An alternative approach is to use Sire to minimise to lamda prior to running the production simulation. For example, using the input from your lambda = 1 directory:

import sire a sr

# Load and recreate the perturbable system.
mols = sr.load("somd*7")
mol = sr.morph.create_from_pertfile(mols[-1], "somd.pert")
mols.update(mol)

# Minimise at lambda = 1.
m = s.minimisation(lambda_value=1.0)
m.run()

# Commit the changes and save to a new coordinate file.
mols = m.commit()
sr.save(mols, "test.rst7")

Now we can re-run the lambda = 1 production simulation with SOMD:

somd-freenrg -C ./somd.cfg -l 1 -c ./test.rst7 -t ./somd.prm7 -m ./somd.pert -p CUDA 

The energies look a lot more sensible and the simulation runs just fine:

Initial energy: -344371 kcal mol-1
###=======================Minimisation========================###
Running minimization.
Tolerance for minimization: 1
Maximum number of minimization iterations: 50000
Energy after the minimization: -299813 kcal mol-1
Energy minimization done.

We are currently working on an improved OpenMM based FEP engine called SOMD2. This incorporates the per lambda minimisation/equilibration discussed above, so would also be an alternative that you could use.

Hi,

I tried minimizing before the run it worked, thank you for the help.

No problem, thanks for confirming. I'll close for now since we have a workaround. Ultimately it would be nice if this could be handled automatically, but it's tricky to do this in a general way.