SSAGESLabs/PySAGES

Umbrella Sampling with OpenMM: Excessive Runtime?

meschw04 opened this issue · 5 comments

Hello!
Thanks so much for the answers in issue #99; I'm going to go ahead and close that issue for the time being (I ended up using daiquiri with the OpenMM example, and I'm happy to share that if it'd be helpful to you all). I also took @InnocentBug's suggestion to try umbrella sampling with the ADP example in OpenMM and wrote the code shown below. My understanding is that this should run five umbrellas in OpenMM for 1e5 time steps each (after an initial burn-in), then use WHAM to stitch these together to provide the A (free energy) matrix. Looking at some other examples, the constants I set below for the torsional angles and the umbrella spring constant k all seem reasonable. If I run just a single umbrella in OpenMM without PySAGES (by adding, e.g., bias_torsion_phi = CustomTorsionForce("0.5*k_phi*dtheta^2; dtheta = min(tmp, 2*pi-tmp); tmp = abs(theta - phi)")), with otherwise exactly the same code as below, the simulation completes in about 4 seconds.
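
For reference, the PySAGES-free single-umbrella run I'm comparing against is roughly this (a sketch only; the k_phi and phi values are placeholders, and openmm, np, system, and simulation are set up exactly as in the script below):

# Harmonic bias on the phi torsion, applied directly through OpenMM (no PySAGES)
bias_torsion_phi = openmm.CustomTorsionForce(
    "0.5*k_phi*dtheta^2; dtheta = min(tmp, 2*pi-tmp); tmp = abs(theta - phi)"
)
bias_torsion_phi.addGlobalParameter("pi", np.pi)        # make pi available in the expression
bias_torsion_phi.addGlobalParameter("k_phi", 100.0)     # placeholder spring constant (kJ/mol/rad^2)
bias_torsion_phi.addGlobalParameter("phi", -np.pi / 2)  # placeholder umbrella center
bias_torsion_phi.addTorsion(4, 6, 8, 14, [])            # same phi atoms as the DihedralAngle CV below
system.addForce(bias_torsion_phi)                       # add before constructing the Simulation

# ... then build the Simulation as in generate_simulation and run:
simulation.step(100000)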

I started the script below on a single core yesterday morning, and it only finished this morning. I'm confused about what is incurring such high computational overhead. I have tried changing the k values, the start/end locations of (phi, psi), num_umbrellas, etc., and each time I let it run for an hour before killing it. It really shouldn't take an hour, right? It should take, what, ~30 seconds?

On the implementation side of things (and I'd be happy to help with this), I think it would be really nice to have one or some of the following:

  • A tqdm progress bar on the for rep in range(Nreplica) loop over the set of umbrellas, to give an
    estimate of how long the whole run will take (see the sketch after this list)
  • A walltime cutoff flag that stops a run if it has not converged, or...
  • Based on the motion of the system, some kind of feedback to say "you should decrease your time step" or "you should increase/decrease K"
  • A warning when running WHAM along the lines of "umbrellas don't overlap much at all, so WHAM (or MBAR?) isn't going to give good results on the FES"
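
For the first point, I'm imagining something as simple as this around the replica loop (hypothetical; Nreplica is just whatever the existing loop iterates over):

from tqdm import tqdm

Nreplica = len(centers)  # number of umbrella windows
for rep in tqdm(range(Nreplica), desc="umbrella windows"):
    # ... the existing per-window simulation and histogram collection ...
    pass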

Thanks so much! 😄

import numpy as np

import pysages
from pysages.collective_variables import DihedralAngle
from pysages.methods import UmbrellaIntegration
from pysages.utils import try_import

openmm = try_import("openmm", "simtk.openmm")
unit = try_import("openmm.unit", "simtk.unit")
app = try_import("openmm.app", "simtk.openmm.app")

pi = np.pi

def generate_simulation(**kwargs):
    pdb_filename = "alanine-dipeptide-explicit.pdb"
    T = 298.15 * unit.kelvin
    dt = 2.0 * unit.femtoseconds
    pdb = app.PDBFile(pdb_filename)

    ff = app.ForceField("amber99sb.xml", "tip3p.xml")
    cutoff_distance = 1.0 * unit.nanometer
    topology = pdb.topology
    system = ff.createSystem(
        topology, constraints = app.HBonds, nonbondedMethod = app.NoCutoff,
        nonbondedCutoff = cutoff_distance
    )
    
    positions = pdb.getPositions(asNumpy = True)

    integrator = openmm.LangevinIntegrator(T, 1 / unit.picosecond, dt)

    platform = openmm.Platform.getPlatformByName("CPU")
    simulation = app.Simulation(topology, system, integrator, platform)
    simulation.context.setPositions(positions)
    simulation.minimizeEnergy()

    return simulation

# phi and psi backbone dihedrals of alanine dipeptide
cvs = (
    DihedralAngle((4, 6, 8, 14)),
    DihedralAngle((6, 8, 14, 16))
)

num_umbrellas = 5
start_phi = -pi/2
end_phi = -pi/4
start_psi = -pi/2
end_psi = -pi/4

centers = np.array(list(zip(
    np.linspace(start_phi, end_phi, num_umbrellas),
    np.linspace(start_psi, end_psi, num_umbrellas),
)))

method = UmbrellaIntegration(cvs)
result = method.run(
    generate_simulation, timesteps=1e5, centers=centers,
    ksprings=100, hist_periods=50,
)

Thanks for submitting this example.

This is something that is probably best handled by the MD engines themselves.
Maybe we will integrate something for these iterations at some point, but PySAGES hands control of longer simulations over to the MD engines.
So longer runs have to be terminated by the MD engine.
HOOMD-blue offers the HOOMD_WALLTIME flag, which raises an exception if the runtime is exceeded.
I am not sure if OpenMM offers something similar.
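
A generic workaround, not an OpenMM feature, is to advance the simulation in chunks and check the wall time yourself when you drive OpenMM directly (a sketch; the limit and chunk size here are arbitrary):

import time

walltime_limit = 3600.0  # seconds
chunk = 1000             # steps per chunk
total_steps = 100000
start = time.time()

for _ in range(total_steps // chunk):
    simulation.step(chunk)
    if time.time() - start > walltime_limit:
        raise RuntimeError("Walltime limit exceeded; stopping this run.")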

The long-term plan is rather to have the iteration loop you mentioned parallelized over multiple GPUs.

Also, estimates of how long a given simulation will run are handled by the engines.
HOOMD-blue prints this out by default, and OpenMM has a StateDataReporter that can report the estimated remaining run time.
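
For OpenMM, attaching something like this to the simulation returned by generate_simulation prints progress, speed, and an estimated remaining time (a sketch; the report interval and totalSteps values are just examples, and app is the module from the script above):

import sys

simulation.reporters.append(
    app.StateDataReporter(
        sys.stdout,
        1000,               # report every 1000 steps
        step=True,
        progress=True,
        remainingTime=True,
        speed=True,
        totalSteps=100000,  # should match the planned run length for the estimate
    )
)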

At some point we also want to implement functionality so that long runs can be checkpointed and restarted.
I would say we postpone these great ideas until then.

I just noticed that you explicitly request the CPU platform.
Does that have an impact on your runtime experience?
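
If a GPU is available, switching the platform inside generate_simulation is a one-line change (a sketch, reusing the openmm module from the script above; "CUDA" could also be "OpenCL"):

try:
    platform = openmm.Platform.getPlatformByName("CUDA")  # use the GPU when available
except Exception:
    platform = openmm.Platform.getPlatformByName("CPU")   # fall back to the CPU otherwise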

I took this issue as inspiration for a Google Colab notebook that runs a Harmonic Bias simulation with OpenMM.
It will soon be part of the PySAGES tutorials.
Note that if you run it in CPU mode the actual simulation cell takes about 12 minutes, but using the GPU as an accelerator the same cell takes less than 30 seconds.
OpenMM seems to be very sensitive to running on the GPU (unlike HOOMD-blue, which is more tolerant of the CPU for small examples).

HOOMD-blue Harmonic Bias