openforcefield/openff-forcefields

Two molecules that can be parameterized are failing in simulations

j-wags opened this issue · 5 comments

Two molecules that we can make OpenMM systems for are failing when run in simulations.

I need to come back and flesh this Issue out more, eg with reproducing code

Molecule 2

[H]c1c(c(c(c(c1[H])S(=O)(=O)N=N#N)S(=O)(=O)N=N#N)[H])[H]

This one parameterizes, but it looks like some nitrogens end up with a partial charge of 1.426, which causes the simulation to crash, or the forces to be super steep. I'm betting it's due strong electrostatic interactions between the substituent groups.

Screen Shot 2020-10-27 at 1 03 37 PM

Molecule 4

[H]c1c(c(c(c(c1[H])[H])C([H])([H])P(c2c(c(c(c(c2[H])[H])[H])[H])[H])(c3c(c(c(c(c3[H])[H])[H])[H])[H])(C([H])([H])c4c(c(c(c(c4[H])F)[H])[H])[H])Br)[H])[H]

This one parameterizes (somehow!) and some atom is winding up with a charge of +1.9! That could explain the crash.

Screen Shot 2020-10-27 at 1 05 08 PM

Originally posted by @j-wags in #27 (comment)

@davidlmobley points out that "molecule 2" has nitrogens with five bonds, which is unusual. This does seem to be possible, and is seen in other structures like nitrate, so I'm not sure if that's a hard-and-fast rule.

Come to think of it, I've seen this N_3 chemistry in lots of molecules -- It's an azide moiety. Usually it's represented with formal charges to make the bond count more reasonable (RN=[N+1]=[N-1]).

Hm, I'm having trouble reproducing these. I wonder whether these errors have something to do with the CI environment and not the molecules themselves?

from simtk import unit
from simtk import openmm
from simtk.openmm import app
from openmmforcefields.generators import SystemGenerator
from openforcefield.topology import Molecule
from openforcefield.utils.toolkits import RDKitToolkitWrapper, OpenEyeToolkitWrapper
import numpy as np

def hmr_driver(mol, ff_name):
    """Given an OpenFF Molecule, run a short 4 fs HMR simulation. This function is adapted from
    https://github.com/openforcefield/openforcefields/issues/19#issuecomment-689816995"""
    print(
        f"Running HMR with force field {ff_name} and molecule with SMILES {mol.to_smiles()}"
    )

    forcefield_kwargs = {
        "constraints": app.HBonds,
        "rigidWater": True,
        "removeCMMotion": False,
        "hydrogenMass": 4
        * unit.amu,  # Does this also _subtract_ mass from heavy atoms?:w
    }

    system_generator = SystemGenerator(
        small_molecule_forcefield=ff_name,
        forcefield_kwargs=forcefield_kwargs,
        molecules=mol,
    )
    system = system_generator.create_system(mol.to_topology().to_openmm())

    temperature = 300 * unit.kelvin
    collision_rate = 1.0 / unit.picoseconds
    timestep = 4.0 * unit.femtoseconds

    integrator = openmm.LangevinIntegrator(temperature, collision_rate, timestep)
    context = openmm.Context(system, integrator)
    mol.generate_conformers(n_conformers=1)
    context.setPositions(mol.conformers[0])

    # Run for 10 ps
    integrator.step(2500)

    state = context.getState(getEnergy=True)
    pot = state.getPotentialEnergy()
    # OpenMM will silenty "fail" if energies aren't explicitly checked
    if np.isnan(pot / pot.unit):
        raise NANEnergyError()

#mol = Molecule.from_smiles('[H]c1c(c(c(c(c1[H])[H])C([H])([H])P(c2c(c(c(c(c2[H])[H])[H])[H])[H])(c3c(c(c(c(c3[H])[H])[H])[H])[H])(C([H])([H])c4c(c(c(c(c4[H])F)[H])[H])[H])Br)[H])[H]')
mol = Molecule.from_smiles('[H]c1c(c(c(c(c1[H])S(=O)(=O)N=N#N)S(=O)(=O)N=N#N)[H])[H]')
for i in range(100):
    hmr_driver(mol, 'openff-1.2.0')

Nothing in this repo accesses the org's OpenEye license, which that would explain why you have to force RDKit usage to generate conformers. (The tests can be modified to use each backend, if that's worthwhile). That being said, I've gotten through a few loops of that iteration locally and none have failed yet.

This is a (well, another) point in favor of #28, as we want to capture exactly where in the pipeline one of these tests fail, not capture everything as "probably HMR"

Agreed, and I'm strongly in favor of implementing #28.

Since there could be a bunch of causes for this, I'm going to close the issue until we have a locally-running minimal reproducing example.