GAFF - ValueError: No template found for residue 571 (58H) - MWE
razvanmarinescu opened this issue · 11 comments
I have the following code that loads a PDB structure with 2 proteins and 1 ligand (58H), and tried to build a GAFF forcefield and run it with Generalized Born implicit solvent:
pdb = PDBFile(f +'xray.pdb')
forcefield = ForceField('amber14/protein.ff14SB.xml', 'implicit/gbn2.xml')
from openff.toolkit.topology import Molecule
from rdkit import Chem
rdmol = Chem.MolFromSmiles('C#CC(O)(C#C)C1CCN(CC2=C(C(=O)OC)C(c3ccc(F)cc3Cl)N=C(c3ccccn3)N2)CC1')
molecule = Molecule.from_rdkit(rdmol, allow_undefined_stereo=True)
# Create the GAFF template generator
from openmmforcefields.generators import GAFFTemplateGenerator
gaff = GAFFTemplateGenerator(molecules=molecule)
forcefield.registerTemplateGenerator(gaff.generator)
unmatched = forcefield.getUnmatchedResidues(pdb.topology)
print('unmatched:', unmatched)
dt = 0.004*picoseconds
constraints = HBonds
hydrogenMass = 1.5*amu
system = forcefield.createSystem(pdb.topology, nonbondedMethod=CutoffNonPeriodic,
nonbondedCutoff=2*nanometer, constraints=constraints, hydrogenMass=hydrogenMass)
However, I get this error below. Notice how the 58H residue is still unmatched. How can I get it to match and work?
Did not recognize residue 58H; did you forget to call .add_molecules() to add it? <-- this is a warning
unmatched: [<Residue 570 (58H) of chain 4>]
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[24], line 25
21 constraints = HBonds
22 hydrogenMass = 1.5*amu
---> 25 system = forcefield.createSystem(pdb.topology, nonbondedMethod=CutoffNonPeriodic,
26 nonbondedCutoff=2*nanometer, constraints=constraints, hydrogenMass=hydrogenMass)
File ~/bin/miniconda3/envs/westpa/lib/python3.9/site-packages/openmm/app/forcefield.py:1218, in ForceField.createSystem(self, topology, nonbondedMethod, nonbondedCutoff, constraints, rigidWater, removeCMMotion, hydrogenMass, residueTemplates, ignoreExternalBonds, switchDistance, flexibleConstraints, drudeMass, **args)
1214 rigidResidue = [False]*topology.getNumResidues()
1216 # Find the template matching each residue and assign atom types.
-> 1218 templateForResidue = self._matchAllResiduesToTemplates(data, topology, residueTemplates, ignoreExternalBonds)
1219 for res in topology.residues():
1220 if res.name == 'HOH':
1221 # Determine whether this should be a rigid water.
File ~/bin/miniconda3/envs/westpa/lib/python3.9/site-packages/openmm/app/forcefield.py:1433, in ForceField._matchAllResiduesToTemplates(self, data, topology, residueTemplates, ignoreExternalBonds, ignoreExtraParticles, recordParameters)
1431 break
1432 if matches is None:
-> 1433 raise ValueError('No template found for residue %d (%s). %s For more information, see https://github.com/openmm/openmm/wiki/Frequently-Asked-Questions#template' % (res.index+1, res.name, _findMatchErrors(self, res)))
1434 else:
1435 if recordParameters:
ValueError: No template found for residue 571 (58H). This might mean your input topology is missing some atoms or bonds, or possibly that you are using the wrong force field. For more information, see https://github.com/openmm/openmm/wiki/Frequently-Asked-Questions#template
Here is the xray.pdb file, so you can run the code. Thanks a lot!
Your PDB file is missing CONECT records for the ligand. It can't match the template, because it doesn't know which atoms are bonded to each other.
@peastman thank you very much and sorry for the late reply. I tried a different approach using this example: https://github.com/tdudgeon/simple-simulate-complex. I'm now trying to read an .sdf file from ligand-expo and get it to be simulated in complex with a protein, but it still doesn't work.
Can you help me get this working on this colab notebook? It took me a while, but I built a minimal example here (complete with OpenMM install): https://colab.research.google.com/drive/1yCK1GbIhi-o0fKaIAJ12GE5qQ-0Bb0Aa?usp=sharing
The key problem is in the last cell:
# Use Modeller to combine the protein and ligand into a complex
print('Preparing complex')
protein_pdb = PDBFile('xray.pdb')
modeller = Modeller(protein_pdb.topology, protein_pdb.positions)
from openff.toolkit.topology import Molecule
from rdkit import Chem
print('Reading ligand')
mol_in = 'E0.amber.pdb'
mol_in = '58H_model.sdf'
rdkitmol = Chem.MolFromMolFile(mol_in)
#rdkitmol = Chem.MolFromSmiles('C#CC(O)(C#C)C1CCN(CC2=C(C(=O)OC)C(c3ccc(F)cc3Cl)N=C(c3ccccn3)N2)CC1')
print(mol_in)
print('rdkitmol', rdkitmol)
#asdas
print('Adding hydrogens')
rdkitmolh = Chem.AddHs(rdkitmol, addCoords=True)
# ensure the chiral centers are all defined
Chem.AssignAtomChiralTagsFromStructure(rdkitmolh)
molecule = Molecule(rdkitmolh, allow_undefined_stereo=True)
print(molecule.to_topology())
print(molecule.to_topology().to_openmm())
print(molecule.conformers)
print(molecule.n_conformers)
# This next bit is black magic.
# Modeller needs topology and positions. Lots of trial and error found that this is what works to get these from
# an openforcefield Molecule object that was created from a RDKit molecule.
# The topology part is described in the openforcefield API but the positions part grabs the first (and only)
# conformer and passes it to Modeller. It works. Don't ask why!
#modeller.add(molecule.to_topology().to_openmm(), molecule.conformers[0])
molOpenMM = molecule.to_topology().to_openmm()
molConf = molecule.conformers[0]
modeller.add(molOpenMM, molConf)
print('System has %d atoms' % modeller.topology.getNumAtoms())
with open(output_complex, 'w') as outfile:
PDBFile.writeFile(modeller.topology, modeller.positions, outfile)
# Initialize a SystemGenerator using the GAFF for the ligand
print('Preparing system')
forcefield_kwargs = { 'constraints': app.HBonds, 'rigidWater': True, 'removeCMMotion': False, 'hydrogenMass': 4*unit.amu }
system_generator = SystemGenerator(
forcefields=['amber/ff14SB.xml'],
small_molecule_forcefield='gaff-2.11',
forcefield_kwargs=forcefield_kwargs)
system = system_generator.create_system(modeller.topology, molecules=ligand_mol)
In modeller.add(molOpenMM, molConf), I get the following error:
Preparing complex
Reading ligand
58H_model.sdf
rdkitmol <rdkit.Chem.rdchem.Mol object at 0x7f23ff7ca350>
Adding hydrogens
<openff.toolkit.topology.topology.Topology object at 0x7f232fa0ae80>
<Topology; 1 chains, 1 residues, 63 atoms, 66 bonds>
[<Quantity([[188.001 -3.969 235.821 ]
[188.66 -6.046 237.947 ]
[191.55 -11.275 239.968 ]
[186.841 -12.839 238.261 ]
[187.95 -11.969 238.017 ]
...
[191.14543931 -13.13645181 236.12910421]
[191.2125319 -13.39414063 241.72586479]
[190.65290772 -16.72176895 239.04895624]
[190.8714324 -15.20986301 237.08572083]], 'angstrom')>]
1
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
[<ipython-input-8-8f075667f7a7>](https://localhost:8080/#) in <module>
41 molOpenMM = molecule.to_topology().to_openmm()
42 molConf = molecule.conformers[0]
---> 43 modeller.add(molOpenMM, molConf)
44 # modeller.topology.setPeriodicBoxVectors(
45 # [Vec3(x=8.461, y=0.0, z=0.0),
1 frames
[/usr/local/lib/python3.9/site-packages/openmm/unit/quantity.py](https://localhost:8080/#) in append(self, item)
771 return self._value.append(item)
772 else:
--> 773 raise TypeError("Cannot append item without units into list with units")
774 def extend(self, rhs):
775 self._value.extend(rhs.value_in_unit(self.unit))
TypeError: Cannot append item without units into list with units
It's an impedance mismatch between OpenMM and OpenFF. The latter used to use the OpenMM units system, but recently they switched to a different units package. This results in errors when you try to pass an OpenFF object to an OpenMM method. I'll let one of the OpenFF developers comment on what the correct incantation is to convert it.
gotcha! That's wonderful.
I'm not sure exactly what's going on here and I have a difficult time following your notebook. I will offer some general advice that should help get you unstuck.
I recommend never using star imports. They may be convenient but it makes it ambiguous where similarly-named objects and namespaces come from, which makes it impossible to reason through errors.
As Peter mentions, OpenFF and OpenMM infrastructure use different unit solutions. You can read more here. I have made efforts to ensure they interoperate well (see the bottom of those docs) but it remains they are different solutions.
When needing to use them at the same time - I've actually spent much of today in this situation - I prefer to import my namespaces like this:
import openmm
import openmm.unit
from openff.units import unit
# could also do below to make it even more explicit ...
# from openff.units import unit as openff.unit
# ... and this would make the APIs even more similar
This forces me to make explicit which unit solution I'm using where; in general it's as simple as making sure OpenMM objects are passed openmm.unit.Quantity
objects and remembering that OpenFF objects have unit-bearing quantities represented by the corresponding unit.Quantity
.
Again I'm not sure exactly what is going wrong but these lines stand out to me:
molConf = molecule.conformers[0]
modeller.add(molOpenMM, molConf)
molConf
is an OpenFF-styled units object, which the openmm.Modeller
class may or may not be happy digesting. (At best it ignores the units, which would cause an Angstrom-nanometer mismatch, at worst it just causes a crash that looks like what you get later.)
Thanks a lot! I will switch to never using star imports. Not sure if this will fix it though, the molConf object seems to have units in 'angstrom', I printed it, see the log above. Let me check if I remove the star imports.
And sorry if the notebook seems complicated, I tried to make a minimal example, but had to install openMM and all the libraries there, and could only install openMM with conda. The main thing is in the last cell, everything else is just preparing the environment.
molConf = molecule.conformers[0]
modeller.add(molOpenMM, molConf)
That's definitely the problem. What's the correct way to convert an OpenFF quantity to the corresponding OpenMM quantity?
The recommended way is calling to_openmm()
, which exists both as a standalone function (to_openmm(x)
) or a method on a Quantity
object (x.to_openmm()
). They should do the same thing.
https://github.com/openforcefield/openff-units#openmm-interoperability
An alternative that can help in some more esoteric cases (in which the object can potentially be either solution) is ensure_quantity
, but I wouldn't reach for that first.
https://github.com/openforcefield/openff-units#dealing-with-multiple-unit-packages
The error message here is somewhat unfortunate (OpenMM reasonably doesn't recognize the Pint-styled object as bearing units) but fixing it would require making OpenMM's unit package aware of other unit solutions, which is a mess of complexity I wouldn't advocate for introducing. There might (?) be a way to add a check in places like Modeller.add
to safeguard against argument types that will later cause issues like the traceback here, but that's also probably more hassle than it's worth.
Gotcha. So it sounds like there's no easy solution here. Is there another way to import the molecule into OpenMM? Are there other examples of simulating protein-ligand complexes? The example from here is the only one I could find: https://github.com/tdudgeon/simple-simulate-complex
If any of you have some example scripts you can share, I can try to work based on those. Thanks a lot!
You just need to call to_openmm()
. I think this should work:
molConf = molecule.conformers[0].to_openmm()
Sorry for the late reply, I was travelling. I just tried it and that seems to have fixed it. Thank you very much for this!