This repository contains helper functions to read AMBER-compatible simulation parameter (e.g., prmtop
) and coordinate (e.g., inpcrd
) files, re-parameterize those files with the SMIRNOFF99Frosst force field, and then write out parameter and coordinate files that can be used to run simulations in AMBER. Examples are given in the notebooks.
Changing the parameters for an already prepared host-guest complex requires a few extra steps compared to swapping the parameters for an isolated small molecule. Rather than using a SMILES string or other chemical identifier to start, the method here attempts to preserve all position and topology information (e.g., presence or absence of dummy atoms, solvation and ionization status, box vectors, and so on). There have been a few sticky spots (detailed below) that have led to some creative -- and some clunky -- workarounds; the strategy outlined below is surely not the most direct method, but it has been the most robust in my testing.
Compared with the smirnoff_host_guest.ipynb
example notebook, here we start with AMBER files, don't charge the molecules, don't generate 3D conformers, don't perform docking, don't solvate the structure, and don't perform any minimization.
The general method is as follows, split into three sections.
- Read in existing files* and create a standards-compliant
pdb
file. - Split the
pdb
into separate topologies, extracting the host and guest molecules separately. - Use the host and guest topologies to generate
mol2
with SYBYL atom types, polymerizing a cyclic host from a monomer, if necessary. - Read the host and guest
mol2
files, ensure unique atom names, and convert to OpenEyeOEMol
s. - Use the list of
OEMol
s to create an OpenMM system with the SMIRNOFF99Frosst force field. Since we are not running any simulations here, I don't believe the value of simulation-related options (e.g.,nonbondedCutoff=1.1 * unit.nanometer
) matter, but they are required. - Convert the OpenMM system into a ParmEd structure.
- Extract the water, ions, and dummy atoms (if present) from the standards-compliant
pdb
file created after reading in the existing files. - Parameterize the water, ions, and dummy atoms (if present) with existing force fields (e.g., TIP3P water, Joung-Cheatham monovalent ions, and so on).
- Convert the water, ions, and dummy atoms (if present) to a ParmEd structure.
- Add the two ParmEd structures.
- Copy box vector information from the existing coordinates to the new coordinates.
- If necessary (because the atom indices may change between the input and output coordinates)...
- Determine the mapping between atoms in the existing and new coordinate set.
- Determine the mapipng between residues in the existing and new coordinate set.
- Rewrite any restraints that have been specified using atom index or residue number masks.
* The starting point can be multiple files or one file for the whole system. In the first example notebook, the input is one pair of files for the whole system. However, in the second example notebook, the host and guest molecules each have parameters and coordinates. To set the system coordinates, a fully solvated pdb
file is used, so this example uses five files as input.
The included notebooks run through this workflow with two examples. First, a host-guest pair from David Mobley's benchmarksets
repository (CB7-memantine). Second (more challenging), input files for an existing attach-pull-release workflow, with a multi-residue host, dummy atoms, and restraints that have been to be re-encoded with new atom ordering (αCD-1-butylamine).
A few notes on things that didn't work in my testing. Many of these things might be able to work if applied in a different context or even in a different order -- and I don't want to claim they are broken -- only that these paths led to errors one way or another, in my hands. Some of the issues may be due to my unfamiliarity with the tools, but by listing them here, someone else might avoid a few pitfalls.
- Read a
mol2
file with GAFF atom types into an OpenEyeOEMol
without usingOEIFlavor_MOL2_Forcefield
. This can be a big deal. Ignoring it can lead to oxygen being interpreted as osmium silently, leading to incorrect parameter assignment. When wildcard assignments are eliminated, this will probably be more obvious. - Go straight from a
prmtop
to an OpenEyeOEMol
via an OpenMM topology. This requires inferring bond orders (mentioned here), which has not always worked reliably. - Start from a
mol2
file containing host (or guest) together with water and ions. There can be unexpected results when the "Forcefield" flavor is required to parse the host (or guest). But more importantly, there is an explosion of file size when writing out a subsequentprmtop
file, mentioned here. - Try to run
createSystem
when all atoms are not unique. It's best to always runOETriposAtomNames
to generate unique atom names. See this issue for more information; a multi-modelmol2
seems to be troublesome, since by definition, there are non-unique atom names. - Write a
mol2
using ParmEd after usingoemmutils
to convert anOEMol
into an OpenMM topology. Themol2
file won't have positions. It seems the positions are lost after running throughoemmutils
. (I have not tried to reproduce this recently.) - Build a cyclic molecule (e.g., cyclodextrin) from SMILES and expect reasonable coordinates from OpenEye tools. I can't recall whether this is a problem building an initial structure from SMILES or even after running minimization. Other tools likely also fail to build cyclodextrin reasonably from SMILES.
- Use OpenEye to generate charges (e.g., AM1-BCC) for an entire host molecule (e.g., cyclodextrin). It will take a long time, the conformations generated are nonphysical and this may impact the quality of the charges;
antechamber
is faster and more reliable (although I don't know if it is better). - Read a
pdb
withoutCONECT
records into anOEMol
. This is going to cause problems for determining the topology, although it is possible this could be circumvented using some combination ofOEPerceiveConnectivity
and related tools. - Read a
pdb
withCONECT
records betweenH1
andH2
in water. This occurs when usingtrajout conect
withcpptraj
and TIP3P water. The improper bond will either (a) prevent creation of the OpenMM system, or (b) prevent ParmEd from saving aprmtop
. I'm not sure what determines where the error will pop up. Relatedly, this will also prevent tools likemdtraj
from recognizing the water molecules. - If trying to run
createSystem
will a single molecule, make sure to pass a list ofOEMol
s (i.e., a list of length 1). - Read a
pdb
with water residue namedHOH
instead ofWAT
. This should be handled gracefully byPDBFile
orPDBFixer
, but I've encountered problems with solvent not being recognized as such. - Use the option
rigidWater=True
tocreateSystem()
. This is related to this issue and can be avoided by doing the "mixed force field" approach of parameterizing the water through existing methods in AMBER, and then combining with the host and guest structure. Also, see this and this discussion and future developments with the newTopology
class. - Use OpenEye
oenb.draw_mol()
to draw a cyclic host. The molecule may look collapsed and distorted, but be sure to check with an external tool, because the coordinates may actually be okay! - Rarely, saving a structure with ParmEd at the final step results in
AttributeError: 'NoneType' object has no attribute 'used'
, this can be avoided by looping through all the bonds, but this is just a stop-gap measure, and requires further investigation. I think readingpdb
files withPDBFixer
is more likely to lead to this outcome. - With large molecules over 100 atoms, numeric atom types written by ParmEd will be truncated by AmberTools. One way to work around this is to rewrite atom types with random characters of length (e.g.,
101
→a9
). There can still be trouble with hardcoded recognition of inferring elements from atom types (see Niel'saimtools
repository for more information).
It is also worth noting that upcoming changes to Topology
may address some of these issues.
build/
: Contains aconda
environment file used for the notebookstests/
: Directory for test case input and output01-convert-benchmarkset.ipynb
: Example notebook #1 (see above)02-convert-APR-files.ipynb
: Example notebook #2 (see above)utils.py
: Helper functions used in the notebooks, with a few extras
Running the notebook 01-convert-benchmarkset.ipynb
will download the input files from GitHub, switch parameters, and write the files smirnoff.prmtop
and smirnoff.inpcrd
in test/cb7-1/
. Running the notebook 02-convert-APR-files.ipynb
will read files from test/a-bam-p/original/
, switch the parameters, and write the files smirnoff.prmtop
and smirnoff.inpcrd
in test/a-bam-p/generated/
.
For new systems, there should be just a few places where configuration might be required:
- When using
antechamber
to write amol2
with SYBYL atom types,acdoctor
may have to be disabled for carboxylates or other resonance structures. This is an option toutils.convert_mol2_to_sybyl_antechamber()
- When extracting the water and ions, the option
dummy_atoms=True
may be misleading. I've added explanatory text in the notebooks. - Determining whether atom or residue mapping is necessary. (This process is slow, because it runs on the fully solvated system. We can't run atom mapping earlier because the atom mapping changes after combining the two ParmEd structures.) This is shown in the second example notebook.
I used a custom conda
environment to test the workflow and fix the version of openforcefield
. The environment can be installed by running conda env create -f build/environment.yaml
.
Run the jupyter notebook
s and choose the smirnovert
environment for the kernel.
These tools rely on AmberTools (tleap
and cpptraj
) to do some intermediate conversions and it is assumed the environmental variable $AMBERHOME
is defined. This can be changed in utils.py
. (Edit: I think this can be avoided by installing the tools in a miniconda environment and using those, without having to source any file in $AMBERHOME
). I also have some boilerplate for the intermediate file conversions in utils.py
, to minimize configuration fussing, that can be changed (e.g., water model).
The scripts also heavily leverage OpenEye tools. If you see ImportError: No module named '_oechem'
, we've had luck fixing this conda upgrade libgcc
.