Assign Partial Charges for many conformers of Small Molecule
maciejwisniewski-drugdiscovery opened this issue · 4 comments
Describe the bug
I have a molecule in RDKit for which I am generating a number of conformations. Then, for each conformer, I want to generate partial charges using AM1-BCC. However, I am getting the following error, why?
To Reproduce
import rdkit
from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem import rdDistGeom, rdFMCS
from openff.toolkit import Molecule
import numpy as np
from scipy.spatial.distance import cdist
import time
from openmm import unit
from openmm.unit import Quantity
def get_distance_matrix(coords):
return cdist(coords, coords)
def get_conformations(molecule):
all_partial_charges = []
molecule.RemoveAllConformers()
molecule = Chem.AddHs(molecule)
params = rdDistGeom.ETKDGv3()
params.randomSeed = 0xd06f00d
params.numThreads = 4
params.maxAttempts = 100
# print how many atoms there is
cids = rdDistGeom.EmbedMultipleConfs(molecule, 10, params)
if len(cids) == 0:
print("Failed to generate conformations for molecule. Skipping.")
return None, None
molecule = Chem.RemoveHs(molecule)
# Calculating Partial Charges with AM1-BCC
openff_mol = Molecule.from_rdkit(molecule,
allow_undefined_stereo=True,
hydrogens_are_explicit=True)
for conformer in openff_mol.conformers:
openff_mol.assign_partial_charges(
partial_charge_method = 'am1-mulliken',
use_conformers = conformer
)
all_partial_charges.append(openff_molecule.partial_charges)
return all_partial_charges
rdkit_mol = Chem.MolFromSmiles('COC(=O)C(C1CCCCN1)C2=CC=CC=C2')
pc_list = get_conformations(rdkit_mol)
Output
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[203], line 1
----> 1 means, poses, mol = get_conformations(mols)
Cell In[199], line 33, in get_conformations(molecules)
29 openff_mol = Molecule.from_rdkit(molecule,
30 allow_undefined_stereo=False,
31 hydrogens_are_explicit=True)
32 start = time.time()
---> 33 openff_mol.assign_partial_charges(
34 partial_charge_method = 'am1-mulliken',
35 use_conformers = np.array(positions_lst[0])
36 )
37 print('\n')
38 print(openff_molecule.partial_charges)
File /mnt/evafs/groups/sfglab/mwisniewski/anaconda3/envs/openmm/lib/python3.9/site-packages/openff/toolkit/topology/molecule.py:2677, in FrozenMolecule.assign_partial_charges(self, partial_charge_method, strict_n_conformers, use_conformers, toolkit_registry, normalize_partial_charges)
2663 warnings.warn(
2664 f"Warning! Partial charge method '{partial_charge_method}' is not designed "
2665 "for use on large (i.e. > 150 atoms) molecules and may crash or take hours to "
(...)
2669 stacklevel=2,
2670 )
2672 if isinstance(toolkit_registry, ToolkitRegistry):
2673 # We may need to try several toolkitwrappers to find one
2674 # that supports the desired partial charge method, so we
2675 # tell the ToolkitRegistry to continue trying ToolkitWrappers
2676 # if one raises an error (raise_exception_types=[])
-> 2677 toolkit_registry.call(
2678 "assign_partial_charges",
2679 molecule=self,
2680 partial_charge_method=partial_charge_method,
2681 use_conformers=use_conformers,
2682 strict_n_conformers=strict_n_conformers,
2683 normalize_partial_charges=normalize_partial_charges,
2684 raise_exception_types=[],
2685 _cls=self.__class__,
2686 )
2687 elif isinstance(toolkit_registry, ToolkitWrapper):
2688 toolkit_wrapper: ToolkitWrapper = toolkit_registry
File /mnt/evafs/groups/sfglab/mwisniewski/anaconda3/envs/openmm/lib/python3.9/site-packages/openff/toolkit/utils/toolkit_registry.py:280, in ToolkitRegistry.call(self, method_name, raise_exception_types, *args, **kwargs)
278 for toolkit, error in errors:
279 msg += f" {toolkit} {type(error)} : {error}\n"
--> 280 raise ValueError(msg)
ValueError: No registered toolkits can provide the capability "assign_partial_charges" for args "()" and kwargs "{'molecule': Molecule with name '5svy_ligand' and SMILES '[C][N+][C][C][C][C][C]([N][C](=[O])[C]([N][C](=[O])[C]([C][C][C][N][C]([N])=[N+])[N][C](=[O])[C]([C])[N+])[C]([C])[O])[C](=[O])[N][C]([C][C][C]([N])=[O])[C](=[O])[N][C]([C]([C])[O])[C](=[O])[N][C]([C])[C](=[O])[N][C]([C][C][C][N][C]([N])=[N+])[C](=[O])[N][C]([C][C][C][C][N+])[C](=[O])[N][C]([C][O])[C](=[O])[N][C]([C]([C])[O])[C](=[O])[O-]', 'partial_charge_method': 'am1-mulliken', 'use_conformers': array([[-1.11773447e+01, 4.10357643e+00, -8.88736732e-01],
[-1.15084816e+01, 4.46373642e+00, 4.71618630e-01],
[-1.20780069e+01, 3.28730461e+00, 1.16325346e+00],
[-1.24256324e+01, 3.29858823e+00, 2.37815716e+00],
[-1.25460764e+01, 5.58289209e+00, 4.12319050e-01],
[-1.22564292e+01, 2.08047344e+00, 4.63477783e-01],
[-1.28090037e+01, 9.09953823e-01, 1.11628013e+00],
[-1.18880426e+01, -2.41324935e-01, 1.04160749e+00],
[-1.21800378e+01, -1.36631590e+00, 1.56728252e+00],
[-1.42012016e+01, 6.24691101e-01, 5.49249615e-01],
[-1.41968271e+01, 3.64862105e-01, -8.93642993e-01],
[-1.55037586e+01, 7.09030286e-02, -1.51316246e+00],
[-1.62568910e+01, -1.04661472e+00, -1.04941327e+00],
[-1.70924590e+01, -1.11634194e+00, 7.20119260e-02],
[-1.72973717e+01, -1.12347913e-01, 8.48313913e-01],
[-1.77408866e+01, -2.38856753e+00, 3.62128888e-01],
[-1.06390398e+01, -1.76276860e-01, 3.92824434e-01],
[-9.63682826e+00, -1.20762718e+00, 2.64602267e-01],
[-8.29119997e+00, -6.36734042e-01, 5.31035224e-01],
[-8.17832371e+00, 6.07602908e-01, 7.86365116e-01],
[-9.71925053e+00, -1.72625403e+00, -1.18492025e+00],
[-8.77494581e+00, -2.69698226e+00, -1.35434830e+00],
[-9.70523449e+00, -6.13500336e-01, -2.17448041e+00],
[-7.08912227e+00, -1.41097292e+00, 5.14972631e-01],
[-5.82625352e+00, -7.52660805e-01, 7.78679208e-01],
[-4.98020224e+00, -1.30532655e+00, 1.81640822e+00],
[-4.40863003e+00, -2.63306866e+00, 1.82189082e+00],
[-5.21875295e+00, -3.85650983e+00, 1.77487014e+00],
[-6.21330933e+00, -3.93043845e+00, 2.93598768e+00],
[-7.02593630e+00, -5.12179886e+00, 2.88835962e+00],
[-8.32223699e+00, -4.89359835e+00, 2.29526826e+00],
[-5.14252470e+00, -4.29123494e-01, -5.31852877e-01],
[-5.67639451e+00, -9.07508977e-01, -1.54622654e+00],
[-4.00601983e+00, 3.47613629e-01, -6.69432550e-01],
[-3.33438938e+00, 7.03539637e-01, -1.87763598e+00],
[-1.90415534e+00, 7.05306724e-01, -1.81010920e+00],
[-1.20496488e+00, 1.00692385e+00, -2.84761729e+00],
[-3.95525483e+00, 1.84605421e+00, -2.61752324e+00],
[-4.11188905e+00, 3.13382571e+00, -1.91635838e+00],
[-2.96189799e+00, 3.83343257e+00, -1.39198350e+00],
[-1.74847749e+00, 3.59660725e+00, -1.53556549e+00],
[-3.19710480e+00, 5.00127518e+00, -5.64359898e-01],
[-1.11237893e+00, 3.86611015e-01, -6.44963662e-01],
[ 3.13590510e-01, 4.60181915e-01, -7.28468991e-01],
[ 9.04481031e-01, -9.34189483e-01, -8.79896922e-01],
[ 2.53210134e-01, -1.97015583e+00, -7.90603359e-01],
[ 9.08455760e-01, 1.05757224e+00, 5.31138904e-01],
[ 5.58346998e-01, 2.41729913e-01, 1.62298149e+00],
[ 3.74852860e-01, 2.43250414e+00, 8.01003796e-01],
[ 2.33248224e+00, -9.75432838e-01, -1.14105657e+00],
[ 2.96184467e+00, -2.28429296e+00, -1.25638514e+00],
[ 4.30447602e+00, -2.31733989e+00, -6.07700395e-01],
[ 4.67287827e+00, -3.49866696e+00, -2.98665296e-01],
[ 3.15267019e+00, -2.47103306e+00, -2.78486427e+00],
[ 5.01888839e+00, -1.16461878e+00, -3.97645577e-01],
[ 6.34748568e+00, -1.09629165e+00, 2.45199056e-01],
[ 7.06642764e+00, -4.89782543e-02, -4.64707282e-01],
[ 6.55821951e+00, 4.58120864e-01, -1.55776858e+00],
[ 6.92223029e+00, -2.49489225e+00, 7.47057723e-02],
[ 8.32272988e+00, -2.67995586e+00, 6.53897835e-01],
[ 8.62781435e+00, -4.16991322e+00, 3.90919636e-01],
[ 9.87651598e+00, -4.59532729e+00, 9.12415952e-01],
[ 1.11410322e+01, -4.20676425e+00, 4.40361295e-01],
[ 1.13259545e+01, -3.41118615e+00, -5.43717748e-01],
[ 1.23095175e+01, -4.71498760e+00, 1.09520933e+00],
[ 8.30732966e+00, 5.35921892e-01, -1.29298934e-01],
[ 8.87387943e+00, 1.58295663e+00, -1.02382940e+00],
[ 1.02846981e+01, 1.33848630e+00, -1.18521661e+00],
[ 1.08121482e+01, 1.74306853e+00, -2.29050517e+00],
[ 8.46162638e+00, 2.90676139e+00, -4.35621148e-01],
[ 8.82426416e+00, 4.18101514e+00, -1.02578122e+00],
[ 1.02225220e+01, 4.63628161e+00, -1.13706648e+00],
[ 1.09134775e+01, 4.73890348e+00, 2.19570024e-01],
[ 1.22886539e+01, 5.20566392e+00, -4.04851918e-03],
[ 1.11253619e+01, 7.06212499e-01, -2.26797446e-01],
[ 1.25375624e+01, 5.85894776e-01, -6.03703594e-01],
[ 1.32071753e+01, -2.84800997e-01, 3.52395021e-01],
[ 1.24822881e+01, -1.00940228e+00, 1.11630491e+00],
[ 1.31574856e+01, 1.99430379e+00, -5.61725478e-01],
[ 1.44627105e+01, 1.90194990e+00, -1.00332130e+00],
[ 1.45962291e+01, -4.23987806e-01, 5.57900955e-01],
[ 1.52821361e+01, -1.26790904e+00, 1.50342919e+00],
[ 1.49344836e+01, -7.80655219e-01, 2.90579939e+00],
[ 1.58639626e+01, -2.18864465e-02, 3.57500735e+00],
[ 1.67703915e+01, -1.10100738e+00, 1.28912164e+00],
[ 1.70386253e+01, 2.76113592e-01, 1.46199899e+00],
[ 1.72521939e+01, -1.52231923e+00, -6.08733303e-02],
[ 1.38206211e+01, -1.10658659e+00, 3.34794179e+00]]), 'strict_n_conformers': False, 'normalize_partial_charges': True, '_cls': <class 'openff.toolkit.topology.molecule.Molecule'>}"
Available toolkits are: [ToolkitWrapper around The RDKit version 2023.03.3, ToolkitWrapper around AmberTools version 22.5, ToolkitWrapper around Built-in Toolkit version None]
ToolkitWrapper around The RDKit version 2023.03.3 <class 'openff.toolkit.utils.exceptions.ChargeMethodUnavailableError'> : partial_charge_method 'am1-mulliken' is not available from RDKitToolkitWrapper. Available charge methods are {'mmff94': {}, 'gasteiger': {}}
ToolkitWrapper around AmberTools version 22.5 <class 'openff.toolkit.utils.exceptions.InvalidConformerError'> : molecule.add_conformer given input of the wrong shape: Given (3,), expected (88, 3)
ToolkitWrapper around Built-in Toolkit version None <class 'openff.toolkit.utils.exceptions.ChargeMethodUnavailableError'> : Partial charge method "am1-mulliken"" is not supported by the Built-in toolkit. Available charge methods are {'zeros': {'rec_confs': 0, 'min_confs': 0, 'max_confs': 0}, 'formal_charge': {'rec_confs': 0, 'min_confs': 0, 'max_confs': 0}}
When you call out to a partial charge assignment method here:
openff_mol.assign_partial_charges(
partial_charge_method = 'am1-mulliken',
use_conformers = conformer
)
the use_conformers
argument needs to be like a list[Quantity]
but, since you unwrapped Molecule.conformers
, it's just a single quantity. Using a one-length list should wire it through correctly.
I stripped this example down to show what this could look like, just using the toolkit instead of interacting with RDKit directly:
import numpy
from openff.toolkit import Molecule
molecule = Molecule.from_smiles(
"COC(=O)C(C1CCCCN1)C2=CC=CC=C2",
allow_undefined_stereo=True,
)
molecule.generate_conformers(n_conformers=10)
all_partial_charges = list()
for conformer in molecule.conformers:
molecule.assign_partial_charges(
partial_charge_method="am1bcc",
use_conformers=[conformer],
)
all_partial_charges.append(molecule.partial_charges)
# default conformer charge generation settings may not generate the same
# number of conformers as requested (10) so just compare to what's there
assert len(all_partial_charges) == molecule.n_conformers
# sanity check, standard deviation of partial charges across conformers
# should be low but non-zero
print(numpy.asarray(all_partial_charges).std(axis=0))
"""
[0.00165719 0.0152737 0.00463931 0.01361707 0.00414403 0.00997942
0.01345663 0.00157 0.01386034 0.01118281 0.01591509 0.0107015
0.00670867 0.0040332 0.0019573 0.0040332 0.00670867 0.00119447
0.00119447 0.00119447 0.00651872 0.01836844 0.00434273 0.00434273
0.00100577 0.00100577 0.00194082 0.00194082 0.01052322 0.01052322
0.00717461 0.00591086 0.00129767 0.00175138 0.00129767 0.00591086]
"""
A couple of other things I'd like to point out:
- Your comment is about using AM1-BCC, but you call out to
"am1-mulliken"
, which is AM1 without the BCCs. This may not be what you expect - Note that in its design an OpenFF
Molecule
object can store many conformers but only one set of partial charges. I think you've already picked up on this, since you're storing the partial charges of each conformer in a separate list.
@j-wags this bubbles up here, where it's never quite explicitly checked that use_conformers
isn't a Quantity
itself, so this iterator can incidentally iterate through each row in a conformer array when it thinks it's iterating through conformers:
openff-toolkit/openff/toolkit/utils/ambertools_wrapper.py
Lines 186 to 187 in f79f48b
Sorry that I missed this initially @maciejwisniewski-drugdiscovery - I've updated my notification settings so that it won't happen again.
@mattwthompson's response and proposed solution are 100% correct. Please let us know if his proposed solution fixes your use case.
Thanks for your response! I appreciate your help. I have another problem, but I'll open a separate thread for that.