openforcefield/openff-toolkit

Assign Partial Charges for many conformers of Small Molecule

maciejwisniewski-drugdiscovery opened this issue · 4 comments

Describe the bug

I have a molecule in RDKit for which I am generating a number of conformations. Then, for each conformer, I want to generate partial charges using AM1-BCC. However, I am getting the following error, why?

To Reproduce

import rdkit
from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem import rdDistGeom, rdFMCS
from openff.toolkit import Molecule
import numpy as np
from scipy.spatial.distance import cdist
import time
from openmm import unit
from openmm.unit import Quantity
def get_distance_matrix(coords):
    return cdist(coords, coords)
def get_conformations(molecule):

    all_partial_charges = []

    molecule.RemoveAllConformers()
    molecule = Chem.AddHs(molecule)
    params = rdDistGeom.ETKDGv3()
    params.randomSeed = 0xd06f00d
    params.numThreads = 4
    params.maxAttempts = 100
    # print how many atoms there is
    cids = rdDistGeom.EmbedMultipleConfs(molecule, 10, params)
    if len(cids) == 0:
        print("Failed to generate conformations for molecule. Skipping.")
        return None, None
    molecule = Chem.RemoveHs(molecule)
    # Calculating Partial Charges with AM1-BCC
    
    openff_mol = Molecule.from_rdkit(molecule,
                                     allow_undefined_stereo=True,
                                     hydrogens_are_explicit=True)
    for conformer in openff_mol.conformers:
        openff_mol.assign_partial_charges(
            partial_charge_method = 'am1-mulliken',
            use_conformers = conformer
        )
        all_partial_charges.append(openff_molecule.partial_charges)

    return all_partial_charges
rdkit_mol = Chem.MolFromSmiles('COC(=O)C(C1CCCCN1)C2=CC=CC=C2')
pc_list = get_conformations(rdkit_mol)

Output

  ---------------------------------------------------------------------------
  ValueError                                Traceback (most recent call last)
  Cell In[203], line 1
  ----> 1 means, poses, mol = get_conformations(mols)
  
  Cell In[199], line 33, in get_conformations(molecules)
       29 openff_mol = Molecule.from_rdkit(molecule,
       30                                  allow_undefined_stereo=False,
       31                                  hydrogens_are_explicit=True)
       32 start = time.time()
  ---> 33 openff_mol.assign_partial_charges(
       34     partial_charge_method = 'am1-mulliken',
       35     use_conformers = np.array(positions_lst[0])
       36 )
       37 print('\n')
       38 print(openff_molecule.partial_charges)
  
  File /mnt/evafs/groups/sfglab/mwisniewski/anaconda3/envs/openmm/lib/python3.9/site-packages/openff/toolkit/topology/molecule.py:2677, in FrozenMolecule.assign_partial_charges(self, partial_charge_method, strict_n_conformers, use_conformers, toolkit_registry, normalize_partial_charges)
     2663         warnings.warn(
     2664             f"Warning! Partial charge method '{partial_charge_method}' is not designed "
     2665             "for use on large (i.e. > 150 atoms) molecules and may crash or take hours to "
     (...)
     2669             stacklevel=2,
     2670         )
     2672 if isinstance(toolkit_registry, ToolkitRegistry):
     2673     # We may need to try several toolkitwrappers to find one
     2674     # that supports the desired partial charge method, so we
     2675     # tell the ToolkitRegistry to continue trying ToolkitWrappers
     2676     # if one raises an error (raise_exception_types=[])
  -> 2677     toolkit_registry.call(
     2678         "assign_partial_charges",
     2679         molecule=self,
     2680         partial_charge_method=partial_charge_method,
     2681         use_conformers=use_conformers,
     2682         strict_n_conformers=strict_n_conformers,
     2683         normalize_partial_charges=normalize_partial_charges,
     2684         raise_exception_types=[],
     2685         _cls=self.__class__,
     2686     )
     2687 elif isinstance(toolkit_registry, ToolkitWrapper):
     2688     toolkit_wrapper: ToolkitWrapper = toolkit_registry
  
  File /mnt/evafs/groups/sfglab/mwisniewski/anaconda3/envs/openmm/lib/python3.9/site-packages/openff/toolkit/utils/toolkit_registry.py:280, in ToolkitRegistry.call(self, method_name, raise_exception_types, *args, **kwargs)
      278 for toolkit, error in errors:
      279     msg += f" {toolkit} {type(error)} : {error}\n"
  --> 280 raise ValueError(msg)
  
  ValueError: No registered toolkits can provide the capability "assign_partial_charges" for args "()" and kwargs "{'molecule': Molecule with name '5svy_ligand' and SMILES '[C][N+][C][C][C][C][C]([N][C](=[O])[C]([N][C](=[O])[C]([C][C][C][N][C]([N])=[N+])[N][C](=[O])[C]([C])[N+])[C]([C])[O])[C](=[O])[N][C]([C][C][C]([N])=[O])[C](=[O])[N][C]([C]([C])[O])[C](=[O])[N][C]([C])[C](=[O])[N][C]([C][C][C][N][C]([N])=[N+])[C](=[O])[N][C]([C][C][C][C][N+])[C](=[O])[N][C]([C][O])[C](=[O])[N][C]([C]([C])[O])[C](=[O])[O-]', 'partial_charge_method': 'am1-mulliken', 'use_conformers': array([[-1.11773447e+01,  4.10357643e+00, -8.88736732e-01],
         [-1.15084816e+01,  4.46373642e+00,  4.71618630e-01],
         [-1.20780069e+01,  3.28730461e+00,  1.16325346e+00],
         [-1.24256324e+01,  3.29858823e+00,  2.37815716e+00],
         [-1.25460764e+01,  5.58289209e+00,  4.12319050e-01],
         [-1.22564292e+01,  2.08047344e+00,  4.63477783e-01],
         [-1.28090037e+01,  9.09953823e-01,  1.11628013e+00],
         [-1.18880426e+01, -2.41324935e-01,  1.04160749e+00],
         [-1.21800378e+01, -1.36631590e+00,  1.56728252e+00],
         [-1.42012016e+01,  6.24691101e-01,  5.49249615e-01],
         [-1.41968271e+01,  3.64862105e-01, -8.93642993e-01],
         [-1.55037586e+01,  7.09030286e-02, -1.51316246e+00],
         [-1.62568910e+01, -1.04661472e+00, -1.04941327e+00],
         [-1.70924590e+01, -1.11634194e+00,  7.20119260e-02],
         [-1.72973717e+01, -1.12347913e-01,  8.48313913e-01],
         [-1.77408866e+01, -2.38856753e+00,  3.62128888e-01],
         [-1.06390398e+01, -1.76276860e-01,  3.92824434e-01],
         [-9.63682826e+00, -1.20762718e+00,  2.64602267e-01],
         [-8.29119997e+00, -6.36734042e-01,  5.31035224e-01],
         [-8.17832371e+00,  6.07602908e-01,  7.86365116e-01],
         [-9.71925053e+00, -1.72625403e+00, -1.18492025e+00],
         [-8.77494581e+00, -2.69698226e+00, -1.35434830e+00],
         [-9.70523449e+00, -6.13500336e-01, -2.17448041e+00],
         [-7.08912227e+00, -1.41097292e+00,  5.14972631e-01],
         [-5.82625352e+00, -7.52660805e-01,  7.78679208e-01],
         [-4.98020224e+00, -1.30532655e+00,  1.81640822e+00],
         [-4.40863003e+00, -2.63306866e+00,  1.82189082e+00],
         [-5.21875295e+00, -3.85650983e+00,  1.77487014e+00],
         [-6.21330933e+00, -3.93043845e+00,  2.93598768e+00],
         [-7.02593630e+00, -5.12179886e+00,  2.88835962e+00],
         [-8.32223699e+00, -4.89359835e+00,  2.29526826e+00],
         [-5.14252470e+00, -4.29123494e-01, -5.31852877e-01],
         [-5.67639451e+00, -9.07508977e-01, -1.54622654e+00],
         [-4.00601983e+00,  3.47613629e-01, -6.69432550e-01],
         [-3.33438938e+00,  7.03539637e-01, -1.87763598e+00],
         [-1.90415534e+00,  7.05306724e-01, -1.81010920e+00],
         [-1.20496488e+00,  1.00692385e+00, -2.84761729e+00],
         [-3.95525483e+00,  1.84605421e+00, -2.61752324e+00],
         [-4.11188905e+00,  3.13382571e+00, -1.91635838e+00],
         [-2.96189799e+00,  3.83343257e+00, -1.39198350e+00],
         [-1.74847749e+00,  3.59660725e+00, -1.53556549e+00],
         [-3.19710480e+00,  5.00127518e+00, -5.64359898e-01],
         [-1.11237893e+00,  3.86611015e-01, -6.44963662e-01],
         [ 3.13590510e-01,  4.60181915e-01, -7.28468991e-01],
         [ 9.04481031e-01, -9.34189483e-01, -8.79896922e-01],
         [ 2.53210134e-01, -1.97015583e+00, -7.90603359e-01],
         [ 9.08455760e-01,  1.05757224e+00,  5.31138904e-01],
         [ 5.58346998e-01,  2.41729913e-01,  1.62298149e+00],
         [ 3.74852860e-01,  2.43250414e+00,  8.01003796e-01],
         [ 2.33248224e+00, -9.75432838e-01, -1.14105657e+00],
         [ 2.96184467e+00, -2.28429296e+00, -1.25638514e+00],
         [ 4.30447602e+00, -2.31733989e+00, -6.07700395e-01],
         [ 4.67287827e+00, -3.49866696e+00, -2.98665296e-01],
         [ 3.15267019e+00, -2.47103306e+00, -2.78486427e+00],
         [ 5.01888839e+00, -1.16461878e+00, -3.97645577e-01],
         [ 6.34748568e+00, -1.09629165e+00,  2.45199056e-01],
         [ 7.06642764e+00, -4.89782543e-02, -4.64707282e-01],
         [ 6.55821951e+00,  4.58120864e-01, -1.55776858e+00],
         [ 6.92223029e+00, -2.49489225e+00,  7.47057723e-02],
         [ 8.32272988e+00, -2.67995586e+00,  6.53897835e-01],
         [ 8.62781435e+00, -4.16991322e+00,  3.90919636e-01],
         [ 9.87651598e+00, -4.59532729e+00,  9.12415952e-01],
         [ 1.11410322e+01, -4.20676425e+00,  4.40361295e-01],
         [ 1.13259545e+01, -3.41118615e+00, -5.43717748e-01],
         [ 1.23095175e+01, -4.71498760e+00,  1.09520933e+00],
         [ 8.30732966e+00,  5.35921892e-01, -1.29298934e-01],
         [ 8.87387943e+00,  1.58295663e+00, -1.02382940e+00],
         [ 1.02846981e+01,  1.33848630e+00, -1.18521661e+00],
         [ 1.08121482e+01,  1.74306853e+00, -2.29050517e+00],
         [ 8.46162638e+00,  2.90676139e+00, -4.35621148e-01],
         [ 8.82426416e+00,  4.18101514e+00, -1.02578122e+00],
         [ 1.02225220e+01,  4.63628161e+00, -1.13706648e+00],
         [ 1.09134775e+01,  4.73890348e+00,  2.19570024e-01],
         [ 1.22886539e+01,  5.20566392e+00, -4.04851918e-03],
         [ 1.11253619e+01,  7.06212499e-01, -2.26797446e-01],
         [ 1.25375624e+01,  5.85894776e-01, -6.03703594e-01],
         [ 1.32071753e+01, -2.84800997e-01,  3.52395021e-01],
         [ 1.24822881e+01, -1.00940228e+00,  1.11630491e+00],
         [ 1.31574856e+01,  1.99430379e+00, -5.61725478e-01],
         [ 1.44627105e+01,  1.90194990e+00, -1.00332130e+00],
         [ 1.45962291e+01, -4.23987806e-01,  5.57900955e-01],
         [ 1.52821361e+01, -1.26790904e+00,  1.50342919e+00],
         [ 1.49344836e+01, -7.80655219e-01,  2.90579939e+00],
         [ 1.58639626e+01, -2.18864465e-02,  3.57500735e+00],
         [ 1.67703915e+01, -1.10100738e+00,  1.28912164e+00],
         [ 1.70386253e+01,  2.76113592e-01,  1.46199899e+00],
         [ 1.72521939e+01, -1.52231923e+00, -6.08733303e-02],
         [ 1.38206211e+01, -1.10658659e+00,  3.34794179e+00]]), 'strict_n_conformers': False, 'normalize_partial_charges': True, '_cls': <class 'openff.toolkit.topology.molecule.Molecule'>}"
  Available toolkits are: [ToolkitWrapper around The RDKit version 2023.03.3, ToolkitWrapper around AmberTools version 22.5, ToolkitWrapper around Built-in Toolkit version None]
   ToolkitWrapper around The RDKit version 2023.03.3 <class 'openff.toolkit.utils.exceptions.ChargeMethodUnavailableError'> : partial_charge_method 'am1-mulliken' is not available from RDKitToolkitWrapper. Available charge methods are {'mmff94': {}, 'gasteiger': {}}
   ToolkitWrapper around AmberTools version 22.5 <class 'openff.toolkit.utils.exceptions.InvalidConformerError'> : molecule.add_conformer given input of the wrong shape: Given (3,), expected (88, 3)
   ToolkitWrapper around Built-in Toolkit version None <class 'openff.toolkit.utils.exceptions.ChargeMethodUnavailableError'> : Partial charge method "am1-mulliken"" is not supported by the Built-in toolkit. Available charge methods are {'zeros': {'rec_confs': 0, 'min_confs': 0, 'max_confs': 0}, 'formal_charge': {'rec_confs': 0, 'min_confs': 0, 'max_confs': 0}}

When you call out to a partial charge assignment method here:

openff_mol.assign_partial_charges(
            partial_charge_method = 'am1-mulliken',
            use_conformers = conformer
        )

the use_conformers argument needs to be like a list[Quantity] but, since you unwrapped Molecule.conformers, it's just a single quantity. Using a one-length list should wire it through correctly.

I stripped this example down to show what this could look like, just using the toolkit instead of interacting with RDKit directly:

import numpy
from openff.toolkit import Molecule

molecule = Molecule.from_smiles(
    "COC(=O)C(C1CCCCN1)C2=CC=CC=C2",
    allow_undefined_stereo=True,
)

molecule.generate_conformers(n_conformers=10)

all_partial_charges = list()

for conformer in molecule.conformers:
    molecule.assign_partial_charges(
        partial_charge_method="am1bcc",
        use_conformers=[conformer],
    )

    all_partial_charges.append(molecule.partial_charges)

# default conformer charge generation settings may not generate the same
# number of conformers as requested (10) so just compare to what's there
assert len(all_partial_charges) == molecule.n_conformers

# sanity check, standard deviation of partial charges across conformers
# should be low but non-zero
print(numpy.asarray(all_partial_charges).std(axis=0))
"""
[0.00165719 0.0152737  0.00463931 0.01361707 0.00414403 0.00997942
 0.01345663 0.00157    0.01386034 0.01118281 0.01591509 0.0107015
 0.00670867 0.0040332  0.0019573  0.0040332  0.00670867 0.00119447
 0.00119447 0.00119447 0.00651872 0.01836844 0.00434273 0.00434273
 0.00100577 0.00100577 0.00194082 0.00194082 0.01052322 0.01052322
 0.00717461 0.00591086 0.00129767 0.00175138 0.00129767 0.00591086]
"""

A couple of other things I'd like to point out:

  • Your comment is about using AM1-BCC, but you call out to "am1-mulliken", which is AM1 without the BCCs. This may not be what you expect
  • Note that in its design an OpenFF Molecule object can store many conformers but only one set of partial charges. I think you've already picked up on this, since you're storing the partial charges of each conformer in a separate list.

@j-wags this bubbles up here, where it's never quite explicitly checked that use_conformers isn't a Quantity itself, so this iterator can incidentally iterate through each row in a conformer array when it thinks it's iterating through conformers:

for conformer in use_conformers:
mol_copy._add_conformer(conformer)

Sorry that I missed this initially @maciejwisniewski-drugdiscovery - I've updated my notification settings so that it won't happen again.

@mattwthompson's response and proposed solution are 100% correct. Please let us know if his proposed solution fixes your use case.

Thanks for your response! I appreciate your help. I have another problem, but I'll open a separate thread for that.