chemosim-lab/ProLIF

Default halogen bond SMARTS ignores carbonyls

asiomchen opened this issue · 0 comments

Recently, I have been trying to detect halogen bonds with ProLIF and, to my surprise, a tyrosine is not recognized as an acceptor despite having its carbonyl near the ligand (Schrodinger's Maestro detects the interaction). It turns out that the default acceptor SMARTS leaves the bond between the two acceptor atoms unspecified, which in SMARTS matches only single (or aromatic) bonds, so the C=O double bond is never matched. Changing the original SMARTS from [#7,#8,P,S,Se,Te,a;!+{1-}][*] to [#7,#8,P,S,Se,Te,a;!+{1-}]!#[*] makes the carbonyl oxygen match:

from rdkit import Chem
from rdkit.Chem import Draw, AllChem
from copy import copy
from IPython.display import display
from itertools import chain

def plot_2D_highlight(mol, matches):
    ligand_2d = copy(mol)
    AllChem.Compute2DCoords(ligand_2d)
    matches = list(chain(*matches))
    if len(matches) == 0:
        display(Draw.MolToImage(ligand_2d, size=(400, 200)))
    else:
        display(Draw.MolToImage(ligand_2d, highlightAtoms=matches, size=(400, 200)))

original_pattern = Chem.MolFromSmarts("[#7,#8,P,S,Se,Te,a;!+{1-}][*]")

# The new pattern accepts any bond in the acceptor except a triple bond, not only a single bond.
# A nitrile is still not matched (we probably would not see one as an acceptor anyway), but a carbonyl now is;
# carbonyls are acceptors in many complexes available in the PDB (Auffinger et al., PNAS 2004).

new_pattern = Chem.MolFromSmarts("[#7,#8,P,S,Se,Te,a;!+{1-}]!#[*]")
histidine = Chem.MolFromSmiles("C1=C(NC=N1)CC(C(=O)O)N")
# a nitrile is added to the tyrosine only to show that neither the original nor the new pattern matches it
tyrozine = Chem.MolFromSmiles("C1=C(C#N)C(=CC=C1C[C@@H](C(=O)O)N)O")

original_pattern_matches_his = list(chain(*histidine.GetSubstructMatches(original_pattern)))
new_pattern_matches_his = list(chain(*histidine.GetSubstructMatches(new_pattern)))
original_pattern_matches_tyr = list(chain(*tyrozine.GetSubstructMatches(original_pattern)))
new_pattern_matches_tyr = list(chain(*tyrozine.GetSubstructMatches(new_pattern)))
Draw.MolsToGridImage([histidine, histidine, tyrozine, tyrozine], molsPerRow=2, subImgSize=(400, 200),
                        legends=["Original pattern", "New pattern", "Original pattern", "New pattern"],
                        highlightAtomLists=[original_pattern_matches_his, 
                                            new_pattern_matches_his, original_pattern_matches_tyr, 
                                            new_pattern_matches_tyr])
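
The same comparison can also be checked without drawing anything. The sketch below is only an illustration and not part of ProLIF: the helper `acceptor_atoms` and the two test molecules (N-methylacetamide as a stand-in for a backbone amide, acetonitrile as a simple nitrile) are assumptions chosen for the example. It lists the atom indices that each pattern matches in the acceptor position.

```python
from rdkit import Chem

ORIGINAL = "[#7,#8,P,S,Se,Te,a;!+{1-}][*]"
PROPOSED = "[#7,#8,P,S,Se,Te,a;!+{1-}]!#[*]"


def acceptor_atoms(smiles, smarts):
    """Return sorted indices of atoms matched by the first (acceptor) SMARTS atom."""
    mol = Chem.MolFromSmiles(smiles)
    pattern = Chem.MolFromSmarts(smarts)
    # uniquify=False keeps every mapping, so match[0] is always the acceptor atom
    matches = mol.GetSubstructMatches(pattern, uniquify=False)
    return sorted({match[0] for match in matches})


# Hypothetical test molecules: atom indices follow the SMILES order
for name, smiles in [("N-methylacetamide", "CC(=O)NC"), ("acetonitrile", "CC#N")]:
    print(name, acceptor_atoms(smiles, ORIGINAL), acceptor_atoms(smiles, PROPOSED))
```

For N-methylacetamide the original pattern should pick up only the amide nitrogen, while the proposed one should also pick up the carbonyl oxygen; the nitrile nitrogen of acetonitrile should be matched by neither.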

Maybe this SMARTS should become the default?