YuanyueLi/SpectralEntropy

The tools used for calculating the bond difference between molecules for Extended Data Fig. 7

Closed this issue · 2 comments

Thank you for the extensive investigation of 43 similarity scores leading to the conclusion of Spectral entropy outperforms MS/MS dot product similarity. I noticed the one bond difference is defined to classify true or false identification. I am wondering which tools are used for calculating the bond difference between molecules for Extended Data Fig. 7.

Hi,

Sorry for the late reply. You can find an example script for calculating the bond difference between two molecules with the attached example script here: bond_difference.txt. To run the example script, please use the following command:

conda create -n msentropy
conda activate msentropy
conda install -c conda-forge python=3.9 rdkit=2020.03.6
python bond_difference.txt

Please be aware that the script only works with the rdkit version 2020.03.6. Using a different version may result in exceptions for some molecules.

If you have any questions, feel free to let me know.

The example scripts:

from rdkit import Chem
from rdkit.Chem import rdFMCS


def __remove_bonds_in_smarts(mol, smarts):
    removed_bond_number = 0
    sub_atom = mol.GetSubstructMatch(Chem.MolFromSmarts(smarts))
    # print(sub_atom)
    for i in sub_atom:
        all_bonds = mol.GetAtomWithIdx(i).GetBonds()
        for bond in all_bonds:
            atom_1, atom_2 = bond.GetBeginAtomIdx(), bond.GetEndAtomIdx()
            mol.RemoveBond(atom_1, atom_2)
            if atom_1 in sub_atom and atom_2 in sub_atom:
                removed_bond_number += 1
    return mol, removed_bond_number


def calculate_similarity(mol1, mol2):
    mol1 = Chem.rdchem.RWMol(mol1)
    mol2 = Chem.rdchem.RWMol(mol2)
    bond_number_common = 0
    bond_number_mol1 = len(mol1.GetBonds())
    bond_number_mol2 = len(mol2.GetBonds())

    while True:
        # print(Chem.MolToSmiles(mol1))
        # print(Chem.MolToSmiles(mol2))
        res = rdFMCS.FindMCS(
            [mol1, mol2],
            timeout=60,
            threshold=1,
            ringMatchesRingOnly=True,
            completeRingsOnly=True,
            atomCompare=rdFMCS.AtomCompare.CompareElements,
            bondCompare=rdFMCS.BondCompare.CompareOrderExact,
            ringCompare=rdFMCS.RingCompare.StrictRingFusion,
            maximizeBonds=True,
            matchValences=True,
        )
        if res.numBonds == 0:
            break

        common_s = res.smartsString
        # print(common_s)

        mol1, _ = __remove_bonds_in_smarts(mol1, common_s)
        mol2, _ = __remove_bonds_in_smarts(mol2, common_s)
        bond_number_common += res.numBonds
        # print(bond_number_common)

    return {
        "mol1_bond_number": bond_number_mol1,
        "mol2_bond_number": bond_number_mol2,
        "common_bond_number": bond_number_common,
        "bond_difference": (bond_number_mol1 + bond_number_mol2) / 2 - bond_number_common,
        "bond_similarity": (2 * bond_number_common) / (bond_number_mol1 + bond_number_mol2),
    }


if __name__ == "__main__":
    import pprint

    s1 = "InChI=1S/C10H7NO2/c12-10(13)9-6-5-7-3-1-2-4-8(7)11-9/h1-6H,(H,12,13)"
    s2 = "InChI=1S/C10H7NO2/c12-10(13)9-5-7-3-1-2-4-8(7)6-11-9/h1-6H,(H,12,13)"
    mol_a = Chem.MolFromInchi(s1)
    mol_b = Chem.MolFromInchi(s2)
    result = calculate_similarity(mol_a, mol_b)
    print(f"Molecule 1: {Chem.MolToSmiles(mol_a)}")
    print(f"Molecule 2: {Chem.MolToSmiles(mol_b)}")
    pprint.pprint(result)
    print()
    
    s1 = "CC(CCC)c1ccccc1"
    s2 = "CCC(CC)c1ccccc1"
    mol_a = Chem.rdchem.RWMol(Chem.MolFromSmiles(s1))
    mol_b = Chem.rdchem.RWMol(Chem.MolFromSmiles(s2))
    result = calculate_similarity(mol_a, mol_b)
    print(f"Molecule 1: {Chem.MolToSmiles(mol_a)}")
    print(f"Molecule 2: {Chem.MolToSmiles(mol_b)}")
    pprint.pprint(result)

The output looks like:

Molecule 1: O=C(O)c1ccc2ccccc2n1
Molecule 2: O=C(O)c1cc2ccccc2cn1
{'bond_difference': 2.0,
 'bond_similarity': 0.8571428571428571,
 'common_bond_number': 12,
 'mol1_bond_number': 14,
 'mol2_bond_number': 14}

Molecule 1: CCCC(C)c1ccccc1
Molecule 2: CCC(CC)c1ccccc1
{'bond_difference': 1.0,
 'bond_similarity': 0.9090909090909091,
 'common_bond_number': 10,
 'mol1_bond_number': 11,
 'mol2_bond_number': 11}

Thank you for the detailed scripts. It works in my computer.