When I calculated the PubChem fingerprint with PyBioMed, the result differed from the one produced by the PaDEL software. Has anyone else noticed this problem, or did I make a mistake? My code is below.

Question

Ls94wood opened this issue 2 years ago · 1 comments

from PyBioMed import Pymolecule

mol = Pymolecule.PyMolecule()
mol.ReadMolFromSmile('CCOC1=CC=CC=C1OCCN[C@H]CC1=CC(=C(OC)C=C1)S(N)(=O)=O')
mol.GetFingerprint(FPName='Pubchem')

The first ten fingerprint bits calculated by PyBioMed: 0, 0, 0, 0, 0, 0, 0, 0, 0, 1
The first ten fingerprint bits calculated by PaDEL: 1, 1, 1, 0, 0, 0, 0, 0, 0, 1

Answer 1 · 2023-04-19T09:23:03.000Z

import rdkit.Chem

# PCFP is PyBioMed's PubChem fingerprint source code, used here as a standalone module (its import is not shown).
smi_test = "CCOC1=CC=CC=C1OCCN[C@H]CC1=CC(=C(OC)C=C1)S(N)(=O)=O"
mol = rdkit.Chem.MolFromSmiles(smi_test)
molH = rdkit.Chem.AddHs(mol)  # add explicit hydrogens before computing the fingerprint
pcfp_result = PCFP.calcPubChemFingerAll(molH)
print(pcfp_result[:10])

top 10 bits: [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

So it turns out that the problem is that PyBioMed computed the fingerprint without explicit hydrogens in your mol object.
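To illustrate why missing explicit hydrogens would flip exactly the leading bits, here is a minimal sketch. It assumes, based on the public PubChem fingerprint specification, that the first four bits encode hydrogen counts of at least 4, 8, 16 and 32. The helper name `hydrogen_count_bits` and the hydrogen counts passed to it are hypothetical and are not part of PyBioMed or PaDEL.

```python
# Thresholds assumed from the public PubChem fingerprint specification:
# bits 0-3 are set when the molecule has >= 4, >= 8, >= 16, >= 32 H atoms.
H_THRESHOLDS = (4, 8, 16, 32)


def hydrogen_count_bits(n_hydrogens):
    """Return the four hydrogen-count bits for a given number of counted H atoms.

    Hypothetical helper for illustration only.
    """
    return [1 if n_hydrogens >= t else 0 for t in H_THRESHOLDS]


# Illustrative counts, not measured from the molecule in this thread.
print(hydrogen_count_bits(0))   # no explicit H atoms in the molecule graph
print(hydrogen_count_bits(20))  # explicit H atoms present, between 16 and 31
```

If hydrogen atoms are only counted when they are explicit atoms in the graph, a molecule without added hydrogens yields zeros for these bits, which would match the mismatch reported above in the first three positions.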

However, I would like to point out some other real problems with the PubChem implementation behind GetFingerprint(FPName='Pubchem') in this package, which lead to seriously faulty fingerprints.
==> I'm opening a new issue for these now...