mcs07/PubChemPy

Errors in the encoding of some bits


Hi,
I was checking the code behind the PubChem fingerprint generation.
I compared fingerprints calculated with your code against those calculated with PyFingerprint, which uses the CDK library, and noticed some differences.
First, for bits in the range 0-98, SMARTS are not used. When carbons are counted, for example, only aliphatic carbons are considered, since the corresponding key is `C`.
As a result, the counting and the encoding are incorrect for molecules that contain aromatic atoms.
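
To illustrate the first point, here is a minimal sketch in plain Python. It is not the code from this repository: the token list and both helper functions are hypothetical, and I assume atoms are written as SMILES-style tokens, where `C` is an aliphatic carbon and `c` is an aromatic one.

```python
def count_by_exact_key(atom_tokens, key):
    """Count tokens that match the key literally (hypothetical helper)."""
    return sum(1 for token in atom_tokens if token == key)


def count_by_element(atom_tokens, element):
    """Count tokens of a given element, aliphatic or aromatic (hypothetical helper)."""
    # capitalize() maps the aromatic token "c" onto the element symbol "C"
    return sum(1 for token in atom_tokens if token.capitalize() == element)


# Toluene, Cc1ccccc1: one aliphatic carbon and six aromatic carbons
toluene = ["C", "c", "c", "c", "c", "c", "c"]

exact = count_by_exact_key(toluene, "C")     # only the methyl carbon
by_element = count_by_element(toluene, "C")  # all seven carbons

# A count-based bit such as ">= 4 C" then comes out differently
print(exact >= 4, by_element >= 4)
```

With the literal key, a threshold like ">= 4 C" is missed for toluene even though the molecule has seven carbons.
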
The second point concerns the bits in the range 115-231. Here two conditions must be met at the same time. For example, bits 116 and 117 are described as ">= 1 saturated or aromatic carbon-only ring size 3" and ">= 1 saturated or aromatic nitrogen-containing ring size 3", respectively. A cyclopropane ring should therefore be detected by bit 116 but not by bit 117. With your code, however, both bits are set.
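
The behaviour I would expect for these two bits can be sketched as follows. This is only an illustration under my own assumptions: the `Ring` class and the `size3_ring_bits` function are hypothetical, rings are given as tuples of element symbols, and the "saturated or aromatic" condition is reduced to a precomputed flag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ring:
    """Hypothetical ring record: element symbols plus a bond-type flag."""
    elements: tuple
    saturated_or_aromatic: bool = True


def size3_ring_bits(rings):
    """Return the expected values of bits 116 and 117 for a list of rings."""
    # Condition shared by both bits: ring size 3, saturated or aromatic
    candidates = [
        r for r in rings
        if len(r.elements) == 3 and r.saturated_or_aromatic
    ]
    # Second condition differs per bit: ring composition
    carbon_only = any(all(e == "C" for e in r.elements) for r in candidates)
    has_nitrogen = any("N" in r.elements for r in candidates)
    return {116: carbon_only, 117: has_nitrogen}


cyclopropane = [Ring(("C", "C", "C"))]
aziridine = [Ring(("C", "C", "N"))]

print(size3_ring_bits(cyclopropane))
print(size3_ring_bits(aziridine))
```

In this sketch cyclopropane sets only bit 116 and aziridine sets only bit 117, because the composition check is applied in addition to the ring-size check.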

I hope the bugs I reported can be fixed. If I am mistaken, I would be glad to have an explanation of where I went wrong.

Thank you for your help,
Salvatore