Unable to standardize some PubChem molecules
VladislavChernykh opened this issue · 1 comment
VladislavChernykh commented
Hello,
I was using the molvs Standardizer on PubChem molecules and found several molecules that cannot be standardized:
- SMILES: CC(S(=O)CC1=CC=C(C=C1)C(S(=O)CC2=CC=C(C=C2)C(S(=O)CC3=CC=C(C=C3)C(S(=O)C4=CC=C(C=C4)Br)S(=O)C5=CC=C(C=C5)Br)S(=O)CC6=CC=C(C=C6)C(S(=O)C7=CC=C(C=C7)Br)S(=O)C8=CC=C(C=C8)Br)S(=O)CC9=CC=C(C=C9)C(S(=O)CC1=CC=C(C=C1)C(S(=O)C1=CC=C(C=C1)Br)S(=O)C1=CC=C(C=C1)Br)S(=O)CC1=CC=C(C=C1)C(S(=O)C1=CC=C(C=C1)Br)S(=O)C1=CC=C(C=C1)Br)S(=O)CC1=CC=C(C=C1)C(S(=O)CC1=CC=C(C=C1)C(S(=O)CC1=CC=C(C=C1)C(S(=O)C1=CC=C(C=C1)Br)S(=O)C1=CC=C(C=C1)Br)S(=O)CC1=CC=C(C=C1)C(S(=O)C1=CC=C(C=C1)Br)S(=O)C1=CC=C(C=C1)Br)S(=O)CC1=CC=C(C=C1)C(S(=O)CC1=CC=C(C=C1)C(S(=O)C1=CC=C(C=C1)Br)S(=O)C1=CC=C(C=C1)Br)S(=O)CC1=CC=C(C=C1)C(S(=O)C1=CC=C(C=C1)Br)S(=O)C1=CC=C(C=C1)Br
Link: https://pubchem.ncbi.nlm.nih.gov/compound/59827358
- SMILES: CC1=CC=C(C=C1)C(S(=O)CC2=CC=C(C=C2)C(S(=O)CC3=CC=C(C=C3)C(S(=O)CC4=CC=C(C=C4)C(S(=O)C5=CC=C(C=C5)Br)S(=O)C6=CC=C(C=C6)Br)S(=O)CC7=CC=C(C=C7)C(S(=O)C8=CC=C(C=C8)Br)S(=O)C9=CC=C(C=C9)Br)S(=O)CC1=CC=C(C=C1)C(S(=O)CC1=CC=C(C=C1)C(S(=O)C1=CC=C(C=C1)Br)S(=O)C1=CC=C(C=C1)Br)S(=O)CC1=CC=C(C=C1)C(S(=O)C1=CC=C(C=C1)Br)S(=O)C1=CC=C(C=C1)Br)S(=O)CC1=CC=C(C=C1)C(S(=O)CC1=CC=C(C=C1)C(S(=O)CC1=CC=C(C=C1)C(S(=O)C1=CC=C(C=C1)Br)S(=O)C1=CC=C(C=C1)Br)S(=O)CC1=CC=C(C=C1)C(S(=O)C1=CC=C(C=C1)Br)S(=O)C1=CC=C(C=C1)Br)S(=O)CC1=CC=C(C=C1)C(S(=O)CC1=CC=C(C=C1)C(S(=O)C1=CC=C(C=C1)Br)S(=O)C1=CC=C(C=C1)Br)S(=O)CC1=CC=C(C=C1)C(S(=O)C1=CC=C(C=C1)Br)S(=O)C1=CC=C(C=C1)Br
Link: https://pubchem.ncbi.nlm.nih.gov/compound/59827349
Code to reproduce:
from rdkit import Chem
from molvs import Standardizer
smiles = "CC1=CC=C(C=C1)C(S(=O)CC2=CC=C(C=C2)C(S(=O)CC3=CC=C(C=C3)C(S(=O)CC4=CC=C(C=C4)C(S(=O)C5=CC=C(C=C5)Br)S(=O)C6=CC=C(C=C6)Br)S(=O)CC7=CC=C(C=C7)C(S(=O)C8=CC=C(C=C8)Br)S(=O)C9=CC=C(C=C9)Br)S(=O)CC1=CC=C(C=C1)C(S(=O)CC1=CC=C(C=C1)C(S(=O)C1=CC=C(C=C1)Br)S(=O)C1=CC=C(C=C1)Br)S(=O)CC1=CC=C(C=C1)C(S(=O)C1=CC=C(C=C1)Br)S(=O)C1=CC=C(C=C1)Br)S(=O)CC1=CC=C(C=C1)C(S(=O)CC1=CC=C(C=C1)C(S(=O)CC1=CC=C(C=C1)C(S(=O)C1=CC=C(C=C1)Br)S(=O)C1=CC=C(C=C1)Br)S(=O)CC1=CC=C(C=C1)C(S(=O)C1=CC=C(C=C1)Br)S(=O)C1=CC=C(C=C1)Br)S(=O)CC1=CC=C(C=C1)C(S(=O)CC1=CC=C(C=C1)C(S(=O)C1=CC=C(C=C1)Br)S(=O)C1=CC=C(C=C1)Br)S(=O)CC1=CC=C(C=C1)C(S(=O)C1=CC=C(C=C1)Br)S(=O)C1=CC=C(C=C1)Br"
mol = Chem.MolFromSmiles(smiles)
# This call does not return (no result after 10 minutes)
res = Standardizer().standardize(mol)
Execution seems to enter an infinite loop in the function _apply_transform() (normalize.py). After 10 minutes, the standardize() call still had not returned a result.
Thanks,
Vladislav
UnixJunkie commented
It might be nice to run this on the whole of PubChem, to flag all the problematic molecules at once.
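A rough sketch of how such a scan could be run without one hanging molecule blocking the rest: each SMILES is standardized in a separate Python process that is killed after a timeout. The helper names (run_with_timeout, flag_timeouts), the child script, and the 60-second default are assumptions for illustration, not part of molvs. A timeout only shows that a molecule is slow, not that the loop is truly infinite.

```python
import subprocess
import sys

# Child script (assumed layout): standardize the SMILES passed as argv[1]
# and print the resulting SMILES. Exit code 2 means RDKit could not parse it.
STANDARDIZE_SCRIPT = """
import sys
from rdkit import Chem
from molvs import Standardizer
mol = Chem.MolFromSmiles(sys.argv[1])
if mol is None:
    sys.exit(2)
print(Chem.MolToSmiles(Standardizer().standardize(mol)))
"""


def run_with_timeout(smiles, timeout_s=60.0, script=STANDARDIZE_SCRIPT):
    """Run `script` on one SMILES in a child process.

    Returns (status, output), where status is 'ok', 'error' or 'timeout'.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", script, smiles],
            capture_output=True,
            text=True,
            timeout=timeout_s,  # the child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return "timeout", ""
    if proc.returncode != 0:
        return "error", proc.stderr.strip()
    return "ok", proc.stdout.strip()


def flag_timeouts(smiles_list, timeout_s=60.0, script=STANDARDIZE_SCRIPT):
    """Return the SMILES whose processing did not finish within timeout_s."""
    return [
        s for s in smiles_list
        if run_with_timeout(s, timeout_s, script)[0] == "timeout"
    ]
```

With rdkit and molvs installed, flag_timeouts(list_of_smiles) would return the entries to report. Starting one process per molecule is slow, so for something the size of PubChem this would need batching or a worker pool on top.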