mcs07/MolVS

Molecule did not standardise overnight

baoilleach opened this issue · 4 comments

Using the default settings, this ChEMBL molecule did not standardise overnight:

OP(=O)(O)[O-].OP(=O)([O-])[O-].[O-]S(=O)(=O)[O-].[Na+].[Na+].[Na+].[Mg+2].[Cl-].[Cl-].[K+].[K+] 2104840

Here are some more. They appear to be a different case from the one above, but the same case repeated over and over:

CN(C)c1cc[n+](cc1)[C-](C=C(C#N)C#N)C(=O)N2c3ccccc3Sc4c2cccc4
CN(C)c1cc[n+](cc1)[C-](C=C(C#N)C#N)C(=O)c2ccc(cc2)C#N
Cc1cc[n+](cc1)[C-](C=C(C#N)C#N)C(=O)c2ccccc2
Cc1cc(c[n+](c1)[C-](C=C(C#N)C#N)C(=O)c2ccc(cc2)OC)C
CCOC(=O)/C(=C/[C-](C(=O)c1ccc(cc1)OC)[n+]2cc(cc(c2)C)C)/C#N
Cc1cc(c[n+](c1)[C-](C=C(C#N)C#N)C(=O)c2cc(c(c(c2)OC)OC)OC)C
CCOC(=O)/C(=C/[C-](C(=O)c1cc(c(c(c1)OC)OC)OC)[n+]2cc(cc(c2)C)C)/C#N
CN(C)c1cc[n+](cc1)[C-](C=C(C#N)C#N)C(=O)c2ccc(cc2)F
CCOC(=O)/C(=C/[C-](C(=O)c1ccc(cc1)F)[n+]2ccc(cc2)N(C)C)/C#N
CN(C)c1cc[n+](cc1)[C-](C=C(C#N)C#N)C(=O)c2ccc(cc2)Cl
CCOC(=O)/C(=C/[C-](C(=O)c1ccc(cc1)Cl)[n+]2ccc(cc2)N(C)C)/C#N
CN(C)c1cc[n+](cc1)[C-](C=C(C#N)C#N)C(=O)c2ccc(cc2)Br
CCOC(=O)/C(=C/[C-](C(=O)c1ccc(cc1)Br)[n+]2ccc(cc2)N(C)C)/C#N
CCOC(=O)/C(=C/[C-](C(=O)c1ccc2c(c1)NC(=O)CO2)[n+]3ccc(cc3)N(C)C)/C#N
CCOC(=O)/C(=C/[C-](C(=O)c1ccccc1)[n+]2ccc(cc2)C)/C#N
COc1ccc(cc1)C(=O)[C-](C=C(C#N)C#N)[n+]2ccc(cc2)OC
CCOC(=O)/C(=C/[C-](C(=O)c1ccc(cc1)OC)[n+]2ccc(cc2)OC)/C#N
COc1cc[n+](cc1)[C-](C=C(C#N)C#N)C(=O)c2ccc(c(c2)OC)OC
CCOC(=O)/C(=C/[C-](C(=O)c1ccc(c(c1)OC)OC)[n+]2ccc(cc2)OC)/C#N
CN(C)c1cc[n+](cc1)[C-](C=C(C#N)C#N)C(=O)c2ccccc2
CCOC(=O)/C(=C/[C-](C(=O)c1ccccc1)[n+]2ccc(cc2)N(C)C)/C#N
Cc1ccc(cc1)C(=O)[C-](C=C(C#N)C#N)[n+]2ccc(cc2)N(C)C
CCOC(=O)/C(=C/[C-](C(=O)c1ccc(cc1)C)[n+]2ccc(cc2)N(C)C)/C#N
CCOC(=O)/C(=C/[C-](C(=O)c1ccc(cc1)C#N)[n+]2ccc(cc2)N(C)C)/C#N
CN(C)c1cc[n+](cc1)[C-](C=C(C#N)C#N)C(=O)c2ccc(cc2)OC
CCOC(=O)/C(=C/[C-](C(=O)c1ccc(cc1)OC)[n+]2ccc(cc2)N(C)C)/C#N
CN(C)c1cc[n+](cc1)[C-](C=C(C#N)C#N)C(=O)c2ccc(cc2)[N+](=O)[O-]
CN(C)c1cc[n+](cc1)[C-](C=C(C#N)C#N)C(=O)c2cc(c(c(c2)OC)OC)OC
CCOC(=O)/C(=C/[C-](C(=O)c1cc(c(c(c1)OC)OC)OC)[n+]2ccc(cc2)N(C)C)/C#N
CN(C)c1cc[n+](cc1)[C-](C=C(C#N)C#N)C(=O)c2ccc3c(c2)NC(=O)CO3
CCc1cc2c(cc1CC)C(=O)[C-](C2=O)C#N.[Na+]

@baoilleach Which RDKit version did you use? I think the first example should not end with 2104840, right? I will test the other examples.

@mcs07 After a quick check of the logs, I find that most of these cases get stuck in molvs.charge.

The OpenEye way to deal with such molecules is to timeout after 60s (the timeout can be changed via a command-line option).
That is, if a calculation fails to complete within 60s for one molecule, the molecule is logged as an error, and the software continues to process further molecules.
I think this is quite a smart approach, since you never really know what you will encounter when you work with >100k molecules.
And you don't want the processing to stop just because of one faulty molecule in the big batch that you were intending to process.
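
For what it's worth, a per-molecule timeout along those lines could be sketched in user code roughly as below. This is only an illustration, not something MolVS provides: `time_limit`, `MoleculeTimeout` and `process_batch` are hypothetical names, and the 60s default just mirrors the behaviour described above. It relies on `SIGALRM`, so it assumes Unix and the main thread, and it can only interrupt Python-level loops; a call stuck inside a single long-running C/C++ routine would need a separate worker process instead.

```python
import logging
import signal
from contextlib import contextmanager

log = logging.getLogger("standardise_batch")


class MoleculeTimeout(Exception):
    """Raised when one molecule exceeds its time budget (hypothetical helper)."""


@contextmanager
def time_limit(seconds):
    """Raise MoleculeTimeout if the body runs longer than `seconds`.

    Assumption: Unix, main thread only, since this relies on SIGALRM.
    """
    def _on_alarm(signum, frame):
        raise MoleculeTimeout(f"no result after {seconds} s")

    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        yield
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, previous)  # restore the old handler


def process_batch(smiles_list, standardise, timeout=60.0):
    """Apply `standardise` to each SMILES; record failures and keep going."""
    results, errors = {}, {}
    for smiles in smiles_list:
        try:
            with time_limit(timeout):
                results[smiles] = standardise(smiles)
        except MoleculeTimeout as exc:
            log.error("timeout: %s (%s)", smiles, exc)
            errors[smiles] = str(exc)
        except Exception as exc:  # any other per-molecule failure
            log.error("failed: %s (%s)", smiles, exc)
            errors[smiles] = str(exc)
    return results, errors
```

Any callable that takes a SMILES string can be passed as `standardise`, for example `molvs.standardize_smiles` if that is the entry point being used. The molecules that time out end up in `errors`, so they can be collected and reported separately after the batch finishes.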