rdkit/rdkit

Can't kekulize mol

UnixJunkie opened this issue · 13 comments

Using the following script:

#!/usr/bin/env python2

# output the MACCS bitstring of each molecule found in a MOL2 file;
# blocks that RDKit cannot parse are copied to problem.mol2

import sys

import rdkit.Chem
from rdkit.Chem import MACCSkeys

# RDKit MACCS fingerprints have 167 bits (bit 0 is unused)
N_MACCS_BITS = 167

def RetrieveMol2Block(fileLikeObject, delimiter="@<TRIPOS>MOLECULE"):
    """generator which retrieves one mol2 block at a time
    """
    mol2 = []
    for line in fileLikeObject:
        if line.startswith(delimiter) and mol2:
            yield "".join(mol2)
            mol2 = []
        mol2.append(line)
    if mol2:
        yield "".join(mol2)

with open(sys.argv[1]) as in_file, open('problem.mol2', 'w') as problem_mols:
    for mol2 in RetrieveMol2Block(in_file):
        # returns None (after logging e.g. "Can't kekulize mol") on failure
        mol = rdkit.Chem.MolFromMol2Block(mol2)
        if mol is None:
            # all-zero placeholder keeps output lines aligned with input order
            sys.stdout.write('0' * N_MACCS_BITS + '\n')
            problem_mols.write(mol2)
            continue
        maccs = MACCSkeys.GenMACCSKeys(mol)
        sys.stdout.write(maccs.ToBitString() + '\n')

and rdkit-Release_2016_03_1, all of the following molecules failed:
https://gist.github.com/UnixJunkie/c8c500f9b18d80daf59d0990c8bc964e

Is this a known problem?
Is my bug report incomplete or incorrect in some way?

I've just been busy and haven't had a chance to look at it yet.

OK. For the moment, I will just ignore those molecules.
I hope they will be handled in future versions of RDKit.

One piece of information that would really help: which piece of software produced the mol2 files?

Conformer generation was performed with OMEGA (from OpenEye).

And that wrote the mol2 file?

I hope so.

Hmm, let me think. Maybe there was an additional pass with Open Babel to make sure the partial charges were Gasteiger charges.

The Mol2 reader is not very robust: it expects the input files to use atom types that match what Corina produces. There's some documentation of that here: http://www.rdkit.org/Python_Docs/rdkit.Chem.rdmolfiles-module.html#MolFromMol2File

If possible, you will have much better luck creating molecules from a Mol (or SDF) file.
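Because the reader expects Corina-style atom types, a quick pre-check can point at suspicious atom lines before RDKit ever sees them. The plain-Python sketch below relies only on the Mol2 layout: each @<TRIPOS>ATOM record is whitespace-separated, and the atom type is the sixth field. It makes one assumption: KNOWN_TYPES is an illustrative subset of standard SYBYL (Tripos) atom types. It is not the list that RDKit's Mol2 reader actually accepts, so adjust it to your data.

```python
from typing import Iterable, List, Set, Tuple

# ASSUMPTION: illustrative subset of SYBYL/Tripos atom types, NOT the exact
# set accepted by RDKit's Mol2 reader. Adjust to match your input files.
KNOWN_TYPES: Set[str] = {
    "H", "C.3", "C.2", "C.1", "C.ar", "C.cat",
    "N.3", "N.2", "N.1", "N.ar", "N.am", "N.pl3", "N.4",
    "O.3", "O.2", "O.co2", "S.3", "S.2", "S.O", "S.O2", "P.3",
    "F", "Cl", "Br", "I",
}


def atom_lines(mol2_block: str) -> Iterable[Tuple[int, str]]:
    """Yield (line number, line) for every record in the @<TRIPOS>ATOM section."""
    in_atoms = False
    for lineno, line in enumerate(mol2_block.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("@<TRIPOS>"):
            # entering a new section; only the ATOM section is of interest
            in_atoms = stripped == "@<TRIPOS>ATOM"
            continue
        if in_atoms and stripped:
            yield lineno, line


def unexpected_atom_types(mol2_block: str,
                          known: Set[str] = KNOWN_TYPES) -> List[Tuple[int, str]]:
    """Return (line number, full atom line) for atoms whose type is not in `known`.

    ATOM records look like: atom_id atom_name x y z atom_type [subst_id ...],
    so the atom type is the sixth whitespace-separated field.
    """
    problems = []
    for lineno, line in atom_lines(mol2_block):
        fields = line.split()
        atom_type = fields[5] if len(fields) > 5 else ""
        if atom_type not in known:
            problems.append((lineno, line))
    return problems
```

Calling unexpected_atom_types on each block yielded by RetrieveMol2Block (from the script above) would list the full atom lines worth inspecting. A clean result does not guarantee that the molecule will kekulize, though. A type can be valid but still wrong for the chemistry, such as an aromatic nitrogen that is missing its hydrogen.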

Thanks for the tip.
Maybe the error message should be more explicit (which atom type was not understood,
or even the full problematic atom line from the MOL2 file).
This would give users a chance to fix their input files.

That would indeed be nice, but the reader doesn't know that. It only knows that it encountered a ring it could not kekulize.

Reporting which ring had the problem might help somewhat.
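Kekulization can be framed as a perfect-matching problem. Each aromatic atom that needs a double bond must get exactly one double bond, and the bond must go to a neighbor that also needs one. When no such assignment exists, you get the "Can't kekulize mol" failure. For instance, a five-membered aromatic ring in which all five atoms are treated as needing a double bond has an odd atom count and can never be matched. This can happen if a pyrrole-type nitrogen is treated as needing one.

The sketch below is a simplified pure-Python illustration and not RDKit's actual implementation. Its inputs are hypothetical: atom indices plus a bond list. It uses a simple backtracking search and reports each connected ring system that cannot be kekulized, which is the kind of per-ring report discussed above.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

Bond = Tuple[int, int]


def kekulize(needs_double: Set[int], bonds: Iterable[Bond]) -> Optional[Set[Bond]]:
    """Pick double bonds so every atom in needs_double gets exactly one.

    Only bonds between two atoms that both need a double bond are candidates.
    Returns the chosen bonds as (low, high) pairs, or None if impossible.
    """
    neighbors: Dict[int, List[int]] = {a: [] for a in needs_double}
    for a, b in bonds:
        if a in needs_double and b in needs_double:
            neighbors[a].append(b)
            neighbors[b].append(a)

    matched: Dict[int, int] = {}

    def search() -> bool:
        free = [a for a in sorted(needs_double) if a not in matched]
        if not free:
            return True  # every atom has its double bond
        atom = free[0]
        # in any valid assignment this atom pairs with one of its free
        # neighbors, so trying each of them is exhaustive
        for nbr in neighbors[atom]:
            if nbr not in matched:
                matched[atom], matched[nbr] = nbr, atom
                if search():
                    return True
                del matched[atom], matched[nbr]  # backtrack
        return False

    if not search():
        return None
    return {(a, b) for a, b in matched.items() if a < b}


def failing_ring_systems(needs_double: Set[int], bonds: Iterable[Bond]) -> List[List[int]]:
    """Return the connected ring systems (sorted atom lists) that cannot be kekulized."""
    bonds = list(bonds)
    adjacency: Dict[int, Set[int]] = {a: set() for a in needs_double}
    for a, b in bonds:
        if a in needs_double and b in needs_double:
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen: Set[int] = set()
    failures: List[List[int]] = []
    for start in sorted(needs_double):
        if start in seen:
            continue
        # depth-first walk collects one connected system
        component: Set[int] = set()
        stack = [start]
        while stack:
            atom = stack.pop()
            if atom in component:
                continue
            component.add(atom)
            stack.extend(adjacency[atom] - component)
        seen |= component
        if kekulize(component, bonds) is None:
            failures.append(sorted(component))
    return failures
```

Backtracking is exponential in the worst case. That is acceptable for a sketch on small ring systems, but a production implementation would use a proper matching algorithm.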

Now that the error reporting has been improved (at least I think so), I'm closing this.
Please re-open if necessary.