Can't kekulize mol
UnixJunkie opened this issue · 13 comments
Using the following script:
#!/usr/bin/env python2
# output the MACCS bitstring of each molecule found in a MOL2 file
import sys

import rdkit.Chem
from rdkit.Chem import MACCSkeys


def RetrieveMol2Block(fileLikeObject, delimiter="@<TRIPOS>MOLECULE"):
    """generator which retrieves one mol2 block at a time
    """
    mol2 = []
    for line in fileLikeObject:
        if line.startswith(delimiter) and mol2:
            yield "".join(mol2)
            mol2 = []
        mol2.append(line)
    if mol2:
        yield "".join(mol2)


with open(sys.argv[1]) as in_file:
    problem_mols = open('problem.mol2', 'w')
    for mol2 in RetrieveMol2Block(in_file):
        # MolFromMol2Block returns None when RDKit cannot build the molecule
        mol = rdkit.Chem.MolFromMol2Block(mol2)
        try:
            maccs = MACCSkeys.GenMACCSKeys(mol)
            for bit in maccs:
                if bit:
                    sys.stdout.write('1')
                else:
                    sys.stdout.write('0')
            sys.stdout.write('\n')
        except Exception:
            # GenMACCSKeys raises on a None molecule: emit an all-zero line
            # and save the offending block for later inspection
            sys.stdout.write('0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000\n')
            problem_mols.write(mol2)
    problem_mols.close()
and rdkit-Release_2016_03_1, I get this error for all of the following molecules:
https://gist.github.com/UnixJunkie/c8c500f9b18d80daf59d0990c8bc964e
Is this a known problem?
Is my bug report incomplete or incorrect in some way?
Just busy and haven't had a chance to take a look at it.
OK. For the moment, I will just ignore those molecules.
I hope future versions of RDKit will be able to handle them.
One piece of information that would really help: which piece of software produced the mol2 files?
Conformer generation was performed with OMEGA (from OpenEye).
And that wrote the mol2 file?
I hope so.
Hmm. Let me think; maybe there was an additional pass with Open Babel to ensure the partial charges were Gasteiger ones.
The Mol2 reader is not super robust. It really expects the input files to have atom types that match what Corina produces. There's a bit of documentation of that here: http://www.rdkit.org/Python_Docs/rdkit.Chem.rdmolfiles-module.html#MolFromMol2File
If possible, you will have much better luck creating molecules from a Mol (or SDF) file.
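A minimal sketch of the SDF route, assuming the mol2 file has already been converted to SDF by some external tool (the conversion is not shown). The function name `maccs_bitstrings_from_sdf` and the temporary demo file are illustrative and not part of the original script. `Chem.SDMolSupplier` yields `None` for records it cannot parse, so those are reported rather than raising.

```python
import os
import tempfile

from rdkit import Chem
from rdkit.Chem import MACCSkeys


def maccs_bitstrings_from_sdf(sdf_path):
    """Yield (record_index, bitstring) per SDF record.

    bitstring is None when RDKit could not parse the record.
    """
    supplier = Chem.SDMolSupplier(sdf_path)
    for index, mol in enumerate(supplier):
        if mol is None:
            yield index, None
            continue
        yield index, MACCSkeys.GenMACCSKeys(mol).ToBitString()


# Demo input (hypothetical): a one-record SDF written from a SMILES string,
# standing in for a file converted from mol2.
demo_dir = tempfile.mkdtemp()
demo_sdf = os.path.join(demo_dir, "demo.sdf")
writer = Chem.SDWriter(demo_sdf)
writer.write(Chem.MolFromSmiles("c1ccccc1"))
writer.close()

results = list(maccs_bitstrings_from_sdf(demo_sdf))
for index, bits in results:
    print(index, bits if bits is not None else "unreadable record")
```

Returning `None` for unreadable records keeps the output aligned with the input order, which is what the all-zero fallback line in the original script was doing.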
Thanks for the tip.
Maybe the error message should be more explicit (which atom type was not understood
or even the full problematic atom line from the MOL2 file).
This would give users a chance to fix their input file.
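Until the reader reports this itself, a user can at least list the atom types in a failing block and compare them with what the reader expects. The sketch below is plain Python and does not use RDKit. It assumes the standard Tripos layout in which the atom type is the sixth whitespace-separated field of each `@<TRIPOS>ATOM` line. The helper name `mol2_atom_types` and the three-atom demo block are hypothetical, and the demo block is only meant to exercise the parser, not to be a sensible molecule.

```python
from collections import Counter


def mol2_atom_types(mol2_block):
    """Return (atom_name, atom_type, raw_line) for each @<TRIPOS>ATOM record."""
    atoms = []
    in_atom_section = False
    for line in mol2_block.splitlines():
        stripped = line.strip()
        if stripped.startswith("@<TRIPOS>"):
            # Entering a new section: only ATOM records are of interest here
            in_atom_section = (stripped == "@<TRIPOS>ATOM")
            continue
        if in_atom_section and stripped:
            fields = stripped.split()
            # Expected fields: id, name, x, y, z, type, [subst_id, subst_name, charge]
            if len(fields) >= 6:
                atoms.append((fields[1], fields[5], line))
    return atoms


demo_block = """@<TRIPOS>MOLECULE
demo
 3 2 1
SMALL
USER_CHARGES

@<TRIPOS>ATOM
      1 C1          0.0000    0.0000    0.0000 C.ar      1 LIG  0.0000
      2 N1          1.3900    0.0000    0.0000 N.pl3     1 LIG  0.0000
      3 O1          2.0000    1.0000    0.0000 O.co2     1 LIG  0.0000
@<TRIPOS>BOND
     1     1     2   ar
     2     2     3    1
"""

atoms = mol2_atom_types(demo_block)
type_counts = Counter(atom_type for _, atom_type, _ in atoms)
for atom_type, count in sorted(type_counts.items()):
    print(atom_type, count)
```

Running this over the blocks saved in `problem.mol2` gives a quick overview of which atom types occur in the molecules that fail.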
That would indeed be nice, but the reader doesn't know that at the point where it fails. It only knows that it encountered a ring that it could not kekulize.
Having some reporting that tells you which ring had the problem might help somewhat.
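As a sketch of what such reporting can look like from the user side: RDKit releases considerably newer than the 2016.03 one used in this thread provide `Chem.DetectChemistryProblems`, which lists the problems found in an unsanitized molecule, including the atoms involved in a kekulization failure. This assumes such a release is installed. The helper name `describe_problems` and the SMILES test input (an aromatic five-membered ring whose nitrogen lacks its hydrogen) are illustrative and are not taken from the reported molecules.

```python
from rdkit import Chem


def describe_problems(mol):
    """Return (problem_type, atom_indices, message) for each detected problem."""
    report = []
    for problem in Chem.DetectChemistryProblems(mol):
        kind = problem.GetType()
        if kind == "KekulizeException":
            # Atoms of the ring system that could not be kekulized
            atom_indices = tuple(problem.GetAtomIndices())
        elif kind == "AtomValenceException":
            atom_indices = (problem.GetAtomIdx(),)
        else:
            atom_indices = ()
        report.append((kind, atom_indices, problem.Message()))
    return report


# Parse without sanitization so a molecule object exists even though
# kekulization would fail.
mol = Chem.MolFromSmiles("c1nccc1", sanitize=False)
report = describe_problems(mol)
for kind, atom_indices, message in report:
    print(kind, atom_indices, message)
```

For the mol2 input discussed here, the same helper could be applied to `Chem.MolFromMol2Block(block, sanitize=False)`, with a `None` check in case the block cannot be parsed at all.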
Now that the error reporting has been improved (at least I think so), I'm closing this.
Please re-open if necessary.