openff.toolkit.Molecule unable to parse molecule in .sdf file, but is ok with parsing from rdkit
David-Araripe opened this issue · 4 comments
I have a .sdf
file that I want to parse using openff.toolkit.Molecule
. In general, I don't have problems with reading molecules like this.
However, this was an exception. When trying to read the attached .sdf
file with the method Molecule.from_file(fpath, allow_undefined_stereo=True)
, I got the following logging message:
[07:20:59] Explicit valence for atom # 2 O, 3, is greater than permitted
68 Explicit valence for atom # 2 O, 3, is greater than permitted
Followed by the error:
in FrozenMolecule.from_file(cls, file_path, file_format, toolkit_registry, allow_undefined_stereo)
3912 mols = toolkit.from_file_obj(
3913 file_obj,
3914 file_format=file_format,
3915 allow_undefined_stereo=allow_undefined_stereo,
3916 _cls=cls,
3917 )
3919 if len(mols) == 0:
-> 3920 raise MoleculeParseError(f"Unable to read molecule from file: {file_path}")
3921 elif len(mols) == 1:
3922 return mols[0]
MoleculeParseError: Unable to read molecule from file: 68_27_aligned.sdf
Solves it for the moment
A workaround that I found to work is loading the molecule by doing:
fpath = "68_27_aligned.sdf"
supplier = Chem.SDMolSupplier(fpath)[0]
molecule = Molecule.from_rdkit(supplier)
Is this the desired behavior from the Molecule
class? If so, is there some parameter I can pass so that I can still read he molecule file using the from_file()
method?
To reproduce the error & get the version I'm using
from openff import toolkit
from openff.toolkit
toolkit._version.get_versions()
fpath = "68_27_aligned.sdf"
molecule = Molecule.from_file(fpath, allow_undefined_stereo=True)
Where the version details are:
{'date': '2024-04-12T12:59:25-0700',
'dirty': False,
'error': None,
'full-revisionid': '3bb4d9faa0cb481770c0974a845581f98629fb45',
'version': '0.16.0'}
Forgot the send the file, sorry:
68_27_aligned.txt
I don't think you're doing anything wrong; I can load it using the OEChem backend and it looks like a sensible molecule in Avogadro as well. The toolkit internally has a different way of going through to the SDF supplier which is probably the cause of this trivalent oxygen you're seeing (?)
openff-toolkit/openff/toolkit/utils/rdkit_wrapper.py
Lines 1074 to 1076 in f282282
Specifically here's where the failure bubbles up from:
openff-toolkit/openff/toolkit/utils/rdkit_wrapper.py
Lines 1019 to 1024 in f282282
I don't know why this is happening - Jeff, Josh, or somebody else may drop by and actually answer your question 😅
Indeed, there was something wrong with the molecule I sent. I added hydrogens at some point of processing, but I noticed that was wrong in the end, as my molecules had an "empty" valence configuration. Once I corrected this issue, I didn't see this bug anymore! Thank you very much for having a look into this!
If it's something you'd like to further investigate, fine, but from my side I'm fine with having this issue closed
Thanks for writing in and for the update, @David-Araripe. I'll mark this as closed.