jensengroup/xyz2mol

sdf file is empty while converting from xyz file

brunocalcada opened this issue · 12 comments

Hi all,

I am trying to use xyz2mol to convert xyz files to sdf; however, the result is an empty file, with no error message whatsoever.

Attached is the molecule that I want to convert (renamed to .txt so that I could attach it here). I'm running the following command directly in the terminal:
xyz2mol molecule_1.xyz -o sdf > molecule_1_from_xyz.sdf

xyz2mol version: 0.1.2

Thank you in advance :)

molecule_1.txt

I recommend you set up a virtual environment for Python and use the attached requirements.txt to resolve the dependencies. Unlike the file available here, it contains the entry rdkit (my PR filed here in May is still pending a reply from the maintainers, but so are PRs by others, too). Though I don't remember any change to xyz2mol.py, it is attached as well. (To get past the upload restrictions, the files carry an additional .txt extension; remove it on your side.)

With the requirements resolved, this may work for you as it did for me with

$ python ./xyz2mol.py ./molecule_1.txt -o sdf > molecule_1.sdf

using Python 3.11.4 and RDKit (version 2023.03.1) -- both provided by the repositories of Linux Debian 13/trixie, branch testing.

molecule_1.sdf.txt
requirements.txt
xyz2mol.py.txt


Two side notes:

  • Based on the exchange around another issue report here, the OP reorganized the code to use a main function (see this branch). At the same time, his requirements.txt is more explicit (rdkit>=2019.9.1).

  • Chemically speaking, the structure you convert into a .sdf is very flat. Unless it was extracted from a crystal structure, where neighbouring molecules in the packing can restrain conformational flexibility quite a bit, I would expect other conformations of the biaryl amine to be energetically more favorable.

xyz2mol is now implemented in RDKit (see the readme file), so I don't really maintain xyz2mol actively. But do let me know if the RDKit implementation is not working for you.

@jhjensen2 The additional entry rdkit in the list of requirements of xyz2mol appears to be necessary for the utility to work, and I find the utility very helpful when one does not want to go further into the details of RDKit. This is why I considered the addition a useful correction -- whatever led to the omission, it was accidental.

Independent of this, I wrote a new moderator script, xyz2mol_2.py, which for now only provides the conversion to .sdf, to interact with the current version of RDKit (2023.03.2) available via pip. RDKit as currently packaged by DebiChem for Linux Debian 13/trixie (and other distributions which benefit from this effort, e.g. Ubuntu) is version 2022.09.3 (tracker); there, your approach with an amended dependency rdkit works.

@brunocalcada If your setup allows you to create a virtual environment and fetch the current RDKit 2023.03.2, the attached moderator script may be helpful for you.

xyz2mol_b.zip

@nbehrnd thank you so much for your reply. After some testing it worked perfectly, even in the environment that I already had with rdkit.
@jhjensen2 At first I tried to use RDKit for this conversion, but it did not work for me; that is why I searched for an alternative and found xyz2mol. As soon as possible I will try the script that you shared in the previous comment to see if it works.
Thank you both for your help :)

The same for me.

1) Empty file when I use xyz2mol installed via conda
2) Error when I call the local xyz2mol.py

% python xyz2mol.py train_mixedT.xyz -o sdf > save_file.sdf
Traceback (most recent call last):
  File "/Users/name/repo/xyz2mol/xyz2mol.py", line 795, in <module>
    atoms, charge, xyz_coordinates = read_xyz_file(filename)
  File "/Users/name/repo/xyz2mol/xyz2mol.py", line 550, in read_xyz_file
    atomic_symbol, x, y, z = line.split()
ValueError: too many values to unpack (expected 4)

The xyz file is about 1.5 MB and contains more than 4 columns per atom line:

C       -1.44125098       1.67886973       3.63186498       1.40125928       1.53133181      -0.53998390

P.S.
I can't reproduce this with a small, simple xyz file with 4 columns:

C 0.0368238718 1.3887832033 0.1077864027
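The traceback above comes from unpacking every atom line into exactly four values. As a minimal sketch of a workaround (not the code in xyz2mol.py), a reader can keep only the element symbol and the first three numeric columns and ignore anything after them. The function name `read_xyz_lenient` is hypothetical, the second atom line in the sample is made up for illustration, and the assumption that the extra columns can simply be dropped should be checked against whatever program wrote the file.

```python
from typing import List, Tuple

Coordinate = Tuple[float, float, float]


def read_xyz_lenient(text: str) -> Tuple[List[str], List[Coordinate]]:
    """Parse one XYZ frame, ignoring any columns after x, y, z.

    Hypothetical helper: assumes line 1 is the atom count, line 2 a
    comment, and that the first four fields of every atom line are
    symbol, x, y, z.
    """
    lines = text.splitlines()
    n_atoms = int(lines[0].split()[0])
    symbols: List[str] = []
    coords: List[Coordinate] = []
    for line in lines[2:2 + n_atoms]:
        fields = line.split()
        if len(fields) < 4:
            raise ValueError(f"expected at least 4 columns, got: {line!r}")
        symbols.append(fields[0])
        x, y, z = (float(value) for value in fields[1:4])
        coords.append((x, y, z))
    return symbols, coords


# First atom line taken from the report above (7 columns); the second is invented.
sample = """2
frame with extra columns
C       -1.44125098       1.67886973       3.63186498       1.40125928       1.53133181      -0.53998390
H        0.10000000       0.20000000       0.30000000       0.00000000       0.00000000       0.00000000
"""

symbols, coords = read_xyz_lenient(sample)
print(symbols, coords[0])
```

The sketch reads a single frame only; a 1.5 MB file may well hold many frames, which would have to be split before conversion.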

@jhjensen2 thank you for your feedback previously.

Over the last few days I have been exploring the RDKit functions to reach my goal: conversion of an xyz file to sdf with all the info.
I'm doing this because xyz2mol.py has some difficulties with xyz files generated after optimization.

When you say that xyz2mol is already implemented in RDKit, do you mean through MolFromXYZFile, as mentioned in the ReadMe file?
I'm exploring that option, but I would like my sdf file to also contain the bond information.
I will leave here an example of what I was able to do with the xyz2mol.py script. With RDKit I only get the coordinates and the stereo info. Do you have any idea how to also add the bond information to the final sdf file with RDKit? Or is that not important if I use MolFromXYZFile, and coordinates and stereo info are enough?

Example generated through xyz2mol.py:
molecule_1.zip

Thank you in advance for any feedback :)

@brunocalcada If you want RDKit to add the bonds, the first snippet of code showcases how to interact with RDKit, i.e.

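from rdkit import Chem
from rdkit.Chem import rdDetermineBonds

raw_mol = Chem.MolFromXYZFile('acetate.xyz')  # atoms and coordinates only, no bonds yet
mol = Chem.Mol(raw_mol)                       # work on a copy of the raw molecule
rdDetermineBonds.DetermineBonds(mol, charge=-1)  # perceive connectivity and bond orders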
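A slightly fuller sketch of the same idea, carried through to a mol block with a bond block, might look as follows. It assumes an RDKit version that ships `rdDetermineBonds`; the helper name `xyz_block_to_mol` and the water coordinates are made up for illustration, and `charge` has to be the total charge of your own structure.

```python
from rdkit import Chem
from rdkit.Chem import rdDetermineBonds

# Illustrative water geometry in angstrom (not taken from the thread).
XYZ_BLOCK = """3
water
O   0.000000   0.000000   0.117300
H   0.000000   0.757200  -0.469200
H   0.000000  -0.757200  -0.469200
"""


def xyz_block_to_mol(xyz_block: str, charge: int = 0) -> Chem.Mol:
    """Hypothetical helper: parse XYZ text and let RDKit perceive the bonds."""
    raw_mol = Chem.MolFromXYZBlock(xyz_block)
    if raw_mol is None:
        raise ValueError("RDKit could not parse the XYZ block")
    mol = Chem.Mol(raw_mol)  # copy, so the raw molecule stays untouched
    rdDetermineBonds.DetermineBonds(mol, charge=charge)
    return mol


mol = xyz_block_to_mol(XYZ_BLOCK, charge=0)
mol_block = Chem.MolToMolBlock(mol)  # V2000 text with atom and bond blocks

# One SDF record is the mol block followed by the "$$$$" delimiter line.
sdf_record = mol_block + "$$$$\n"
print(sdf_record)
```

For files on disk, `Chem.MolFromXYZFile` takes the place of `Chem.MolFromXYZBlock`, as in the snippet above.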

I read your .sdf file about 3-methyl-1H-benzimidazole-2-thione. It is written in the older V2000 dialect and already contains both atomic coordinates (atom block) and bond information (bond block; here: line 24 is the first line to watch): first atom, second atom (both represented by their assigned atomic index), and the corresponding bond information (e.g., bond order 1, 2, 3).

Wikipedia briefly summarizes both the V2000 and V3000 standard syntax here. If you want to delve deeper into the syntax, the page also links to an archived PDF of the detailed documentation by Biovia here, a document which was fetched in August 2021.

sdf_read_by_avogadro
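To make the V2000 layout described above concrete, here is a small pure-Python sketch that reads the counts line and the bond block of a mol block. The function name `parse_v2000_bonds` and the hand-written water example are illustrative only, and the sketch ignores everything a real parser would also handle (properties, charges, V3000). It relies on the fixed-width layout: three header lines, the counts line, one line per atom, then one line per bond.

```python
from typing import List, Tuple

MOLBLOCK = """\
water
  hand-written example

  3  2  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.1173 O   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.7572   -0.4692 H   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000   -0.7572   -0.4692 H   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0
  1  3  1  0
M  END
"""


def parse_v2000_bonds(molblock: str) -> Tuple[int, int, List[Tuple[int, int, int]]]:
    """Return (atom count, 1-based line number of the first bond, bonds)."""
    lines = molblock.splitlines()
    counts_line = lines[3]            # fourth line: counts line
    n_atoms = int(counts_line[0:3])   # columns 1-3: number of atoms
    n_bonds = int(counts_line[3:6])   # columns 4-6: number of bonds
    first_bond_index = 4 + n_atoms    # 0-based index of the first bond line
    bonds = []
    for line in lines[first_bond_index:first_bond_index + n_bonds]:
        first_atom = int(line[0:3])
        second_atom = int(line[3:6])
        bond_order = int(line[6:9])
        bonds.append((first_atom, second_atom, bond_order))
    return n_atoms, first_bond_index + 1, bonds


n_atoms, first_bond_line, bonds = parse_v2000_bonds(MOLBLOCK)
print(n_atoms, first_bond_line, bonds)
```

With this layout the first bond sits on line 4 + (number of atoms) + 1, so a bond block starting on line 24 would be consistent with a 19-atom structure.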

@nbehrnd thank you so much for your reply.
To be honest, I'm much more familiar with v2000; it is the format that I'm already used to. Nonetheless, in the future I will explore v3000; it seems similar, with some minor differences in the appearance of the files.

Perhaps it is easier if I share the structure where I'm facing some issues.
I already tried to convert it to sdf with xyz2mol, and I'm trying to work with the RDKit options to generate the sdf file, so far without any success. I'm able to visualize it without any problems in software like pymol, but xyz2mol is not able to generate the sdf. My data set is around 1k chemicals, and I'm facing this problem in my workflow for around 10 of them.

Any insight and guidance is much appreciated, thank you.

molecule_2.zip
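For a data set of this size, where a handful of conversions end in an empty file without an error, a small batch wrapper can at least list the affected inputs. The following is a sketch under stated assumptions: the helper names and the `{xyz}` placeholder are hypothetical, and the demonstration uses a stand-in Python one-liner instead of the real converter so that it runs anywhere. For real use, the template would be something like `["xyz2mol", "{xyz}", "-o", "sdf"]`, mirroring the command line from the first post.

```python
import subprocess
import sys
from pathlib import Path
from typing import List, Sequence, Tuple


def run_conversion(xyz_path: Path, command_template: Sequence[str]) -> Tuple[int, str, str]:
    """Run one conversion; '{xyz}' in the template is replaced by the file path."""
    command = [part.replace("{xyz}", str(xyz_path)) for part in command_template]
    done = subprocess.run(command, capture_output=True, text=True)
    return done.returncode, done.stdout, done.stderr


def find_failures(
    xyz_paths: Sequence[Path], command_template: Sequence[str]
) -> List[Tuple[Path, int, str]]:
    """Collect inputs whose conversion failed or produced empty output."""
    failures = []
    for xyz_path in xyz_paths:
        code, out, err = run_conversion(xyz_path, command_template)
        if code != 0 or not out.strip():
            failures.append((xyz_path, code, err.strip()))
    return failures


# Stand-in converter (an assumption for the demo): prints nothing useful
# for file names containing "bad", and a dummy "M  END" line otherwise.
STAND_IN = "import sys; print('' if 'bad' in sys.argv[1] else 'M  END')"
template = [sys.executable, "-c", STAND_IN, "{xyz}"]

failures = find_failures([Path("good.xyz"), Path("bad.xyz")], template)
for path, code, err in failures:
    print(f"{path}: exit code {code}, stderr: {err!r}")
```

The check treats whitespace-only output as empty, which matches the symptom reported at the top of this issue (an empty sdf file and no error message).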

Your molecule works fine for me. Are you using a +1 charge?

@jhjensen2 thanks for answering.
I just found out where my problem was: it was in the conversion to the sdf format.

Since you mentioned the charge, I tried both with it and without it, and the structure generated is exactly the same.
It does not seem to be mandatory to specify it as an attribute. Do you think it may have any impact on the generated structures?

Perhaps not in my case because I don't do any step in between the conversion from xyz to sdf (e.g. Add Hs or optimization).

@brunocalcada I agree with you that the current teaser snippet of code is a bit short. To ease the initial use, I filed PR #42 a couple of minutes ago. It also provides a link to the corresponding RDKit blog post with additional examples of application.

With this edited script, both the reference file about the acetate anion and your test structure (I presume equally charged -1) were processed smoothly. The functionality was already available earlier, though, not only with RDKit 2023.09.5 as fetched and used today.