mcocdawc/chemcoord

change the origin

carltta opened this issue · 11 comments

The program per default is setting the origin of the Z matrix to the atom closest to the geometric center of the molecule. However, I would like to chose the atom considered as origin, since in my case it doesn’t correspond always to the geometric center of the molecule.
Do you have any suggestions how I could resolve it?

It is indeed no problem to define yourself the beginning of a Z-matrix and let chemcoord figure out the rest.

The method that you need is get_construction_table, which calculates the construction table of a Z-matrix.¹
The crucial part is that you can pass incomplete construction tables to this method and it will figure out the missing pieces.
The obtained construction table can then be passed to get_zmat which actually calculates the bond distances, angles, and dihedrals to obtain the Z-matrix.

There is a section about the definition of the construction table in the tutorial here.

If you look for the MIL53_small.xyz structure in chemcoord/Tutorial you can also try the following code:

import chemcoord as cc
import pandas as pd

m = cc.Cartesian.read_xyz('./MIL53_small.xyz', start_index=1)

c_table = pd.DataFrame(
             [[11, 'origin', 'e_z', 'e_x'],
              [7, 11, 'e_z', 'e_x'],
              [6, 7, 11, 'e_x']], columns=['i', 'b', 'a', 'd']).set_index('i')

# This Z-matrix starts with oxygen
m.get_zmat()

# This Z-matrix starts with chromium
m.get_zmat(m.get_construction_table([(m, c_table)]))

¹ A construction table is just the table of indices which defines which atom uses which atoms as reference. It is basically the ['b', 'a', 'd'] columns of a Z-matrix.

Great, thanks a lot!

How did you choose the indexes in the c_table? Because 11 is the index of the atom we want as origin, but 6 and 7?

This was just an example where I used the order Cr - O - Cr, where O is the bridging oxygen.
It made chemically the most sense to me, but you can choose differently.

I think¹ that you are forced to fill in at least the first three rows, i.e. the rows which are special in that they need references to absolute points in space.

¹ Very long time ago since I wrote that function.

I still have a problem.
I have a xyz file that contains 38 atoms.
When I run m.get_zmat() I obtain a Zmatrix with the default origin but containing all the 38 atoms.
When I run m.get_zmat(m.get_construction_table([(m, c_table)])) the origin is placed where I want but the resulting matrix has only 14 atoms.

I have also tried to split the .get_zmat function in the following way:

import chemcoord as cc

filexyz = cc.Cartesian.read_xyz('filename.xyz', start_index=0)
fileconstrtable = filexyz._get_frag_constr_table(start_atom=1)
matrix = filexyz.get_zmat(construction_table=fileconstrtable)

but there are two issues:

  1. the function _get_frag_constr_table generates a table with only some atoms present in the xyz, actually exactly 14, defining this stable as the starting point for the .get_zmat function generates a matrix with only 14 atoms.
  2. we should use the function get_construction_table to "connect" _get_frag_constr_table with get_zmat, but I don't know how this could be possible.

Hmm, that sounds like a bug or misuse if I understand it correctly from your description.¹
Could you attach the xyz file and your python code so that I can reproduce it.
Note that if your xyz file consists of more than one fragment, then _get_frag_constr_table is supposed to only create the Z-matrix for one fragment. The Z-matrices for each fragment are then concatenated by code outside _get_frag_constr_table.

¹ Fully noting, that if my code can be easily misused, then it is equally buggy. :-)

I cannot upload the xyz file, so I just changed the format of the .xyz to a .txt.
Core.txt

I tried with the code that you suggested and I tried with this code:

import chemcoord as cc

filexyz = cc.Cartesian.read_xyz('Core.xyz', start_index=0)

fileconstrtable = filexyz._get_frag_constr_table(start_atom=1)

matrix = filexyz.get_zmat(construction_table=fileconstrtable)
print(matrix)

If you look at the output of filexyz.fragmentate() you will see that chemcoord does not detect a lot of the bonds in the system, i.e. it fragmentates into several single atom "molecules".
This is because a bond is assumed to be present if the distance between two atoms is smaller/equal than the sum of their radii. (Whatever radius you use as a definition.)
This was the conceptional problem.

The technical problem is that I really would advise against using _get_frag_constr_table directly. It operates only on single fragments¹ and does not yet return you valid construction table for your whole system.
So this code is actually bound to fail.

The clean way way how to cure these problems is to define a larger radius for oxygen or zirkonium.
If I do:

cc.constants.elements.loc['O', 'atomic_radius_cc']
cc.constants.elements.loc[:, 'old_atomic_radius_cc'] = cc.constants.elements.loc[:, 'atomic_radius_cc']
cc.constants.elements.loc['O', 'atomic_radius_cc'] = 0.8

I can do

molecule = cc.Cartesian.read_xyz('molecule.xyz', start_index=0)

guess_c_table = pd.DataFrame(
             [[1, 'origin', 'e_z', 'e_x'],
              [8, 1, 'e_z', 'e_x'],
              [2, 8, 1, 'e_x']], columns=['i', 'b', 'a', 'd']).set_index('i')

c_table = molecule.get_construction_table([(molecule, guess_c_table)])
molecule.get_zmat(c_table)

and get a meaningful Z-matrix with 38 atoms.

I think a warning in the get_zmat method about unreasonably small fragments would help a lot. Or what would have helped you there?

It makes sense. Thanks a lot!
Here some additional comments:

  • does the defined length of the radius of oxygen influence the Z matrix? or can I just define it with an arbitrarily large value so that all the atoms in the XYZ are taken into account in the Z matrix?

  • I think you get the same Z matrix if in the guess_c_table only the origin is defined. This allows the code to be more general.

guess_c_table = pd.DataFrame(
             [[1, 'origin', 'e_z', 'e_x'],
             columns=['i', 'b', 'a', 'd']).set_index('i')
  • yes if the program makes you aware that you are not converting the entire xyz in Z matrix would help. For example, if the length of the Z matrix and the length of XYZ are different then give an error or something like that.
    Maybe I would also add such an example in the tutorial, where guess_c_matrix is specified in get_construction_table. in the code where you define the get_construction_table function is not super easy to get to such a starting table.

  • something more general. For simplicity, I think that the atom chosen as the origin of the Z matrix should be positioned at cartesian coordinates xyz (0, 0, 0) in the XYZ file, and only after that the Z matrix should be calculated.
    Now, why is the angle of the first atom listed after the origin not 0? As well as the dihedral angles of the first and second atoms after the origin?

  • The chemically motivated Z-matrix is determined by the bonds in the molecule,
    distances should follow bonds, angles should use high valency atoms in the 2n coordination sphere etc.
    If you increase the radius to arbitrarily large values so that everything is connected to everything, you will get a Z-matrix that is chemically junk. (You can just try it with something that has an obvious nice Z-matrix like Propane.)

  • Good point, but it depends on what you want for you system.

  • Now the big question for me is: Did you get an incomplete Z-matrix also when you used get_construction_table? If yes, can you send me the code? This would definetely be a bug. If this incomplete Z-matrix was only obtained via _get_frac_cons... then this is actually expected behaviour.

  • Unfortunately I disagree with this statement. First and foremost I wanted the coordinate transformations between cartesian and Z-matrix space to have certain invariance properties, that I expect from a coordinate transformation in general. If x is a molecule in cartesian coordinates, f is the transformation to Z-matrix coordinates and g is the transformation to cartesian coordinates I wanted to preserve:

x == g(f(x))

This would not be the case if the first atom is arbitrarily placed at (0, 0, 0).
Indeed the upper right triangle of chemcoord Z-matrices is often omitted in other Z-matrices and gives you the absolute orientation in space. The bond, angle and dihedral for the first atom are then just spherical coordinates with respect to absolute points in space. You can translate, rotate your molecule and see how the values in the upper right triangle change.

thanks a lot for the clarification and all the help!

No I don't think there is a bug, when running the function get_construction_table I get a complete matrix. It is incomplete only when get_zmat is initiated with a construction table obtained with _get_frag_constr_table.

Great, happy to hear that. Good luck with your project.