lab-cosmo/librascal

Is it possible to train and predict a per-atom property? [question]

UnixJunkie opened this issue · 5 comments

Hello,

I'd like to give librascal and SOAP features a try (probably using GPR).
From reading librascal/examples/needs_updating/Prediction_example.ipynb,
I understand that you train/test using one property per molecule.

I wonder if it is possible to train/test using a property per atom of the molecule,
instead of a property for the whole molecule.

Would you have an example of that?

Thanks a lot,
Francois.

It would be nice if the related librascal/examples/needs_updating/Prediction_example.ipynb
could be updated so that it works.
In the meantime, I'll try getting an older version of librascal where it can run.

Hi Francois,

Yes, you can use librascal to predict per-atom properties. Basically, you just need to create a Kernel object with target_type='Atom' (instead of target_type='Structure'). You can then use the resulting kernel to do GPR in an atom-wise way, somewhat similar to what is shown in the Prediction_example.ipynb notebook you mentioned. The fitting/prediction itself should also be possible with librascal's train_gap_model() function*, with the appropriately created kernel object passed in.
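To make the atom-wise idea concrete, here is a minimal sketch in plain numpy. It does not use librascal's API: the random feature matrix is a stand-in for per-atom SOAP vectors, and atomic_kernel, the regularizer value, and the toy targets are all assumptions chosen for illustration. The point is only that each atom (not each structure) is one training sample with its own target.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for per-atom features (e.g. SOAP vectors): one row per atom,
# with the atoms of all training structures stacked together.
n_atoms_per_structure = [3, 5, 4]  # hypothetical training set
n_features = 8
X_train = rng.normal(size=(sum(n_atoms_per_structure), n_features))
X_train /= np.linalg.norm(X_train, axis=1, keepdims=True)

# One target value per atom (toy data, generated from a random linear model).
y_train = X_train @ rng.normal(size=n_features)


def atomic_kernel(A, B, zeta=2):
    """Polynomial kernel between normalized per-atom feature vectors."""
    return (A @ B.T) ** zeta


# Atom-wise kernel: rows and columns are atoms, not structures.
K = atomic_kernel(X_train, X_train)

# GPR / kernel ridge weights; the regularizer value is an arbitrary choice.
regularizer = 1e-8
weights = np.linalg.solve(K + regularizer * np.eye(len(K)), y_train)

# Predict one value per atom of a new 4-atom structure.
X_test = rng.normal(size=(4, n_features))
X_test /= np.linalg.norm(X_test, axis=1, keepdims=True)
y_pred = atomic_kernel(X_test, X_train) @ weights
```

With a structure-wise target, the kernel rows belonging to one structure would instead be summed or averaged before the fit; with a per-atom target they are kept separate, as above.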

As for the Prediction_example.ipynb notebook, it is indeed quite out of date; I would instead recommend following the zundel_i-PI.ipynb example (but using atom-wise kernels and properties as mentioned above, and without forces/gradients). Thank you for the feedback, though; we'll see if we can make a more general example that includes training on atom-wise properties.

Hope this helps,
Max

[*] Note that this function is due to be replaced by a simplified version in #305, though the replacement will still support fitting with per-atom properties.

Thanks for the feedback, Max.

I have some more questions:

Is there a recommended tool to save molecules in xyz format?

In case I have a property of interest per-atom, should it also be included in this .xyz file somehow, or should it be provided separately?

Is there a recommended tool to save molecules in xyz format?

librascal is compatible with ASE, so you can use e.g. ase.io.write() and pass it a list of ASE Atoms objects. Another, faster option is chemfiles, but we haven't made librascal explicitly compatible with that yet.

In case I have a property of interest per-atom, should it also be included in this .xyz file somehow, or should it be provided separately?

Either way will work, but for book-keeping and organization it helps to store the data in the XYZ file. You can do this in ASE, for example, by storing numpy arrays into the atoms.arrays dictionary (NB the arrays in this dictionary must have the first dimension equal to the number of atoms in the Atoms object). But to do the fit itself you will need to extract the data again and pass it as a separate argument (y_train) to train_gap_model().

I had to retrieve this old version:

commit 8cb1f3c149528254be414837700c9b60323932f7
Author: musil <felix.musil@epfl.ch>
Date:   Mon Dec 2 15:28:40 2019 +0100

so that examples/SOAP_example.ipynb exists (I hope it works in this version).