SINGROUP/dscribe

SOAP descriptors of different species too similar?

ryokbys opened this issue · 8 comments

Hi,
It's a great project.
I tried to create SOAP descriptors of a crystalline system including a few chemical species. But the obtained SOAP descriptors of other species seem too close. Attached is an example of Mg2SiO4 and the plot at the end of the notebook shows very close SOAP values for Mg, Si, and O species. I am not very sure whether it is wrong, but if I make SOAP descriptors with a different program values for the species are much different. Could you tell me if something wrong with my way of creating SOAP?

Versions

  • dscribe 0.4.0
  • ase 3.20.1

structure
スクリーンショット 2021-02-12 11 04 58

$ cat POSCAR
Mg8 Si4 O16
1.0
4.802840 0.000000 0.000000
0.000000 6.047893 0.000000
0.000000 0.000000 10.323355
Mg Si O
8 4 16
direct
0.000000 0.000000 0.000000 Mg
0.500000 0.000000 0.500000 Mg
0.000000 0.500000 0.000000 Mg
0.500000 0.500000 0.500000 Mg
0.008375 0.750000 0.277264 Mg
0.508375 0.250000 0.222736 Mg
0.991625 0.250000 0.722736 Mg
0.491625 0.750000 0.777264 Mg
0.925974 0.750000 0.593638 Si
0.425974 0.250000 0.906362 Si
0.074026 0.250000 0.406362 Si
0.574026 0.750000 0.093638 Si
0.276946 0.466986 0.837165 O
0.776946 0.533014 0.662835 O
0.723054 0.966986 0.162835 O
0.223054 0.033014 0.337165 O
0.723054 0.533014 0.162835 O
0.223054 0.466986 0.337165 O
0.276946 0.033014 0.837165 O
0.776946 0.966986 0.662835 O
0.777851 0.750000 0.446782 O
0.277851 0.250000 0.053218 O
0.222149 0.250000 0.553218 O
0.722149 0.750000 0.946782 O
0.265961 0.750000 0.591677 O
0.765961 0.250000 0.908323 O
0.734039 0.250000 0.408323 O
0.234039 0.750000 0.091677 O

Hi @ryokbys!

I don't see anything fundamentally wrong with your current setup. In the context of the original definition for SOAP (https://doi.org/10.1103/PhysRevB.87.184115), there are a few parameters you can tune if you are interested in tuning SOAP to be more sensitive to the sites:

  • Reduce rcut
  • Try crossover=False

Also, if you are using these descriptors in a machine learning environment, the learning algorithm hyperparameters should adjust quite well to detecting even smaller changes.

There are some ideas for improving the locality of SOAP that go beyond the original definition, see issue #20. Out of curiosity, what other code are you using, and which parameters are you using?

Thanks @lauri-codes for your quick response.

The reason why asked this is because I got the following PCA map of SOAP features obtained from polycrystalline 4-component system using DScribe, which was much different from I expected. I expected that points of each species make separate clusters. But this could be true...
PCA_SOAP_poly

Actually the PCA map obtained from crystalline of the same 4-component system, I got more reasonable-looking(?) one like the following.
PCA_SOAP_crystal

The one I used to compare SOAP descriptor is a fortran wrapper routine that calls GAP code in QUIP. But unfortunately it is not scalable to large system, that's why I am trying DScribe. The SOAP graph obtained by it with almost the same configuration looks more different between species, but it could also be the magic of ordering of SOAP index or something.

Anyway, thanks for your advices.

Hi @ryokbys!

Coming back to this issue in preparation for version 1.1.0: we will add a new weighting option for SOAP that will greatly increase the local resolution. Here is an example script of using it (the structure is here: forsterite.zip):

import numpy as np
import matplotlib.pyplot as mpl
from ase.io import read
from dscribe.descriptors import SOAP

atoms = read("./forsterite.poscar")
atoms.wrap()

soap = SOAP(
    species=["Mg", "Si", "O"],
    periodic=True,
    weighting={"function": "poly", "r0": 6, "c": 1, "d": 1, "m": 4},
    nmax=4,
    lmax=3,
    crossover=True,
)
feat = soap.create(atoms)
fix, ax = mpl.subplots(figsize=(10, 5))
for index in [0, 8, 24]:
    i_feat = feat[index]
    x = np.arange(0, i_feat.shape[0])
    mpl.plot(x, i_feat, label=atoms.get_chemical_symbols()[index])
mpl.legend()
mpl.tight_layout()
mpl.show()

This produces the following plot for your structure:
forsterite

As you can see, by introducing a radial weighting that monotonuously decreases towards zero the further you go from a center (this is controlled by the new weighting-dictionary), you can remedy the problems discussed in issue #20. There are several parameters that you can use to tune the weighting to fit your needs. We are preparing version 1.1.0 for release, but if you want, you can already play with it by cloning branch 1.1.0 from git and installing it with

cd dscribe
pip installl .

Closing this for now, please see the previous comment.