mir-group/allegro

Model training for StressForceOutput

Opened this issue · 4 comments

Hi,

Thanks for making the Allegro repo public. I was wondering if you have any guidance or thoughts on preparing a config file for training an Allegro model that predicts stress tensors in addition to forces and total potential energies (StressForceOutput).

I've made several attempts on my dataset with the following configurations, but the stress loss decreases only slowly and marginally, while the force and energy losses keep decreasing after a certain number of training epochs.

Attempt 1: Applying PerAtomMSELoss
loss_coeffs:
  forces: 1.
  total_energy:
    - 1.
    - PerAtomMSELoss
  stress:
    - 1.
    - PerAtomMSELoss

Attempt 2: Assigning a larger weight to the stress loss
loss_coeffs:
  forces: 1.
  total_energy:
    - 1.
    - PerAtomMSELoss
  stress: 100.

Attempt 3: Using a plain MSE loss for the stress tensor
loss_coeffs:
  forces: 1.
  total_energy:
    - 1.
    - PerAtomMSELoss
  stress: 1.

Alternatively, would you recommend not including a loss term for the stress tensor at all?
Any recommendations or guidance on using Allegro to predict stress tensors, forces, and potential energies would be welcome!

Kind regards,

Hi @wkylee14 ,

Thanks for your interest in our code!

stress should use a normal loss, not a PerAtom loss. Stress training issues are often linked to incorrect labels, caused by DFT issues, unit conversion issues, or an incorrect sign convention. (We follow the convention stress = (-1 / volume) * virial, as discussed in various other threads on the nequip repo: https://github.com/mir-group/nequip/blob/main/nequip/nn/_grad_output.py#L346-L349).
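
To make the sign convention and the unit issue above concrete, here is a minimal sketch in plain NumPy. The function names are hypothetical and not part of nequip or Allegro; the only physical input is the exact conversion 1 eV/Å^3 = 160.2176634 GPa. Whether your DFT code reports stress with the same sign as this convention is something to check against that code's documentation.

```python
import numpy as np

# 1 eV/Å^3 expressed in GPa (follows from the exact value of the elementary charge)
EV_PER_A3_IN_GPA = 160.2176634


def virial_to_stress(virial, volume):
    """Apply the convention quoted above: stress = (-1 / volume) * virial."""
    return -np.asarray(virial, dtype=float) / volume


def gpa_to_ev_per_a3(stress_gpa):
    """Convert a stress given in GPa to eV/Å^3 (hypothetical helper)."""
    return np.asarray(stress_gpa, dtype=float) / EV_PER_A3_IN_GPA


# Example: a virial of 2 eV on the diagonal in a 4 Å^3 cell
example_stress = virial_to_stress(np.eye(3) * 2.0, volume=4.0)
print(example_stress)
```

If the labels were produced in GPa or kbar, or with the opposite sign, the stress loss can plateau even though forces and energies train well, so it is worth running a check like this on a few frames before training.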

Hi, @Linux-cpp-lisp

I have an extxyz file with frames like this:
Lattice="7.749908999 0.0 0.0 3.874954499 6.71161807 0.0 3.874954499 2.237206023 6.3277742" Properties=species:S:1:pos:R:3:forces:R:3 energy=-85.53947668 stress="0.013635762572225896 -0.001867459530464271 -0.00034451257922137816 -0.001867459530464271 0.00040893119490961933 0.003952978665259525 -0.00034451257922137816 0.003952978665259525 -0.0004071773308452461" free_energy=-85.53947668 pbc="T T T"
I set the loss function as follows:

loss_coeffs:
  forces: 1.
  stress: 1.
  total_energy:
    - 1.
    - PerAtomMSELoss

Is this correct?
I noticed that https://github.com/mir-group/nequip/blob/main/configs/minimal_stress.yaml#L56C1-L58C10 uses virial instead of stress.
Since stress = (-1 / volume) * virial, is there a trick for the unit conversion?
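
For reference, the relation between the two quantities can be checked directly on the frame quoted above. The sketch below assumes the stress in the header is already in eV/Å^3 (the usual ASE convention, but an assumption here) and uses only NumPy; it computes the cell volume from the Lattice entry and inverts the convention to get the virial in eV.

```python
import numpy as np

# Lattice vectors (rows, in Å) copied from the extxyz header above
lattice = np.array([
    [7.749908999, 0.0, 0.0],
    [3.874954499, 6.71161807, 0.0],
    [3.874954499, 2.237206023, 6.3277742],
])

# Stress tensor copied from the same header (assumed to be in eV/Å^3)
stress = np.array([
    0.013635762572225896, -0.001867459530464271, -0.00034451257922137816,
    -0.001867459530464271, 0.00040893119490961933, 0.003952978665259525,
    -0.00034451257922137816, 0.003952978665259525, -0.0004071773308452461,
]).reshape(3, 3)

# Cell volume in Å^3 is the absolute determinant of the lattice matrix
volume = abs(np.linalg.det(lattice))

# Invert stress = (-1 / volume) * virial to get the virial in eV
virial = -volume * stress

print(f"volume = {volume:.4f} Å^3")
print(virial)
```

Under that assumption no extra unit factor is needed: the virial has units of energy and the stress has units of energy per volume, so the two differ only by the factor of -volume.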

With the loss function set as above, the training log looks like this:

  Train      #    Epoch      wal       LR       loss_f  loss_stress       loss_e         loss        f_mae       f_rmse   stress_mae  stress_rmse        e_mae       e_rmse
! Train             359 2939.468 7.81e-06     0.000128     4.47e-06     3.61e-06     0.000136      0.00832       0.0113      0.00162      0.00212       0.0254       0.0304
! Validation        359 2939.468 7.81e-06        0.122     2.53e-05         4.32         4.44        0.273        0.349      0.00396      0.00503         33.2         33.2

The energy loss on the validation dataset is very large, which seems strange.
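
One detail in the log may help narrow this down: on the validation set e_mae and e_rmse are both 33.2, while on the training set they differ (0.0254 vs 0.0304). MAE and RMSE coincide only when all errors have nearly the same magnitude, which would be consistent with a near-constant energy offset between predictions and validation labels. This is only a hypothesis based on the rounded log values; the sketch below uses synthetic errors, not data from this thread, to illustrate the effect.

```python
import numpy as np


def mae(err):
    """Mean absolute error."""
    return float(np.mean(np.abs(err)))


def rmse(err):
    """Root mean square error."""
    return float(np.sqrt(np.mean(np.square(err))))


rng = np.random.default_rng(0)

# Synthetic zero-mean errors, similar in scale to the training e_rmse
scattered = rng.normal(scale=0.03, size=1000)

# The same errors with a constant shift added (33.2 chosen to mirror the log)
shifted = scattered + 33.2

print("scattered: rmse/mae =", rmse(scattered) / mae(scattered))
print("shifted:   rmse/mae =", rmse(shifted) / mae(shifted))
```

For zero-mean scattered errors the ratio stays clearly above 1, while a constant shift drives it to almost exactly 1. If that pattern holds here, comparing the energy labels of the training and validation frames (for example their per-atom energies and how they were computed) would be a reasonable next check.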

Hi, @wkylee14

Could you please share your configuration file?
Are you using an extxyz file?