Model training for StressForceOutput
Hi,
Thanks for making the Allegro repo public. I was wondering whether you have any guidance or thoughts on preparing a config file for training an Allegro model that predicts the stress tensor in addition to forces and total potential energies (StressForceOutput).
I've tried several configurations on my dataset, listed below. In all of them, the stress loss decreases only slowly and marginally after a certain number of training epochs, while the force and energy losses keep decreasing.
Attempt 1: Applying PerAtomMSELoss to stress
loss_coeffs:
  forces: 1.
  total_energy:
    - 1.
    - PerAtomMSELoss
  stress:
    - 1.
    - PerAtomMSELoss
Attempt 2: Assigning more weight to the stress loss
loss_coeffs:
  forces: 1.
  total_energy:
    - 1.
    - PerAtomMSELoss
  stress: 100.
Attempt 3: Using a plain MSE loss for the stress tensor
loss_coeffs:
  forces: 1.
  total_energy:
    - 1.
    - PerAtomMSELoss
  stress: 1.
Alternatively, would you recommend leaving out the stress loss term altogether?
Any recommendations or guidance on using Allegro to predict stress tensors, forces, and potential energies would be welcome!
Kind regards,
Hi @wkylee14,
Thanks for your interest in our code!
stress should use a normal loss, not a PerAtom loss. Stress training issues are often caused by incorrect labels, whether from DFT issues, unit-conversion issues, or an incorrect sign convention. (We follow the convention stress = (-1 / volume) * virial, as discussed in various other threads on the nequip repo: https://github.com/mir-group/nequip/blob/main/nequip/nn/_grad_output.py#L346-L349).
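To illustrate why a PerAtom loss is a poor fit for stress, here is a minimal sketch. It assumes (this thread does not confirm it) that a PerAtom loss divides both prediction and label by the number of atoms before taking the MSE. That normalization suits the extensive total energy, but stress is already intensive (normalized by volume), so the extra division shrinks the stress term by a factor of N² and leaves it with very little gradient signal. The helper names, stress values, and atom count are hypothetical.

```python
def mse(pred, ref):
    """Plain mean squared error over flattened components."""
    return sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(pred)


def per_atom_mse(pred, ref, n_atoms):
    """MSE after dividing prediction and label by the atom count.

    Hypothetical stand-in for a PerAtom-style loss: reasonable for an
    extensive quantity (total energy), not for an intensive one (stress).
    """
    return mse([p / n_atoms for p in pred], [r / n_atoms for r in ref])


# Hypothetical stress components (eV/A^3) with a uniform 0.01 prediction error.
ref_stress = [0.0136, 0.0004, -0.0004, 0.0040, -0.0003, -0.0019]
pred_stress = [s + 0.01 for s in ref_stress]
n_atoms = 64  # hypothetical cell size

plain = mse(pred_stress, ref_stress)
scaled = per_atom_mse(pred_stress, ref_stress, n_atoms)
print(f"plain MSE:    {plain:.3e}")
print(f"per-atom MSE: {scaled:.3e}")
print(f"ratio:        {plain / scaled:.0f}")  # n_atoms**2 = 4096
```

The same 0.01 eV/Å^3 error that costs about 1e-4 under a plain MSE costs only about 2.4e-8 under the per-atom version, which is consistent with a stress loss that barely moves during training.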
Hi @Linux-cpp-lisp,
I have an extxyz file like this:
Lattice="7.749908999 0.0 0.0 3.874954499 6.71161807 0.0 3.874954499 2.237206023 6.3277742" Properties=species:S:1:pos:R:3:forces:R:3 energy=-85.53947668 stress="0.013635762572225896 -0.001867459530464271 -0.00034451257922137816 -0.001867459530464271 0.00040893119490961933 0.003952978665259525 -0.00034451257922137816 0.003952978665259525 -0.0004071773308452461" free_energy=-85.53947668 pbc="T T T"
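Before training, it can help to confirm that the labels in a frame are self-consistent. The sketch below is a minimal, standard-library-only parser (hypothetical helper names; this is not how nequip or ASE actually read extxyz files). It pulls Lattice and stress out of the comment line above, checks that the stress tensor is symmetric, and computes the cell volume needed to convert between stress and virial.

```python
import shlex

# Comment (second) line of the extxyz frame quoted above.
header = (
    'Lattice="7.749908999 0.0 0.0 3.874954499 6.71161807 0.0 '
    '3.874954499 2.237206023 6.3277742" '
    'Properties=species:S:1:pos:R:3:forces:R:3 energy=-85.53947668 '
    'stress="0.013635762572225896 -0.001867459530464271 -0.00034451257922137816 '
    '-0.001867459530464271 0.00040893119490961933 0.003952978665259525 '
    '-0.00034451257922137816 0.003952978665259525 -0.0004071773308452461" '
    'free_energy=-85.53947668 pbc="T T T"'
)


def parse_header(line):
    """Split key=value pairs; shlex strips quotes around multi-value fields."""
    return dict(token.split("=", 1) for token in shlex.split(line))


def to_3x3(text):
    """Reshape 9 whitespace-separated numbers into a row-major 3x3 list."""
    vals = [float(x) for x in text.split()]
    assert len(vals) == 9, "expected 9 components"
    return [vals[0:3], vals[3:6], vals[6:9]]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along row 0."""
    return (
        m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
        - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
        + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0])
    )


fields = parse_header(header)
cell = to_3x3(fields["Lattice"])   # rows are the lattice vectors (A)
stress = to_3x3(fields["stress"])  # full 3x3 tensor, row-major
volume = abs(det3(cell))

symmetric = all(
    abs(stress[i][j] - stress[j][i]) < 1e-12 for i in range(3) for j in range(3)
)
print(f"cell volume: {volume:.2f} A^3")
print(f"stress symmetric: {symmetric}")
```

For this frame, the volume comes out to roughly 329.1 Å^3 and the stress tensor is symmetric. The thread does not state which units the stress values are in, so they are worth confirming against the DFT code's output.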
and I set the loss function as follows:
loss_coeffs:
  forces: 1.
  stress: 1.
  total_energy:
    - 1.
    - PerAtomMSELoss
Is this correct?
I noticed that the example config at https://github.com/mir-group/nequip/blob/main/configs/minimal_stress.yaml#L56C1-L58C10 uses virial instead of stress.
Since stress = (-1 / volume) * virial, is there a trick for unit conversion?
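Here is a minimal sketch of the conversion. It assumes energies in eV and lengths in Å, so the virial is in eV and the stress in eV/Å^3; the thread does not confirm these units. The sketch applies the stated convention, stress = (-1 / volume) * virial, in both directions and includes the eV/Å^3 to GPa factor for reference. The helper names are hypothetical, and the volume and stress values are rounded from the frame above.

```python
# 1 eV/A^3 = 1.602176634e-19 J / 1e-30 m^3 = 1.602176634e11 Pa = 160.2176634 GPa
EV_PER_A3_TO_GPA = 160.2176634


def virial_to_stress(virial, volume):
    """stress = (-1 / volume) * virial, component-wise (eV, A^3 -> eV/A^3)."""
    return [[-v / volume for v in row] for row in virial]


def stress_to_virial(stress, volume):
    """Inverse of virial_to_stress: virial = -volume * stress."""
    return [[-s * volume for s in row] for row in stress]


def ev_per_a3_to_gpa(stress):
    """Convert a 3x3 stress tensor from eV/A^3 to GPa."""
    return [[s * EV_PER_A3_TO_GPA for s in row] for row in stress]


volume = 329.14  # A^3, approximate cell volume of the frame above
stress = [
    [0.0136, -0.0019, -0.0003],
    [-0.0019, 0.0004, 0.0040],
    [-0.0003, 0.0040, -0.0004],
]  # rounded from the stress="..." field above; eV/A^3 is an assumption

virial = stress_to_virial(stress, volume)
roundtrip = virial_to_stress(virial, volume)
print(virial[0][0])                    # about -4.48 (eV)
print(ev_per_a3_to_gpa(stress)[0][0])  # about 2.18 (GPa)
```

Apart from the volume factor and the sign, no extra conversion is needed as long as energies and lengths use the same units throughout the dataset. Labels reported in other units, such as GPa or kbar, would need converting first.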
With the loss function set as above, the training log looks like this:
Set         Epoch  wall      LR        loss_f    loss_stress  loss_e    loss      f_mae    f_rmse  stress_mae  stress_rmse  e_mae   e_rmse
Train       359    2939.468  7.81e-06  0.000128  4.47e-06     3.61e-06  0.000136  0.00832  0.0113  0.00162     0.00212      0.0254  0.0304
Validation  359    2939.468  7.81e-06  0.122     2.53e-05     4.32      4.44      0.273    0.349   0.00396     0.00503      33.2    33.2
The validation energy loss is very large (e_mae of 33.2 versus 0.0254 on the training set), which seems strange.
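As a consistency check on the log, here is a minimal sketch. It assumes the total loss is the coefficient-weighted sum of the individual terms, with all coefficients set to 1 as in the config above. The logged totals match that sum. On the validation set, the energy term accounts for roughly 97% of the loss, so the jump comes almost entirely from energy rather than from stress.

```python
# Per-term losses copied from the log above; all loss coefficients are 1.
coeffs = {"forces": 1.0, "stress": 1.0, "total_energy": 1.0}
logged = {
    "Train": {"forces": 0.000128, "stress": 4.47e-06,
              "total_energy": 3.61e-06, "loss": 0.000136},
    "Validation": {"forces": 0.122, "stress": 2.53e-05,
                   "total_energy": 4.32, "loss": 4.44},
}

results = {}
for split, terms in logged.items():
    # Weighted sum of the individual loss terms.
    total = sum(coeffs[k] * terms[k] for k in coeffs)
    # Fraction of the total contributed by the energy term.
    energy_share = coeffs["total_energy"] * terms["total_energy"] / total
    results[split] = (total, energy_share)
    print(f"{split:<10} weighted sum = {total:.4g} "
          f"(logged {terms['loss']:.4g}), energy share = {energy_share:.1%}")
```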