Getting NaN values during training
Closed this issue · 4 comments
Dear,
I am trying to train a simple model, but all the loss values come out as NaN.
The log file is attached.
Could you give me some guidance?
Thank you so much
It seems like your data has no stress labels, or the labels are strange (see 'Stress distribution' in the log).
Have you tried setting is_train_stress to False? The key is under train:
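A minimal sketch of that setting, assuming the usual SevenNet input.yaml layout in which train is a top-level section. Only the is_train_stress key and its place under train come from the suggestion above; the rest of the file is left out.

```yaml
train:
    # Leave stress out of the loss when the dataset has no usable stress labels.
    is_train_stress: False
```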
hi @YutackPark
I set it to False.
Now training stops without any error. The log ends with:
Trainer initialized, ready to training
------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------
Epoch 1/10 lr: 0.001000
------------------------------------------------------------------------------------------------------------------------
Do you know why?
This is my input:
input.txt
Firstly, you should uncomment "# - ['TotalLoss', 'None']". SevenNet needs total loss to determine the best checkpoint to save.
However, SevenNet should raise an error and quit if this is the case.
I could not reproduce the issue with the same input and a different training set. Maybe training is just very slow. Could you share your dataset, if you don't mind?
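For the TotalLoss entry mentioned above, the uncommented line might look like the sketch below. The key name error_record and the Energy and Force entries are assumptions based on typical SevenNet example inputs, not taken from the attached input.txt. Only the ['TotalLoss', 'None'] line comes from this thread.

```yaml
train:
    error_record:                 # assumed key name for the list of tracked errors
        - ['Energy', 'RMSE']      # assumed entry, for illustration only
        - ['Force', 'RMSE']       # assumed entry, for illustration only
        - ['TotalLoss', 'None']   # uncommented so the best checkpoint can be chosen
```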
hi @YutackPark
The dataset is at this link.
With PR#89, you can set the input as:
data_format: 'ase'
data_format_args:
    energy_key: 'TotEnergy'
    force_key: 'force'
Then you can reproduce the problem.
I confirm that the problem above occurs on Windows. When I test on Linux, the problem disappears and the code runs well.