No Models Saved, No Validation Loss Reported
grahamdelafield opened this issue · 3 comments
Hi everyone, I am trying to train a new model from scratch. As a test, my training input is ~8,000 spectra from the MassIVEKB, and validation is 2,000 spectra from MassIVEKB. I have generated a .yaml file and edited it to change the number of epochs (n=5 as a test), and I have changed the model_save_folder_path to be simply ".". Whenever I call casanovo train, whether I use this config file or not, and whether I specify the -o flag or not, Casanovo will run through the requested number of epochs and "finish", but no models are ever saved.
Also, "Train Loss" is reported on each epoch, but "Valid Loss" is always "nan". I am not sure whether the two are related.
This issue is found on v4.1.0 and 4.0.1.
Any ideas?
Hi Graham,
I think the issue stems from the val_check_interval option in the config file, which determines how frequently validation is run during training and, as a result, how often a model checkpoint is saved. Try setting that option to a smaller number of steps. Mind you, it is counted in steps (i.e. iterations), not epochs. With a small training set and only a few epochs, the run can finish before the interval is ever reached, so validation never runs and no checkpoint is written.
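To make the step-versus-epoch arithmetic concrete, here is a rough sketch in plain Python. It does not call Casanovo; the helper names are made up for illustration, and the batch size of 32 and interval of 50,000 steps are assumptions that you should replace with the values from your own config file. It also simplifies by counting steps over the whole run.

```python
import math


def total_training_steps(n_spectra: int, batch_size: int, n_epochs: int) -> int:
    """Number of optimizer steps (iterations) over the whole run."""
    steps_per_epoch = math.ceil(n_spectra / batch_size)
    return steps_per_epoch * n_epochs


def validation_runs(n_spectra: int, batch_size: int, n_epochs: int,
                    val_check_interval: int) -> int:
    """How many times the step counter reaches a multiple of the interval."""
    return total_training_steps(n_spectra, batch_size, n_epochs) // val_check_interval


# Numbers from this issue: ~8,000 training spectra, 5 epochs.
# batch_size=32 and val_check_interval=50_000 are ASSUMED values for illustration.
steps = total_training_steps(n_spectra=8_000, batch_size=32, n_epochs=5)
runs_large_interval = validation_runs(8_000, 32, 5, val_check_interval=50_000)
runs_small_interval = validation_runs(8_000, 32, 5, val_check_interval=250)

print(f"total steps: {steps}")
print(f"validation runs with interval 50000: {runs_large_interval}")
print(f"validation runs with interval 250: {runs_small_interval}")
```

Under these assumed values the run has only 1,250 steps, so an interval of 50,000 is never reached (no validation, no checkpoint), while an interval of 250 would trigger validation roughly once per epoch.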
This was exactly the issue. Thanks for the explanation; this helps me understand how to effectively utilize the program.
It seems like we should ensure that the final model is always saved, regardless of the val_check_interval.