QData/spacetimeformer

How should I set the epoch?

GA12WAINM opened this issue · 2 comments

Hello dear author, I cloned your project code and tested it, but I found that the number of epochs cannot be set on the command line, and I cannot find where epochs are configured in the code. I read through the argparse-related options in the files (e.g. --batch_size, --workers) and still can't find it. How do you set the number of epochs? If possible, could you point out how to modify it? Thank you very much for any help.

Hi, we don't set an epoch number; instead we use early stopping and learning-rate reduction to end the training process. The learning rate decreases when the validation loss stops improving, and if that doesn't seem to be helping, training ends. This is a pretty fair way to set training schedules when comparing models with significantly different parameter counts and training times.
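As a rough sketch of that kind of schedule in PyTorch Lightning (the monitored key, patience values, and reduction factor below are illustrative, not necessarily what this repo uses):

```python
import pytorch_lightning as pl
from pytorch_lightning.callbacks import EarlyStopping
import torch

# Stop training when validation loss stops improving.
# "val/loss" and patience=5 are placeholder values.
early_stopping = EarlyStopping(monitor="val/loss", patience=5, mode="min")

# Inside the LightningModule, configure_optimizers can pair the optimizer
# with a plateau scheduler so the LR drops before training is stopped:
def configure_optimizers(self):
    optimizer = torch.optim.Adam(self.parameters(), lr=1e-3)
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode="min", factor=0.5, patience=2
    )
    return {
        "optimizer": optimizer,
        "lr_scheduler": {"scheduler": scheduler, "monitor": "val/loss"},
    }
```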

If you wanted to get around this, you could edit the Trainer object at the bottom of train.py to include the max_epochs flag:
https://pytorch-lightning.readthedocs.io/en/stable/common/trainer.html
and then remove the EarlyStopping callback in create_callbacks.
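A minimal sketch of that change (the max_epochs value is illustrative, and the other Trainer arguments in train.py should be left as they are):

```python
import pytorch_lightning as pl

# In create_callbacks, delete (or comment out) the EarlyStopping entry
# so only the remaining callbacks are returned, then give the Trainer
# a fixed epoch budget:
trainer = pl.Trainer(
    max_epochs=50,  # fixed number of epochs instead of early stopping
    # ... keep the existing Trainer arguments from train.py unchanged
)
```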

(I got email notifications for a few more comments on this issue, but they're mysteriously gone. I've never seen that before; not sure if it's a GitHub bug or if they were all deleted.)

Thank you very much for taking the time out of your busy schedule to reply! Following your suggestion, I adjusted the pl.Trainer section in train.py, added a max_epochs setting, and disabled early stopping, and the test results confirm it works.