Recover parameters for resume training
Opened this issue · 1 comment
AdamYang011 commented
Hello! This is excellent work! I'm currently attempting to run an experiment based on this project, but I have some questions about how to resume from a checkpoint for continued training.
I've already tried the --resume argument, but the parameters don't seem to be restored properly. Can I rely on --resume alone, or do I need to recover them from the bitstream with --bitstream?
hmkx commented
Hi, to resume training from a model checkpoint, simply use the following argument:
--resume OUTPUT_DIR/checkpoints/CHECKPOINT_NAME
OUTPUT_DIR is the full path of the original run's output directory, which is printed at the start of training.
CHECKPOINT_NAME is one of the checkpoint names in the checkpoints folder.
For example, if the original run printed:
Output dir: /home/HiNeRV/ReadySetGo-HiNeRV-20240413-191040-15cda5b7
You can resume training with:
--resume /home/HiNeRV/ReadySetGo-HiNeRV-20240413-191040-15cda5b7/checkpoints/checkpoint_best
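For readers curious what --resume does under the hood, the usual pattern is that the trainer periodically serializes a dict of state (epoch counter, model weights, optimizer state) and restores it on startup. The sketch below illustrates that general pattern only; the function names, dict keys, and use of pickle here are illustrative assumptions, not HiNeRV's actual API (PyTorch projects typically use torch.save/torch.load instead).

```python
import os
import pickle
import tempfile

# Illustrative sketch of checkpoint save/resume. All names here
# (save_checkpoint, load_checkpoint, "epoch", "model_state",
# "optimizer_state") are hypothetical, not HiNeRV's real interface.

def save_checkpoint(path, epoch, model_state, optimizer_state):
    """Serialize training state so a later run can pick up where this left off."""
    with open(path, "wb") as f:
        pickle.dump({"epoch": epoch,
                     "model_state": model_state,
                     "optimizer_state": optimizer_state}, f)

def load_checkpoint(path):
    """Restore the state dict written by save_checkpoint."""
    with open(path, "rb") as f:
        return pickle.load(f)

# Usage: simulate a run that saved at epoch 10, then a resumed run.
ckpt_path = os.path.join(tempfile.mkdtemp(), "checkpoint_best")
save_checkpoint(ckpt_path, epoch=10,
                model_state={"w": [0.1, 0.2]},
                optimizer_state={"lr": 1e-3})

ckpt = load_checkpoint(ckpt_path)
start_epoch = ckpt["epoch"] + 1  # resumed training continues from epoch 11
```

The key point for the question above: if the checkpoint file contains the model and optimizer state, --resume should be sufficient; --bitstream-style recovery is only needed when reconstructing weights from a compressed bitstream rather than a training checkpoint.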