Resuming training with Ranger21?
neuronflow opened this issue · 3 comments
neuronflow commented
As I learned, Ranger21 does its LR scheduling (among other things) internally.
How should training be resumed? Is there a state dict to load, or something similar?
lessw2020 commented
Hi @neuronflow,
Thanks for opening the issue!
Ranger21 does maintain a basic state dict, but we definitely need to extend it with some additional data to ensure a clean restart if training is stopped.
Let me use this issue to track it. I'll test and fix it, ideally in the next few days, as this has been on my to-do list.
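
To illustrate why the extra data matters, here is a minimal sketch in plain Python. It is not Ranger21's code: `ToyWarmupSGD`, `train`, `save_checkpoint`, and `load_checkpoint` are hypothetical names, and the linear warmup is only a stand-in for whatever scheduling an optimizer does internally. The toy follows the `state_dict()` / `load_state_dict()` convention of `torch.optim.Optimizer`, and shows that the step counter driving the schedule has to be checkpointed for a resumed run to match an uninterrupted one.

```python
import json
import os
import tempfile


class ToyWarmupSGD:
    """Toy optimizer with a built-in linear warmup (hypothetical, not Ranger21)."""

    def __init__(self, params, lr=0.1, warmup_steps=4):
        self.params = list(params)      # plain floats stand in for tensors
        self.base_lr = lr
        self.warmup_steps = warmup_steps
        self.step_count = 0             # drives the internal schedule

    def current_lr(self):
        # Linear warmup from base_lr / warmup_steps up to base_lr.
        scale = min(1.0, (self.step_count + 1) / self.warmup_steps)
        return self.base_lr * scale

    def step(self, grads):
        lr = self.current_lr()
        self.params = [p - lr * g for p, g in zip(self.params, grads)]
        self.step_count += 1

    def state_dict(self):
        # Without step_count, the warmup would start over after a resume.
        return {"params": self.params, "step_count": self.step_count}

    def load_state_dict(self, state):
        self.params = list(state["params"])
        self.step_count = state["step_count"]


def grad(params):
    # Gradient of sum(p**2): a stand-in for a real backward pass.
    return [2.0 * p for p in params]


def train(opt, steps):
    for _ in range(steps):
        opt.step(grad(opt.params))
    return opt


def save_checkpoint(opt, path):
    with open(path, "w") as f:
        json.dump(opt.state_dict(), f)


def load_checkpoint(opt, path):
    with open(path) as f:
        opt.load_state_dict(json.load(f))


# Reference run: 6 uninterrupted steps.
full = train(ToyWarmupSGD([1.0, -2.0]), 6)

# Interrupted run: 3 steps, checkpoint, fresh optimizer, 3 more steps.
path = os.path.join(tempfile.mkdtemp(), "ckpt.json")
first = train(ToyWarmupSGD([1.0, -2.0]), 3)
save_checkpoint(first, path)
resumed = ToyWarmupSGD([0.0, 0.0])
load_checkpoint(resumed, path)
train(resumed, 3)

# Naive resume: weights restored, but the schedule position is lost.
naive = train(ToyWarmupSGD(first.params), 3)
```

In this sketch `resumed` ends up with the same parameters as `full`, while `naive` does not, because its warmup restarts from the beginning. Which fields Ranger21 would actually need to persist is not stated in this thread.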
neuronflow commented
Any updates on this one? :) I lost multiple GPU days of training because the training runs are not resumable :/
Elevory commented
Seconding the need for this feature!