turowicz opened this issue 4 years ago · 3 comments
Hey,
I run into a serious problem every time I stop training after a checkpoint has been created and then run evaluation.
The restarted job picks up the checkpoint's step number but appears to start learning from scratch.
Cheers
@turowicz, this is certainly odd behaviour. Can you please specify the commands you used for training and evaluation? A copy of your config file would also be helpful.
More details here:
tensorflow/models#9229 (comment)
@sglvladi that was EfficientNet causing the issue