Error in resuming training from where it stopped.
I tried to resume training from where it stopped by setting the restore_path variable in config.py to ./checkpoint/
But it failed with the following error:
Traceback (most recent call last):
  File "train.py", line 91, in <module>
    train()
  File "train.py", line 79, in train
    launch_train_with_config(traincfg, trainer)
  File "/usr/local/lib/python3.6/dist-packages/tensorpack/train/interface.py", line 99, in launch_train_with_config
    extra_callbacks=config.extra_callbacks)
  File "/usr/local/lib/python3.6/dist-packages/tensorpack/train/base.py", line 342, in train_with_defaults
    steps_per_epoch, starting_epoch, max_epoch)
  File "/usr/local/lib/python3.6/dist-packages/tensorpack/train/base.py", line 313, in train
    self.initialize(session_creator, session_init)
  File "/usr/local/lib/python3.6/dist-packages/tensorpack/utils/argtools.py", line 168, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorpack/train/tower.py", line 147, in initialize
    super(TowerTrainer, self).initialize(session_creator, session_init)
  File "/usr/local/lib/python3.6/dist-packages/tensorpack/utils/argtools.py", line 168, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorpack/train/base.py", line 226, in initialize
    session_init._setup_graph()
  File "/usr/local/lib/python3.6/dist-packages/tensorpack/tfutils/sessinit.py", line 110, in _setup_graph
    dic = self._get_restore_dict()
  File "/usr/local/lib/python3.6/dist-packages/tensorpack/tfutils/sessinit.py", line 159, in _get_restore_dict
    self._match_vars(f)
  File "/usr/local/lib/python3.6/dist-packages/tensorpack/tfutils/sessinit.py", line 126, in _match_vars
    reader, chkpt_vars = SaverRestore._read_checkpoint_vars(self.path)
  File "/usr/local/lib/python3.6/dist-packages/tensorpack/tfutils/sessinit.py", line 120, in _read_checkpoint_vars
    reader = tf.train.NewCheckpointReader(model_path)
  File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/pywrap_tensorflow_internal.py", line 873, in NewCheckpointReader
    return CheckpointReader(compat.as_bytes(filepattern))
  File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/pywrap_tensorflow_internal.py", line 885, in __init__
    this = _pywrap_tensorflow_internal.new_CheckpointReader(filename)
tensorflow.python.framework.errors_impl.NotFoundError: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for ./checkpoint/
I also tried restore_path = ./checkpoint/model-500.ckpt, but got an error saying that the given checkpoint file doesn't exist.
It would be great if you can help with this.
Thanks!
I am also getting an error when I try to continue training:
config file:
restore_path = './checkpoint'
checkpoint_path = '/model-500000'
I am getting this error:
tensorflow.python.framework.errors_impl.DataLossError: Unable to open table file .\checkpoint: Unknown: NewRandomAccessFile failed to Create/Open: .\checkpoint2 : Access is denied.
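Both errors look consistent with restore_path not naming a checkpoint prefix. A TensorFlow 1.x checkpoint is normally saved as a group of files such as model-500.index and model-500.data-00000-of-00001, and the reader expects the shared prefix (for example ./checkpoint/model-500) rather than the directory or a name ending in .ckpt. Below is a minimal sketch that looks up the newest prefix in a directory. It assumes the files follow the model-<step> naming seen in this thread and that restore_path reaches tensorpack's SaverRestore unchanged; find_latest_checkpoint_prefix is a hypothetical helper, not part of this repository or of tensorpack.

```python
import glob
import os
import re


def find_latest_checkpoint_prefix(checkpoint_dir):
    """Return the checkpoint prefix with the highest step number, or None.

    Hypothetical helper: assumes checkpoints are saved as
    <name>-<step>.index plus matching .data-* files.
    """
    best_step, best_prefix = -1, None
    for index_file in glob.glob(os.path.join(checkpoint_dir, "*.index")):
        # Strip the ".index" suffix to get the prefix the reader expects.
        prefix = index_file[: -len(".index")]
        match = re.search(r"-(\d+)$", prefix)
        if match and int(match.group(1)) > best_step:
            best_step, best_prefix = int(match.group(1)), prefix
    return best_prefix


if __name__ == "__main__":
    # Prints None when the directory has no *.index files.
    print(find_latest_checkpoint_prefix("./checkpoint"))
```

If this prints a path such as ./checkpoint/model-500, that value could be tried as restore_path. When TensorFlow is importable, tf.train.latest_checkpoint("./checkpoint") is an alternative that reads the checkpoint state file in the directory instead of scanning file names.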