melonwan/denseReg

error while training

Closed this issue · 2 comments

Hi,
I am setting up the DenseReg environment on a local server, but I run into an error when executing the training command below:

python model/hourglass_um_crop_tiny.py --dataset 'nyu' --batch_size 3 --num_stack 2 --num_fea 128 --debug_level 2 --is_train True

It seems the code cannot find some files, but it eventually proceeds to training anyway. Can you help me understand what causes this? From my analysis, the error seems to occur in the code below.

#TODO: change to tf.train.SummaryWriter()
summary_writer = tf.summary.FileWriter(
    model.summary_dir,
    graph=sess.graph)

ERROR:tensorflow:Exception in QueueRunner: ./exp/data/nyu/tf_train/training-47-of-300; No such file or directory
[[Node: batch_processing/ReaderReadV2 = ReaderReadV2[_device="/job:localhost/replica:0/task:0/device:CPU:0"](batch_processing/TFRecordReaderV2, batch_processing/input_producer)]]
Exception in thread QueueRunnerThread-batch_processing/random_shuffle_queue-batch_processing/random_shuffle_queue_enqueue:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/queue_runner_impl.py", line 252, in _run
enqueue_callable()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1205, in _single_operation_run
self._call_tf_sessionrun(None, {}, [], target_list, None)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1350, in _call_tf_sessionrun
run_metadata)
NotFoundError: ./exp/data/nyu/tf_train/training-47-of-300; No such file or directory
[[Node: batch_processing/ReaderReadV2 = ReaderReadV2[_device="/job:localhost/replica:0/task:0/device:CPU:0"](batch_processing/TFRecordReaderV2, batch_processing/input_producer)]]

ERROR:tensorflow:Exception in QueueRunner: ./exp/data/nyu/tf_train/training-4-of-300; No such file or directory
[[Node: batch_processing/ReaderReadV2_1 = ReaderReadV2[_device="/job:localhost/replica:0/task:0/device:CPU:0"](batch_processing/TFRecordReaderV2_1, batch_processing/input_producer)]]
Exception in thread QueueRunnerThread-batch_processing/random_shuffle_queue-batch_processing/random_shuffle_queue_enqueue_1:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/queue_runner_impl.py", line 252, in _run
enqueue_callable()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1205, in _single_operation_run
self._call_tf_sessionrun(None, {}, [], target_list, None)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1350, in _call_tf_sessionrun
run_metadata)
NotFoundError: ./exp/data/nyu/tf_train/training-4-of-300; No such file or directory
[[Node: batch_processing/ReaderReadV2_1 = ReaderReadV2[_device="/job:localhost/replica:0/task:0/device:CPU:0"](batch_processing/TFRecordReaderV2_1, batch_processing/input_producer)]]

After these errors, the script finally enters the very long training loop.
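
To confirm which shard files are actually missing, a small check like the sketch below might help. It assumes the shards follow the `training-<i>-of-<n>` naming seen in the log, with indices running from 0 to n-1 (the index range is a guess and may need adjusting). `missing_shards` is a hypothetical helper written for this check, not part of DenseReg.

```python
import os


def missing_shards(data_dir, prefix="training", num_shards=300, start=0):
    """Return the expected shard paths that do not exist on disk.

    Assumes files are named '<prefix>-<i>-of-<num_shards>' (the pattern in
    the error log) with i running from `start` to `start + num_shards - 1`.
    The index range is an assumption, not taken from the DenseReg code.
    """
    missing = []
    for i in range(start, start + num_shards):
        name = "{}-{}-of-{}".format(prefix, i, num_shards)
        path = os.path.join(data_dir, name)
        if not os.path.isfile(path):
            missing.append(path)
    return missing


if __name__ == "__main__":
    # Directory taken from the NotFoundError message in the log.
    gone = missing_shards("./exp/data/nyu/tf_train")
    print("{} expected shards are missing".format(len(gone)))
    for path in gone[:10]:
        print("  " + path)
```

If every shard is reported missing, the TFRecord files were probably never generated, or the script is being run from a directory where the relative path `./exp/data/nyu/tf_train` does not resolve.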

I tried enabling the two TFRecord-related lines of training code shown below. The errors above no longer appear, but other errors seem to occur.
image

The errors above do not occur when using the GPU instead of the CPU.