data feeder error
fazlekarim opened this issue · 6 comments
After running the code for about 200 steps, I run into the following error. I can't figure out why; I feel like it should be an easy fix.
```
    self._session.run(self._enqueue_op, feed_dict=feed_dict)
  File "/home/fakarim/anaconda3/envs/gst-tacotron/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 889, in run
    run_metadata_ptr)
  File "/home/fakarim/anaconda3/envs/gst-tacotron/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1120, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/fakarim/anaconda3/envs/gst-tacotron/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1317, in _do_run
    options, run_metadata)
  File "/home/fakarim/anaconda3/envs/gst-tacotron/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1336, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.CancelledError: Enqueue operation was cancelled
   [[Node: datafeeder/input_queue_enqueue = QueueEnqueueV2[Tcomponents=[DT_INT32, DT_INT32, DT_FLOAT, DT_FLOAT], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](datafeeder/input_queue, _arg_datafeeder/inputs_0_1, _arg_datafeeder/input_lengths_0_0, _arg_datafeeder/mel_targets_0_3, _arg_datafeeder/linear_targets_0_2)]]

Caused by op 'datafeeder/input_queue_enqueue', defined at:
  File "train.py", line 153, in <module>
    main()
  File "train.py", line 149, in main
    train(log_dir, args)
  File "train.py", line 58, in train
    feeder = DataFeeder(coord, input_path, hparams)
  File "/home/fakarim/projects/gst-tacotron/datasets/datafeeder.py", line 46, in __init__
    self._enqueue_op = queue.enqueue(self._placeholders)
  File "/home/fakarim/anaconda3/envs/gst-tacotron/lib/python3.6/site-packages/tensorflow/python/ops/data_flow_ops.py", line 327, in enqueue
    self._queue_ref, vals, name=scope)
  File "/home/fakarim/anaconda3/envs/gst-tacotron/lib/python3.6/site-packages/tensorflow/python/ops/gen_data_flow_ops.py", line 2777, in _queue_enqueue_v2
    timeout_ms=timeout_ms, name=name)
  File "/home/fakarim/anaconda3/envs/gst-tacotron/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/fakarim/anaconda3/envs/gst-tacotron/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2956, in create_op
    op_def=op_def)
  File "/home/fakarim/anaconda3/envs/gst-tacotron/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1470, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

CancelledError (see above for traceback): Enqueue operation was cancelled
   [[Node: datafeeder/input_queue_enqueue = QueueEnqueueV2[Tcomponents=[DT_INT32, DT_INT32, DT_FLOAT, DT_FLOAT], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](datafeeder/input_queue, _arg_datafeeder/inputs_0_1, _arg_datafeeder/input_lengths_0_0, _arg_datafeeder/mel_targets_0_3, _arg_datafeeder/linear_targets_0_2)]]
```
Fixed it. The code is fine; I just overreacted.
@fazlekarim
I am having a similar issue. How did you solve this?
@fazlekarim @lapwing It is an OOM error. Some of the sentences are very long, so training can run out of memory at some step. You can fix it by:
- Reducing `batch_size` or increasing the reduction factor (see the hparams sketch after this list). Note that changing the reduction factor will affect output quality.
- Removing the overly long sentences, for example all utterances longer than 1200 frames. This shrinks the dataset a little, but I guess it will not affect performance too much.
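For the first option, here is a minimal sketch of the memory-relevant hyperparameters. It assumes the `hparams.py` layout of the upstream Tacotron implementation that gst-tacotron builds on, so the field names (`batch_size`, `outputs_per_step`) and the defaults shown are assumptions to check against your copy of the repo:

```python
import tensorflow as tf

# Minimal sketch of the memory-relevant fields, assuming the upstream
# Tacotron hparams layout; names and defaults are assumptions.
hparams = tf.contrib.training.HParams(
    batch_size=16,       # e.g. halve a default of 32 to cut per-batch memory
    outputs_per_step=5,  # a larger reduction factor means fewer decoder steps
                         # and less memory, at some cost in output quality
)
```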
Do you have a script to remove sentences longer than 1200 frames?
@fazlekarim A simple way is to modify the data processing script, as in the attachment.
blizzard2013.zip
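For reference, here is a rough standalone sketch of that filtering step (not the attached script). It assumes the `train.txt` metadata format written by the upstream Tacotron preprocessing, where each line is `spectrogram_file|mel_file|n_frames|text`; the `training/` directory name is likewise an assumption:

```python
# filter_long_utterances.py -- a rough sketch, not the attached script.
# Assumes each line of train.txt is: spectrogram_file|mel_file|n_frames|text
import os

MAX_FRAMES = 1200
DATA_DIR = 'training'  # hypothetical location of train.txt; adjust as needed

src = os.path.join(DATA_DIR, 'train.txt')
dst = os.path.join(DATA_DIR, 'train_filtered.txt')

kept = dropped = 0
with open(src, encoding='utf-8') as fin, open(dst, 'w', encoding='utf-8') as fout:
    for line in fin:
        n_frames = int(line.split('|')[2])  # third field is the frame count
        if n_frames <= MAX_FRAMES:
            fout.write(line)
            kept += 1
        else:
            dropped += 1

print('Kept %d utterances, dropped %d longer than %d frames.'
      % (kept, dropped, MAX_FRAMES))
```

Afterwards, point training at the filtered file (or rename it back to `train.txt`).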