Error message when training

Question

Closed this issue 7 years ago · 2 comments

Hello!

I trained on the same data on Windows and on Unix. The data contains 1293 wav files.
I did not create an FST or a language model. A checkpoint directory was created, and I used it for
inference.

Surprisingly, when I ran inference on one of the wav files from the training data, the result was blank.
Could this be related to the lack of a language model?

Here is the Unix training run. The Windows training run gave the same messages (but with different files).

Thanks,
     Yuval

shell:~/speech/deepsphinx$ python3 bin/deepsphinx-train --job-dir data --trans-file data/ds-input-unix.txt --nouse-train-lm --batch-size 1293

INFO:tensorflow:Getting speaker stats
INFO:tensorflow:Starting training
INFO:tensorflow:Epoch completed, saving
INFO:tensorflow:Evaluation started
INFO:tensorflow:Restoring parameters from data/checkpoints/batch-0
Traceback (most recent call last):
  File "bin/deepsphinx-train", line 248, in <module>
    tf.app.run(train)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "bin/deepsphinx-train", line 154, in train
    lm_fst)
  File "bin/deepsphinx-train", line 80, in run_eval
    tot_wer / tot_ev, tot_cer / tot_ev))
ZeroDivisionError: float division by zero

Answer 1 · 2017-09-04T17:15:22.000Z

You should not use such a large batch size. Typically it is a small number, around 32.
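To make the effect of the batch size concrete, here is a rough sketch of how many batches one epoch would contain for the 1293 files mentioned in the question. The helper name batches_per_epoch is hypothetical and not part of deepsphinx, and it assumes the last partial batch is kept.

```python
import math


def batches_per_epoch(num_examples: int, batch_size: int) -> int:
    """Number of batches in one pass over the data (hypothetical helper).

    Assumes the final, smaller batch is kept rather than dropped.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return math.ceil(num_examples / batch_size)


# With --batch-size 1293 the whole dataset is a single batch per epoch.
print(batches_per_epoch(1293, 1293))
# With a batch size of 32 there are many more update steps per epoch.
print(batches_per_epoch(1293, 32))
```

With a batch size equal to the dataset size, each epoch yields only one batch, whereas a batch size of 32 gives 41.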

Answer 2 · 2017-09-06T17:33:42.000Z

There was another problem that caused your error. The code expected at least some examples to be in the validation set, which is why you got the divide-by-zero error. I have removed that assumption. Please try again. Remember that you have to use either train or eval in the set_id column.
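For anyone hitting the same traceback, a guard of roughly this shape avoids the division when no evaluation examples were seen. This is only a sketch and not necessarily how the fix was made in deepsphinx. The variable names tot_wer, tot_cer and tot_ev are taken from the traceback; the function mean_error_rates is hypothetical.

```python
def mean_error_rates(tot_wer, tot_cer, tot_ev):
    """Average WER and CER over the evaluated examples.

    Hypothetical helper: returns None when tot_ev is zero, i.e. when
    no rows were marked as eval, instead of dividing by zero.
    """
    if tot_ev == 0:
        return None
    return tot_wer / tot_ev, tot_cer / tot_ev


rates = mean_error_rates(0.0, 0.0, 0)
if rates is None:
    print("No evaluation examples found; skipping WER/CER report")
else:
    print("WER: %f, CER: %f" % rates)
```

If the guard reports that there are no evaluation examples, the likely cause is that no row in the transcription file has eval in the set_id column.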

Thanks so much for reporting it and bearing with all the bugs. Please re-open if you still get an error.