mravanelli/pytorch-kaldi

Error with getting shared_list

seas2nada opened this issue

I'm trying to train on LibriSpeech alignments in an Ubuntu Docker container, but I keep getting errors about an empty shared_list.

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/home/Workspace/dh/pytorch-kaldi-dh/data_io.py", line 573, in read_lab_fea
    fea_scp, fea_opts, lab_folder, lab_opts, cw_left, cw_right, max_seq_length, output_folder, fea_only
  File "/home/Workspace/dh/pytorch-kaldi-dh/data_io.py", line 249, in load_chunk
    fea_scp, fea_opts, lab_folder, lab_opts, left, right, max_sequence_length, output_folder, fea_only
  File "/home/Workspace/dh/pytorch-kaldi-dh/data_io.py", line 208, in load_dataset
    fea_conc, lab_conc, end_index_fea, end_index_lab = _concatenate_features_and_labels(fea_chunks, lab_chunks)
  File "/home/Workspace/dh/pytorch-kaldi-dh/data_io.py", line 160, in _concatenate_features_and_labels
    fea_conc, lab_conc = _sort_chunks_by_length(fea_conc, lab_conc)
  File "/home/Workspace/dh/pytorch-kaldi-dh/data_io.py", line 149, in _sort_chunks_by_length
    fea_conc, lab_conc = zip(*fea_sorted)
ValueError: not enough values to unpack (expected 2, got 0)

Traceback (most recent call last):
  File "./run_exp_semi.py", line 276, in <module>
    next_config_file,
  File "/home/Workspace/dh/pytorch-kaldi-dh/core.py", line 529, in run_nn
    data_name = shared_list[0]
IndexError: list index out of range
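The two tracebacks are probably one failure seen twice: the loader thread dies on the ValueError (unpacking `zip(*[])` into two names when there are no chunks), so it never fills shared_list, and the main thread then hits IndexError on shared_list[0]. The sketch below reproduces that chain in isolation. Every function in it is a hypothetical, simplified stand-in modeled on the names in the tracebacks, not the actual pytorch-kaldi code.

```python
import threading


def sort_chunks_by_length(fea_chunks, lab_chunks):
    """Hypothetical stand-in for data_io._sort_chunks_by_length."""
    fea_sorted = sorted(zip(fea_chunks, lab_chunks), key=lambda pair: len(pair[0]))
    # With no chunks, zip(*[]) yields nothing, so this unpacking raises
    # ValueError: not enough values to unpack (expected 2, got 0)
    fea_conc, lab_conc = zip(*fea_sorted)
    return fea_conc, lab_conc


def read_lab_fea(fea_chunks, lab_chunks, shared_list):
    """Hypothetical loader thread: only appends to shared_list on success."""
    sort_chunks_by_length(fea_chunks, lab_chunks)
    shared_list.append("data_name")  # hypothetical placeholder result


def run_nn(fea_chunks, lab_chunks):
    """Hypothetical main-thread consumer of shared_list."""
    shared_list = []
    loader = threading.Thread(
        target=read_lab_fea, args=(fea_chunks, lab_chunks, shared_list)
    )
    loader.start()
    # The thread's exception is printed to stderr, not re-raised by join().
    loader.join()
    # If the loader died early, shared_list is still empty -> IndexError.
    return shared_list[0]


if __name__ == "__main__":
    print(run_nn([[1, 2], [3]], ["a", "b"]))  # data_name
    try:
        run_nn([], [])  # no feature chunks loaded
    except IndexError as err:
        print("IndexError:", err)  # list index out of range
```

If this reading is right, the IndexError in core.py is only a symptom; the question is why the loader ended up with no chunks at all.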

I ran exactly the same code on my Ubuntu PC with an RTX 2080 Ti and 8 CPU cores, and it worked well.
However, when I do the same thing on my other Ubuntu PC with four 2080 Tis and 28 cores, fea_conc in data_io.py comes back empty.
At first I thought multi-threading might be causing the problem, so I limited the number of threads, but that didn't help. Removing multi-threading entirely didn't help either.
What I found is that data_io.read_key returns None as the key. Maybe the line 'fd.read(1).decode("latin1")' misbehaves in my case, but it's quite difficult for me to fix, since I don't understand what fd actually is. When I print it, it shows something like <_io.BufferedReader name=4>, but I don't understand what that means.
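For reference, <_io.BufferedReader name=4> is Python's repr of a buffered binary file object opened on OS file descriptor 4 (for example, the read end of a pipe from another process) rather than on a named file. The sketch below assumes data_io.read_key behaves like the kaldi_io-style helper it resembles (read one byte at a time until a space, return None if nothing was read); that is an assumption, and the real function may differ. The helper open_pipe_with is hypothetical and only exists to build such a file object. It shows that an empty stream yields a None key, which would be consistent with whatever writes to fd producing no output in the new environment.

```python
import io
import os
import re


def read_key(fd):
    """Simplified sketch of a kaldi_io-style read_key (assumed behavior)."""
    key = ""
    while True:
        char = fd.read(1).decode("latin1")
        if char == "" or char == " ":  # end of stream, or end of the key
            break
        key += char
    key = key.strip()
    if key == "":
        return None  # nothing was read: the stream was empty or exhausted
    assert re.match(r"^\S+$", key) is not None  # keys contain no whitespace
    return key


def open_pipe_with(data: bytes) -> io.BufferedReader:
    """Hypothetical helper: a binary reader over an OS pipe holding `data`."""
    r, w = os.pipe()
    os.write(w, data)
    os.close(w)  # close the write end so the reader sees end-of-stream
    return os.fdopen(r, "rb")  # its repr shows name=<file descriptor number>


if __name__ == "__main__":
    fd = open_pipe_with(b"utt-001 \0Bpayload")
    print(fd)  # something like <_io.BufferedReader name=3>
    print(read_key(fd))  # utt-001
    fd.close()

    empty = open_pipe_with(b"")
    print(read_key(empty))  # None
    empty.close()
```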

Could you give me some advice on this kind of problem?
Thanks for your attention.