mravanelli/pytorch-kaldi

No Decoding Output

kevinmchu opened this issue · 20 comments

I'm running the TIMIT LSTM on custom features, and I obtained the following error in my log.log file:

[screenshot of log.log showing "Done 0 lattices"]

I checked my best path file, but did not see any error messages or warnings.

[screenshot of the best path file]

I've also double checked my cfg file, and all of the directories exist. I'm running Ubuntu 16.04, CUDA 10.2, PyTorch 1.7.1. What am I doing wrong?

Hi, as we can see from the log, "Done 0 lattices", so something went wrong during the forward phase. I would recommend removing all the directories related to the decoding and removing the forward files generated by PyTorch-Kaldi (the ones created when forwarding the test set). Then start again and check that the forward process goes smoothly.
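Something along these lines should do it (the folder and file names below are placeholders, adjust them to the output_folder and decoding directory of your run):

    # placeholders: adjust to your cfg's output_folder and decoding directory names
    rm -rf exp/TIMIT_LSTM_custom/decode_TIMIT_test_out_dnn2
    rm -f exp/TIMIT_LSTM_custom/exp_files/forward_TIMIT_test_*_out_dnn2_to_decode.ark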

Thanks for the quick reply. I removed the decoding directories and forward files and reran the model on the test set, but I obtained the same error as before.

Does the forward phase run smoothly? Can you see it?

This is the output I obtain when I run the model on the test data:

  • Reading config file......OK!
  • Chunk creation......OK!

Testing TIMIT_test chunk = 1 / 1
[========================================] 100% Forwarding | (Batch 192/192))
Decoding TIMIT_test output out_dnn2

Does this indicate that the forward phase ran smoothly?

Yep. Does the final.mdl model exist? Can you check its size? Also, you could try to manually run the Kaldi command line that fails.

Yes, final.mdl exists and has a size of 5.2MB.

As for manually re-running latgen-faster-mapped, where can I find the values of $thread_string, $min_active, $max_active, etc.?

Also, I was able to run the decoder for an LSTM trained on MFCCs, which makes me think there is something wrong with my features.

Weird ..

@mravanelli Do you have any insight about this issue?

It is most likely that the forwarded data are empty. How fast was the forward phase? If it was super quick, it might indicate that your input features are indeed not good. You should definitely try to call the command manually and inspect the actual output, e.g. checking whether the lattices are empty.
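The call should look roughly like this (a sketch only: the exact option values come from the decoding settings of your config, and $graph_dir, $ali_dir, and $forward_ark are placeholders for your graph directory, the folder holding final.mdl, and the *_to_decode.ark produced by the forward phase):

    # sketch of the decoding call; option values and paths are placeholders
    latgen-faster-mapped --min-active=200 --max-active=7000 --max-mem=50000000 \
      --beam=13.0 --lattice-beam=8.0 --acoustic-scale=0.2 --allow-partial=true \
      --word-symbol-table=$graph_dir/words.txt \
      $ali_dir/final.mdl $graph_dir/HCLG.fst \
      "ark:$forward_ark" "ark:|gzip -c > lat.1.gz"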

The forward phase lasted ~10 minutes. I ran latgen-faster-mapped without any errors, but the lattices were empty.

So in TIMIT_test output out_dnn2, all the lat.*.gz files are empty?

If so, please check that your $finalfeats (I don't know where you saved them) are OK (not empty).
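Two quick checks, assuming the placeholder paths below:

    # an empty gzip archive is only ~20 bytes
    ls -l decode_TIMIT_test_out_dnn2/lat.*.gz
    # frames per utterance and feature dimension of the forwarded posteriors
    feat-to-len "ark:$forward_ark" ark,t:- | head
    feat-to-dim "ark:$forward_ark" -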

I just realized I forgot to change the fea_name in the configuration file. However, when I changed fea_name to the correct name, I obtained this error:

ERROR: the input "mfcc" is not defined before (possible inputs are ['xxxx'])
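(For reference: the name set in fea_name has to match the input name used by compute(...) in the [model] section of the cfg. Assuming a cfg path like the one below, a quick way to spot such a mismatch is:)

    # hypothetical cfg path; both greps should refer to the same feature name
    grep -n "fea_name" cfg/TIMIT_LSTM_custom.cfg
    grep -n "compute(" cfg/TIMIT_LSTM_custom.cfg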

I removed all directories related to the trained model and re-trained for 1 epoch. However, I am not getting a final.mdl file when training finishes. The log.log file does not show any errors or warnings. I did receive this warning on the terminal:

/home/lab/anaconda3/lib/python3.6/site-packages/matplotlib/axes/_base.py:1717: UserWarning: Attempting to set identical left==right results
in singular transformations; automatically expanding.
left=0, right=0
self.set_xlim([v[0], v[1]], emit=emit, auto=False)

Does this explain the missing final.mdl file?

No, the final.mdl only appears if you reach the number of epochs given in the config file.

To clarify, in the cfg file I set n_epochs_tr to 1 but still did not get a final.mdl. Is there something else I am supposed to change if I only want to train for 1 epoch?

I solved the problem with the missing final.mdl. I split run_exp.py into training and testing scripts, and it turns out I needed to run the testing script for final.mdl to appear.

However, I am experiencing the same problem as before during decoding: the forward phase runs smoothly, but I do not obtain any output. My lat.1.gz file is only 20 bytes. forward_TIMIT_test_ep0_ck0_out_dnn2_to_decode.ark is 2.1 GB, which seems reasonable. Any other ideas?

@TParcollet @mravanelli I just wanted to follow up and ask if you have any more insight about this issue.

I figured out the problem. The issue was a mismatch between my lab_graph and lab_folder, which results in a segmentation fault and hence no decoding output.
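For anyone hitting the same thing: a check along these lines might catch the mismatch earlier, since the number of pdfs in the final.mdl under lab_folder should match the output dimension of the forwarded ark (paths are placeholders):

    # placeholders: $lab_folder from the cfg, forward ark from the forward phase
    hmm-info $lab_folder/final.mdl | grep pdfs
    feat-to-dim "ark:forward_TIMIT_test_ep0_ck0_out_dnn2_to_decode.ark" -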