Problems of Generating tr_label_phn during Inference

Question

LyWangPX opened this issue a year ago · 4 comments

In my own inference experiments, I noticed that the score is determined mainly by the phn input rather than by the .wav file.
I saw an extreme pattern across multiple sound files of the same word:

Word A 4 4 4 4 4 4
Word B 5 5 5 5 5 5

Even after messing up the .wav files, the results remained the same.
Then I found a potential reason:

In gen_seq_data_phn.py, tr_label_phn or te_label_phn is generated from a phn_dict that is specific to the dataset being processed. However, the pretrained model was trained on SpeechOcean762. When running inference on any other dataset, the model receives phone labels specific to the inference dataset rather than to SpeechOcean762, which causes inconsistent inference results.

The correct method is to always reuse the phn_dict that was generated when training on SpeechOcean762.
I will update the inference tutorial if you think it is necessary.
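
To illustrate the mismatch, here is a minimal sketch. It is not the actual code in gen_seq_data_phn.py: the function names, the JSON file, and the toy phone sequences are all hypothetical, and I am assuming the dict is a plain phone-to-integer mapping built from whatever phones appear in the data. The idea is to save the mapping once at training time and reload it at inference time instead of rebuilding it.

```python
import json
import tempfile
from pathlib import Path


def build_phn_dict(phone_seqs):
    """Assign an integer id to every distinct phone, in sorted order (hypothetical helper)."""
    phones = sorted({p for seq in phone_seqs for p in seq})
    return {p: i for i, p in enumerate(phones)}


def encode(phone_seq, phn_dict):
    """Map phones to ids, failing loudly on phones missing from the mapping."""
    unknown = [p for p in phone_seq if p not in phn_dict]
    if unknown:
        raise KeyError(f"phones missing from the training phn_dict: {unknown}")
    return [phn_dict[p] for p in phone_seq]


# Toy phone sequences standing in for the training set and a new dataset.
train_seqs = [["AH", "B", "K"], ["AE", "T"]]
infer_seqs = [["B", "AE", "T"]]

# Training time: build the mapping once and save it to disk.
train_dict = build_phn_dict(train_seqs)
dict_path = Path(tempfile.mkdtemp()) / "phn_dict.json"
dict_path.write_text(json.dumps(train_dict))

# Inference time: reload the saved mapping instead of rebuilding it.
loaded_dict = json.loads(dict_path.read_text())
good_ids = encode(infer_seqs[0], loaded_dict)

# Rebuilding from the inference data assigns different ids to the same phones.
rebuilt_dict = build_phn_dict(infer_seqs)
bad_ids = encode(infer_seqs[0], rebuilt_dict)

print("ids with training dict:", good_ids)
print("ids with rebuilt dict: ", bad_ids)
```

With the rebuilt dict, the same phones get different integer labels than the ones the pretrained model saw, which would explain predictions that follow the phn input but do not make sense.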

Answer 1 · 2023-02-10T20:24:34.000Z

Hi, I think you are correct on this (i.e., the correct method is to always generate the phn_dict from the one generated when training on SpeechOcean762). Otherwise the model won't produce correct results.

"I notice the score is mainly determined not by the .wav but by the phn."

However, even if the bug is fixed, the input phn will still have a relatively large impact on the prediction. This is because 1) different phones have different error priors; and 2) whether a phone is pronounced correctly depends on the canonical phone, e.g., a phone pronounced as /e/ is correct if the canonical phone is /e/, but wrong if the canonical phone is /a:/. We did an ablation study on this in the paper.

-Yuan

Answer 2 · 2023-03-25T06:30:46.000Z

Hi @YuanGongND did you update the tutorial?

Answer 3 · 2023-03-25T07:38:44.000Z

@amandeepbaberwal

No, I don't plan to do so, as 1) it is not promised in the paper, and we have already released everything we have; and 2) it is more related to Kaldi than to GOPT.

Please understand that we are not a company, so we cannot provide full support for the project.

-Yuan

Answer 4 · 2023-03-31T06:10:17.000Z

Hi @LyWangPX, could you please explain how you solved this problem? I am running into the same issue: my score does not change even when I completely change the content of the .wav file.