microsoft/SpeechT5

[SpeechLM] Questions about the phoneme tokenizer

yuseungwoo opened this issue · 1 comment

First of all, thanks for your great work and code.

I am studying SpeechLM and am curious about a few details of training and inference.

  1. Could you tell me which stage of the recipe you used to train the phoneme tokenizer? Is it the one below #L155, as I expect?
    [https://github.com/kaldi-asr/kaldi/blob/master/egs/librispeech/s5/run.sh#L155]

  2. Could you tell me which decoder is used for pseudo-label generation, and share your command?
    steps/decode_fmllr.sh, or online2-wav-gmm-latgen-faster directly?

Best Regards

Sorry for the late response.

  1. Yes, as you expected. We trained two phoneme tokenizers in our paper: a GMM-HMM model trained on 100 hours of data for the Base setting, and a DNN-HMM model trained on 960 hours of data for the Large setting. The GMM-HMM model is exactly 'tri4b' (available after stage 13). The DNN-HMM model is exactly the chain model obtained after running the whole script (after the last stage). See the first sketch after this list.

  2. steps/decode_fmllr.sh for the GMM-HMM model (see the second sketch below).
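
To make the first answer concrete, here is a minimal sketch of where the two models come from in the Kaldi LibriSpeech recipe. It assumes the staged layout of the linked run.sh (`if [ $stage -le N ]` blocks with a `--stage` option); stage numbers and output paths may shift across Kaldi versions, so treat this as orientation rather than the authors' exact procedure.

```bash
# Sketch: obtaining the two tokenizer acoustic models from the
# Kaldi LibriSpeech recipe (egs/librispeech/s5/run.sh).
cd kaldi/egs/librispeech/s5

# Base setting: run stages 1-13; this leaves the speaker-adapted
# (SAT) GMM-HMM model in exp/tri4b, trained on train_clean_100.
./run.sh            # interrupt (or edit in an `exit`) after stage 13

# Large setting: continue the recipe to the end; the last stage
# trains the chain (DNN-HMM) model on the full 960 hours. Its exact
# output path depends on the recipe version (e.g. under exp/chain*).
./run.sh --stage 14
```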
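
And for the second answer, a sketch of pseudo-label generation with the tri4b tokenizer. The data directory, job count, and graph/decode directory names are illustrative assumptions, not the authors' exact command; the lattice-to-phoneme step at the end is one standard Kaldi way to turn decoded lattices into frame-level phoneme labels.

```bash
. ./cmd.sh; . ./path.sh
nj=40   # illustrative job count

# Build a decoding graph for tri4b (tgsmall LM, as in the recipe).
utils/mkgraph.sh data/lang_test_tgsmall exp/tri4b exp/tri4b/graph_tgsmall

# fMLLR decoding of the unpaired speech (the data dir is an assumption
# and must already have MFCC features and CMVN stats computed);
# lattices are written under the decode directory.
steps/decode_fmllr.sh --nj $nj --cmd "$decode_cmd" \
  exp/tri4b/graph_tgsmall data/train_960 exp/tri4b/decode_train_960

# Turn the best path of each lattice into per-frame phoneme labels.
for j in $(seq $nj); do
  lattice-best-path --acoustic-scale=0.1 \
    "ark:gunzip -c exp/tri4b/decode_train_960/lat.$j.gz |" \
    ark:/dev/null ark:- | \
  ali-to-phones --per-frame=true exp/tri4b/final.mdl ark:- \
    ark,t:exp/tri4b/decode_train_960/phones.$j.txt
done
```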