[SpeechLM] About phoneme tokenizer in detail?
yuseungwoo opened this issue · 1 comment
First of all, thank you for your great work and code.
I am studying SpeechLM and have some questions about training and inference.
- Can you tell me which stage you used for training the phoneme tokenizer? Is it the one below [#L155](https://github.com/kaldi-asr/kaldi/blob/master/egs/librispeech/s5/run.sh#L155), as I expected?
- Can you tell me which decoder is used for pseudo-label generation, and share your command? Is it `steps/decode_fmllr.sh`, or `online2-wav-gmm-latgen-faster` directly?
Best Regards
Sorry for the late response.
- Yes, as you expected. We trained two phoneme tokenizers in our paper: a GMM-HMM model using 100 hours of data for the Base setting, and a DNN-HMM model using 960 hours of data for the Large setting. The GMM-HMM model is exactly `tri4b` (obtained after stage 13). The DNN-HMM model is exactly the chain model obtained after running the whole script (i.e., after the last stage).
- `steps/decode_fmllr.sh` for the GMM-HMM model.
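For anyone looking for a concrete starting point, here is a minimal sketch of how such a decode might be invoked in the standard LibriSpeech recipe. The job count, data directory, and the lattice-to-phone post-processing step are my assumptions, not the authors' confirmed command; only `steps/decode_fmllr.sh` and the `tri4b` model are confirmed above.

```shell
#!/usr/bin/env bash
# Hedged sketch: generating pseudo phoneme labels with the tri4b GMM-HMM model.
# Run from egs/librispeech/s5; all paths and options below are illustrative.
set -e

nj=20                        # number of parallel decoding jobs (assumption)
data=data/train_clean_100    # data to pseudo-label (assumption)
dir=exp/tri4b                # the tri4b GMM-HMM model confirmed in the answer

# fMLLR decoding writes lattices under $dir/decode_train_clean_100
steps/decode_fmllr.sh --nj $nj --cmd run.pl \
  $dir/graph $data $dir/decode_train_clean_100

# One common way (not confirmed by the authors) to turn the lattices into
# frame-level phoneme labels: take the best path, then map alignments to phones.
lattice-best-path ark:"gunzip -c $dir/decode_train_clean_100/lat.*.gz |" \
  ark:/dev/null ark:- | \
  ali-to-phones --per-frame=true $dir/final.mdl ark:- ark,t:phone_labels.txt
```

Note that `ali-to-phones --per-frame=true` emits one phone ID per frame, which matches the frame-level tokens a phoneme tokenizer would need.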