Could you share your pretrained ngram LM?
davyuan opened this issue · 2 comments
Hello,
I'm following the link below to train a 6-gram LM for decoding. I downloaded the LibriSpeech corpus and used NeMo's CTC Conformer medium model to train it. However, I'm not seeing any improvement in WER compared to greedy search; the results actually got worse.
https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/asr_language_modeling.html
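For reference, I trained the LM with NeMo's train_kenlm.py script, roughly as below. The paths are from my local setup, and flag names have changed between NeMo releases, so treat this as a sketch of what I ran rather than an exact command:

```bash
# Train a 6-gram KenLM on the LibriSpeech LM corpus, tokenized with the
# Conformer model's BPE tokenizer (all paths here are from my setup).
python NeMo/scripts/asr_language_modeling/ngram_lm/train_kenlm.py \
    --nemo_model_file stt_en_conformer_ctc_medium.nemo \
    --train_file librispeech-lm-norm.txt \
    --kenlm_bin_path kenlm/build/bin \
    --kenlm_model_file 6gram.bin \
    --ngram_length 6
```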
If you could share your detailed steps for training the 6-gram LM, or your pretrained model, it would be most helpful!
David
Hi David,
I added the missing 6-gram to the shared folders.
https://drive.google.com/drive/folders/1ZhevurjySBT_WMD6Q79g86XUJnTQ8VPa
It was also trained with NeMo's n-gram script, by encoding the LibriSpeech corpus with special characters so the LM operates on byte-pair-encoded tokens.
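Concretely, the "special characters" trick works roughly like the sketch below: each BPE token id is shifted by a fixed offset and written out as a single unicode character, so KenLM counts n-grams over BPE tokens instead of words. This is my reading of NeMo's kenlm_utils; the offset constant, file names, and tokenizer calls are assumptions and may differ by version:

```python
# Sketch of the token-to-character encoding applied before KenLM training.
from nemo.collections.asr.models import EncDecCTCModelBPE

TOKEN_OFFSET = 100  # shift ids past control/whitespace characters (assumed constant)

# Load the acoustic model only to reuse its BPE tokenizer (hypothetical path).
model = EncDecCTCModelBPE.restore_from("stt_en_conformer_ctc_medium.nemo")

with open("librispeech-lm-norm.txt") as fin, open("encoded_corpus.txt", "w") as fout:
    for line in fin:
        ids = model.tokenizer.text_to_ids(line.strip())
        # Each BPE token becomes one printable "character" in KenLM's view.
        fout.write("".join(chr(i + TOKEN_OFFSET) for i in ids) + "\n")
```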
You should be able to improve on the greedy-search WER by tuning the alpha and beta hyper-parameters!
Default params used in the paper are in the configs.
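One way to run the sweep is with NeMo's eval_beamsearch_ngram.py, which (at least in the versions I have used) accepts lists of values for the beam width, alpha, and beta and grid-searches over them. The file names below are placeholders, and the exact flags may differ in your NeMo version, so check the script's --help:

```bash
# Grid-search alpha/beta on a dev manifest; pick the combination with the
# lowest WER, then evaluate that single setting once on the test set.
python NeMo/scripts/asr_language_modeling/ngram_lm/eval_beamsearch_ngram.py \
    --nemo_model_file stt_en_conformer_ctc_medium.nemo \
    --input_manifest dev_other_manifest.json \
    --kenlm_model_file 6gram.bin \
    --decoding_mode beamsearch_ngram \
    --beam_width 128 \
    --beam_alpha 0.5 1.0 1.5 2.0 \
    --beam_beta 0.5 1.0 1.5 2.0
```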
Best,
Maxime
Thanks, Maxime!