k2-fsa/icefall

How to use an external RNN-LM (mono-lingual) with a bilingual ASR?

sangeet2020 opened this issue · 3 comments

Hi K2 team,

Thank you so much for your amazingly efficient toolkit in streaming focused ASR.

I have trained an EN-DE bilingual streaming ASR model using this recipe.
However, I am not satisfied with its performance on English, so I want to use an externally trained RNN LM (trained using this recipe) to improve the WER on the English side only.

I tried --decoding-method modified_beam_search_lm_shallow_fusion with the English RNN-LM, but ran into errors because the vocab sizes differ.
The vocab size for bilingual ASR training is 1000 (500 for EN and 500 for DE), while the vocab size of the English RNN-LM is 500.

I wonder if it's possible to use a monolingual RNN LM with a bilingual ASR model.

Alternatively, is it possible to combine two RNN-LMs, or to interpolate them somehow?
I saw some related discussions here: kaldi-asr/kaldi#2069.
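
For reference, token-level linear interpolation of two LMs can be sketched as follows. This is a generic illustration rather than icefall code: the function name `interpolate_log_probs` and the weight `lam` are assumptions, and it presumes both LMs assign a log-probability to the same token, which for two monolingual LMs would first require a shared or mapped vocabulary.

```python
import math


def interpolate_log_probs(lp_a: float, lp_b: float, lam: float = 0.5) -> float:
    """Return log(lam * exp(lp_a) + (1 - lam) * exp(lp_b)).

    lp_a and lp_b are the log-probabilities that two LMs assign to the
    same token. The sum is computed in log space for numerical stability.
    """
    if not 0.0 < lam < 1.0:
        raise ValueError("lam must be strictly between 0 and 1")
    a = math.log(lam) + lp_a
    b = math.log1p(-lam) + lp_b  # log(1 - lam) + lp_b
    m = max(a, b)
    if m == float("-inf"):
        # Both LMs give the token zero probability.
        return m
    return m + math.log(math.exp(a - m) + math.exp(b - m))
```

With `lam` close to 1 the result follows the first LM, and with `lam` close to 0 it follows the second, so the weight would normally be tuned on held-out data.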

Thank You

I think it's possible as long as the German BPE tokens and the English BPE tokens are distinguishable.

You also need to know which language you are decoding; otherwise you might end up rescoring a German utterance with the English RNNLM.
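
A minimal sketch of what "distinguishable" could look like in practice, assuming the English pieces of the bilingual vocabulary can be identified (here by a hypothetical `is_english` predicate on the ASR token ID) and that they match pieces of the English LM's vocabulary by string. All names and the toy vocabularies are illustrative, not icefall APIs.

```python
from typing import Callable, Dict, List, Optional


def build_asr_to_lm_map(
    asr_pieces: List[str],
    lm_piece_to_id: Dict[str, int],
    is_english: Callable[[int], bool],
) -> List[Optional[int]]:
    """For each ASR token ID, return the LM token ID or None if unmapped."""
    mapping: List[Optional[int]] = []
    for asr_id, piece in enumerate(asr_pieces):
        if is_english(asr_id) and piece in lm_piece_to_id:
            mapping.append(lm_piece_to_id[piece])
        else:
            mapping.append(None)  # German piece, or unknown to the LM
    return mapping


def to_lm_ids(
    asr_ids: List[int], mapping: List[Optional[int]]
) -> Optional[List[int]]:
    """Translate a hypothesis to LM IDs; None means 'do not rescore'."""
    lm_ids = [mapping[i] for i in asr_ids]
    if any(j is None for j in lm_ids):
        return None
    return lm_ids


# Toy data: ASR IDs 0-1 are English, 2-3 are German (an assumption).
asr_pieces = ["▁the", "▁cat", "▁die", "▁katze"]
lm_piece_to_id = {"▁cat": 0, "▁the": 1}
mapping = build_asr_to_lm_map(asr_pieces, lm_piece_to_id, lambda i: i < 2)
```

Returning `None` for any hypothesis that contains a non-English token is one simple way to avoid scoring German text with the English LM; a language-ID decision per utterance would be another.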

But wouldn't the different vocab sizes of the BPE models for the ASR and the RNN-LM create an issue in the first place?

When loading the RNN LM:

            model = RnnLmModel(
                vocab_size=params.vocab_size,
                embedding_dim=params.rnn_lm_embedding_dim,
                hidden_dim=params.rnn_lm_hidden_dim,
                num_layers=params.rnn_lm_num_layers,
                tie_weights=params.rnn_lm_tie_weights,
            )

params.vocab_size is the size of the sentence piece tokenizer from ASR (1000 in my case), which is different from the actual RNN LM vocab size (500 in my case). How can I overcome this?

You need to change the code. I only meant that it's theoretically possible to use a mono-lingual RNNLM to rescore a multi-lingual ASR model.
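
One possible shape for such a change, sketched under assumptions: the RNN LM is built with its own vocab size (500 here) instead of params.vocab_size, and its per-token scores are expanded into the ASR's 1000-token space before fusion. The function names, the `asr_to_lm` table, and the `fill` value are hypothetical and not part of icefall; plain Python lists stand in for tensors.

```python
from typing import List, Optional


def expand_lm_scores(
    lm_log_probs: List[float],
    asr_to_lm: List[Optional[int]],
    fill: float = 0.0,
) -> List[float]:
    """Spread LM scores (LM vocab) over the ASR vocab.

    asr_to_lm[i] is the LM token ID for ASR token i, or None when the
    ASR token (e.g. a German piece) has no counterpart in the LM.
    """
    return [lm_log_probs[j] if j is not None else fill for j in asr_to_lm]


def fuse(
    asr_log_probs: List[float],
    lm_log_probs: List[float],
    asr_to_lm: List[Optional[int]],
    lm_scale: float = 0.3,
    fill: float = 0.0,
) -> List[float]:
    """Shallow fusion: ASR score plus lm_scale times the expanded LM score."""
    expanded = expand_lm_scores(lm_log_probs, asr_to_lm, fill)
    return [a + lm_scale * l for a, l in zip(asr_log_probs, expanded)]
```

Note that `fill=0.0` leaves unmapped tokens unpenalized while English tokens receive a negative LM term, which tilts the search toward German tokens. That is one more reason to apply fusion only to utterances known to be English, or to choose a negative `fill`.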