loss for lf_mmi is high.
We observed similar results; here is the explanation.
During the training stage, the MMI loss is calculated at the word level. E.g., a Mandarin character sequence `ABC` may be considered as a word sequence `A BC` if the word `BC` is in the lexicon. We find that training the MMI loss at the word level, rather than at the naive character level, may provide a slight CER improvement since the polyphone problem of Mandarin is alleviated. However, we find that the validation MMI loss can spike if we do this (possibly because of over-fitting).
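To make the word-level segmentation concrete, here is a minimal sketch of one common way to do it: greedy forward maximum matching against a lexicon. The `lexicon`, `max_word_len`, and the function itself are illustrative assumptions, not the repo's actual segmentation code.

```python
def max_match_segment(chars, lexicon, max_word_len=4):
    """Greedily split a character sequence into the longest words found in the lexicon;
    characters not covered by any lexicon word fall back to single characters."""
    words, i = [], 0
    while i < len(chars):
        # Try the longest candidate first, shrinking down to a single character.
        for j in range(min(len(chars), i + max_word_len), i, -1):
            cand = "".join(chars[i:j])
            if j - i == 1 or cand in lexicon:
                words.append(cand)
                i = j
                break
    return words

# "ABC" is segmented as ["A", "BC"] when "BC" is in the lexicon.
print(max_match_segment(list("ABC"), {"BC"}))  # ['A', 'BC']
```

With an empty lexicon the same call degrades to pure character-level output, which mirrors the fallback used at decoding time.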
Note that, during decoding, MMI can only work at the character level since the word-segmentation information is missing. So it is reasonable to compute the validation MMI loss at the character level as well. To achieve this, modify the code as below and you should observe a validation MMI loss around 10.
Change `e2e_lfmmi/snowfall/warpper/warpper_mmi.py` (lines 134 to 136 at commit a359d4b) as follows:
```python
if self.training:
    assert self.P.is_cpu
    assert self.P.requires_grad is True
else:
    # Never use segmentation in evaluation: to approximate the decoding stage
    ys = [[self.char_list[c] for c in y if c != self.pad_id] for y in ys_pad]
    ys = [" ".join(y).replace("<eos>", "") for y in ys]
```
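To illustrate what the `else` branch above does, here is a toy run of the same two list comprehensions on made-up inputs; `char_list`, `pad_id`, and the label ids are placeholders, not the repo's real configuration:

```python
# Placeholder vocabulary and padding id (illustrative only).
char_list = ["甲", "乙", "丙", "<eos>"]
pad_id = -1
ys_pad = [[0, 1, 3, -1], [2, 3, -1, -1]]  # padded label-id sequences

# Same transformation as the else branch: drop padding, map ids to
# characters, join with spaces, and blank out the <eos> token.
ys = [[char_list[c] for c in y if c != pad_id] for y in ys_pad]
ys = [" ".join(y).replace("<eos>", "") for y in ys]
print(ys)  # ['甲 乙 ', '丙 ']
```

Note that every character is its own space-separated token, i.e. the supervision is purely character-level, with no word-segmentation information.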
You may also delete the file `data/dev/text_org` to exclude all word-segmentation info from the validation set, and then re-generate your `data.json`.
You may also simply ignore this problem; it has no impact on the decoding results.
You are very welcome to report any bugs or concerns to help us improve. We are writing a journal paper on this work and will release a revised version later. So far, some problems, including this one, have been solved but not yet updated on GitHub due to company policy. We are sorry for that.
Regards