k2-fsa/icefall

Extend tokens.txt with new tokens on pretrained model

gorosei-dev opened this issue · 1 comment

Suppose I want to continue training the pretrained model on more data, but the new data contains some tokens that are not covered by the existing tokens.txt / bpe.model. I want the new model to be able to recognize these new tokens. How can I achieve this without retraining from scratch?

You can reuse all the parameters of your pretrained model except for the output layer, whose size depends on the number of tokens. Also remember to update the lang_dir you use for the fine-tuning so that it contains the extended tokens.txt / bpe.model.
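
As a rough illustration of the suggestion above, here is a minimal sketch that keeps only the pretrained parameters whose name and shape still match the new model. This is not icefall's own code. The helper names (`shape_of`, `filter_reusable_params`), the parameter names, and the vocabulary sizes are hypothetical. Plain objects with a `.shape` attribute stand in for tensors so the snippet runs without PyTorch.

```python
from types import SimpleNamespace
from typing import Any, Dict, List, Mapping, Tuple


def shape_of(param: Any) -> Tuple[int, ...]:
    """Return the shape as a plain tuple (also works for a torch.Tensor)."""
    return tuple(param.shape)


def filter_reusable_params(
    pretrained: Mapping[str, Any],
    new_model: Mapping[str, Any],
) -> Tuple[Dict[str, Any], List[str]]:
    """Split pretrained parameters into reusable and skipped ones.

    A parameter is reusable when the new model has a parameter with the
    same name and the same shape. Layers whose size depends on the number
    of tokens change shape when tokens.txt grows, so they are skipped.
    """
    reusable: Dict[str, Any] = {}
    skipped: List[str] = []
    for name, param in pretrained.items():
        if name in new_model and shape_of(param) == shape_of(new_model[name]):
            reusable[name] = param
        else:
            skipped.append(name)
    return reusable, skipped


# Hypothetical sizes: 500 tokens before, 600 after extending tokens.txt.
old_vocab, new_vocab = 500, 600

# Stand-ins for state dicts; the parameter names are made up for this demo.
pretrained = {
    "encoder.layer0.weight": SimpleNamespace(shape=(512, 512)),
    "output_linear.weight": SimpleNamespace(shape=(old_vocab, 512)),
    "output_linear.bias": SimpleNamespace(shape=(old_vocab,)),
}
new_model = {
    "encoder.layer0.weight": SimpleNamespace(shape=(512, 512)),
    "output_linear.weight": SimpleNamespace(shape=(new_vocab, 512)),
    "output_linear.bias": SimpleNamespace(shape=(new_vocab,)),
}

reusable, skipped = filter_reusable_params(pretrained, new_model)
print("reused:", sorted(reusable))
print("skipped (left at fresh initialization):", skipped)
```

With real PyTorch models, the same filter could be applied to `pretrained_state_dict` and `model.state_dict()`, and the result passed to `model.load_state_dict(reusable, strict=False)` so that the skipped layers keep their fresh initialization. Which layers end up skipped depends on the recipe; any embedding or projection sized by the vocabulary would be affected, not only the final linear layer.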