Chinese-ASR built on Kaldi
OpenCC: converts Simplified Chinese to Traditional Chinese
https://github.com/yichen0831/opencc-python
jieba (zh version): Traditional Chinese word segmentation tool
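A minimal sketch of how the two text tools above could be chained to normalize transcripts: convert to Traditional Chinese, then insert spaces between segmented words. The helper names `normalize_line` and `build_pipeline` are hypothetical and are not taken from this repository. The sketch assumes the opencc-python package linked above and a jieba build that is importable as `jieba`.

```python
def normalize_line(text, convert, segment):
    """Convert one transcript line, then join the segmented words with spaces.

    `convert` maps a string to a string; `segment` maps a string to a list of words.
    """
    traditional = convert(text.strip())
    words = [w for w in segment(traditional) if w.strip()]  # drop whitespace-only tokens
    return " ".join(words)


def build_pipeline():
    """Return (convert, segment) callables backed by OpenCC and jieba.

    Imports are deferred so normalize_line can be used without either package.
    """
    from opencc import OpenCC  # from the opencc-python project
    import jieba               # assumption: the Traditional Chinese build installs as `jieba`

    converter = OpenCC("s2t")  # s2t = Simplified to Traditional
    return converter.convert, jieba.lcut


# Usage, once both packages are installed:
#   convert, segment = build_pipeline()
#   print(normalize_line("语音识别", convert, segment))
```

Passing the converter and segmenter in as callables keeps the helper easy to test and makes it simple to swap in a different dictionary or segmenter.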
1. Modify the Kaldi path in path.sh
2. Modify the corpus path in local/data/corpus_path.sh
3. Install Sequitur (G2P), sox, and kaldi_lm in kaldi/tools/
4. Run bash run.sh
- LM training and interpolation: local/lm
- Custom WFST for the multiple-choice problem: local/lm/wfst
  -> Forces the outputs into the format " XXX XXX XXX XXX"
- Scripts for training DFSMN: local/nnet
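For the LM interpolation item above, here is a sketch of the general idea of linear interpolation: two models are mixed with a weight lambda, and lambda is chosen to minimize perplexity on held-out text. This is a toy unigram illustration under that assumption, not the scripts in local/lm. The function names and the example probabilities are made up.

```python
import math


def interpolate(p_a, p_b, lam):
    """Linear interpolation: lam * P_a(w) + (1 - lam) * P_b(w) for every word."""
    vocab = set(p_a) | set(p_b)
    return {w: lam * p_a.get(w, 0.0) + (1.0 - lam) * p_b.get(w, 0.0) for w in vocab}


def perplexity(model, words):
    """Perplexity of a unigram model on a word list (every word needs nonzero probability)."""
    log_prob = sum(math.log(model[w]) for w in words)
    return math.exp(-log_prob / len(words))


def best_lambda(p_a, p_b, dev_words, steps=10):
    """Grid-search the mixing weight over (0, 1) that minimizes dev-set perplexity."""
    candidates = [i / steps for i in range(1, steps)]
    return min(candidates, key=lambda lam: perplexity(interpolate(p_a, p_b, lam), dev_words))


# Toy models: one trained on in-domain text, one on general text (made-up numbers).
in_domain = {"語音": 0.7, "辨識": 0.3}
general = {"語音": 0.2, "辨識": 0.8}
lam = best_lambda(in_domain, general, ["語音", "辨識"])
print(lam, perplexity(interpolate(in_domain, general, lam), ["語音", "辨識"]))
```

Real n-gram toolkits apply the same mixing per n-gram context and typically tune the weight with a held-out set in the same way.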
Model | TOCFL(CER%) | Cyberon_Chinese_test(CER%) |
---|---|---|
mono0a | 97.76 | 100.71 |
tri1 | 50.55 | 63.64 |
tri2 | 56.62 | 46.65 |
tri3 | 34.78 | 46.78 |
tri4 | 37.02 | 34.02 |
tri5 | 65.60 | 49.96 |
tdnn_lstm1 | 18.30 | 24.82 |
tdnn_lstm(realign) | 15.88 | 22.24 |
DFSMN(Alibaba) | 11.22 | 12.14 |
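The table reports character error rate (CER). Below is a minimal sketch of the metric, assuming the usual definition: character-level edit distance divided by the number of reference characters. It is not the scoring script used by this recipe, and the function names are illustrative. Because insertions count as errors, CER can exceed 100%, which is how a value such as the mono0a result on Cyberon_Chinese_test can arise.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate in percent; whitespace is ignored on both sides."""
    ref_chars = [c for c in ref if not c.isspace()]
    hyp_chars = [c for c in hyp if not c.isspace()]
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return 100.0 * edit_distance(ref_chars, hyp_chars) / len(ref_chars)


print(cer("今天 天氣 很好", "今天 天氣 很好"))  # identical strings
print(cer("你好", "你好嗎你好嗎"))              # four insertions against two reference characters
```

Corpus-level scores are normally computed by summing errors and reference lengths over all utterances rather than averaging per-utterance values.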