This repository contains the Kaldi LF-MMI implementation of the paper Bayesian Learning of LF-MMI Trained Time Delay Neural Networks for Speech Recognition, IEEE/ACM Transactions on Audio Speech and Language (TASLP).
By Shoukang Hu, Xurong Xie, Shansong Liu, Jianwei Yu, Zi Ye, Mengzhe Geng, Xunying Liu, Helen Meng
- Install Kaldi
- Clone the repo:
git clone https://github.com/skhu101/Bayesian_TDNN.git
Step 1:
-
Add the BayesTdnnV2Component in nnet-convolutional-component.h to kaldi/src/nnet3/nnet-convolutional-component.h
-
Add the BayesTdnnV2Component in nnet-tdnn-component.cc to kaldi/src/nnet3/nnet-tdnn-component.cc
-
Add the following four lines to the corresponding location in kaldi/src/nnet3/nnet-component-itf.cc
else if (cpi_type == "BayesTdnnV2ComponentPrecomputedIndexes") {
ans = new BayesTdnnV2Component::PrecomputedIndexes();
else if (component_type == "BayesTdnnV2Component") {
ans = new BayesTdnnV2Component();
- complie the new source file
cd kaldi/src/nnet3/
make -j 20
Step 2:
run the factored TDNN model using the following command
cd kaldi/egs/swbd/s5c
bash local/chain/tuning/run_tdnn_7q.sh
Step 3:
This part of code should be run based on the standard TDNN model (run_tdnn_7q.sh)
bash local/chain_kaldi_feats/run_btdnn_7q.sh \
exp/chain_kaldi_feats/btdnn7q_sp_4epoch (directory of the standard TDNN system) \
1200.mdl (TDNN model updated with half of the total iterations)
Model | hub5' 00 swbd |
hub5' 00 callhm |
hub5' 00 avg |
rt03 fisher |
rt03 swbd |
rt03 avg |
---|---|---|---|---|---|---|
tdnn_7q | 9.6 | 18.0 | 13.8 | 12.3 | 20.0 | 16.3 |
bayes_tdnn_7q | 9.4 | 17.3 | 13.4 | 11.7 | 19.3 | 15.7 |
Note that we set --trainer.optimization.num-jobs-initial 1 and --trainer.optimization.num-jobs-final 1 in our experiments due to computational resource constraint.
If you find our codes or trained models useful in your research, please consider to star our repo and cite our paper:
@article{hu2021bayesian,
title={Bayesian Learning of LF-MMI Trained Time Delay Neural Networks for Speech Recognition},
author={Hu, Shoukang and Xie, Xurong and Liu, Shansong and Yu, Jianwei and Ye, Zi and Geng, Mengzhe and Liu, Xunying and Meng, Helen},
journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},
volume={29},
pages={1514--1529},
year={2021},
publisher={IEEE}
}