Bayesian_TDNN

This repository contains the Kaldi LF-MMI implementation of the paper "Bayesian Learning of LF-MMI Trained Time Delay Neural Networks for Speech Recognition", IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP).

By Shoukang Hu, Xurong Xie, Shansong Liu, Jianwei Yu, Zi Ye, Mengzhe Geng, Xunying Liu, Helen Meng

Paper

Getting Started

  • Install Kaldi
  • Clone the repo:
    git clone https://github.com/skhu101/Bayesian_TDNN.git
    

Usage

Step 1:

  • Add the BayesTdnnV2Component class from this repo's nnet-convolutional-component.h to kaldi/src/nnet3/nnet-convolutional-component.h

  • Add the BayesTdnnV2Component implementation from this repo's nnet-tdnn-component.cc to kaldi/src/nnet3/nnet-tdnn-component.cc

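  • Register the new component by adding the following branches to the if/else-if chains in kaldi/src/nnet3/nnet-component-itf.cc, next to the existing TdnnComponent entries. The first branch goes in ComponentPrecomputedIndexes::NewComponentPrecomputedIndexesOfType() and the second goes in Component::NewComponentOfType().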

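// In ComponentPrecomputedIndexes::NewComponentPrecomputedIndexesOfType(),
// directly after "ans = new TdnnComponent::PrecomputedIndexes();":
} else if (cpi_type == "BayesTdnnV2ComponentPrecomputedIndexes") {
    ans = new BayesTdnnV2Component::PrecomputedIndexes();

// In Component::NewComponentOfType(), directly after "ans = new TdnnComponent();":
} else if (component_type == "BayesTdnnV2Component") {
    ans = new BayesTdnnV2Component();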
  • Compile the modified source files:
cd kaldi/src/nnet3/
make -j 20
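Kaldi does not look components up by their C++ type, which is why the registration step matters. When a model or config file is read, the type name stored in it (for example BayesTdnnV2Component) is matched against a chain of string comparisons. A name that is not registered there cannot be constructed. The sketch below is a simplified, self-contained illustration of that dispatch pattern and is not Kaldi code. The structs are hypothetical stand-ins for the much larger nnet3 Component interface.

```cpp
#include <memory>
#include <string>

// Hypothetical, stripped-down stand-ins for Kaldi's nnet3 component classes.
struct Component {
  virtual ~Component() = default;
  virtual std::string Type() const = 0;
};

struct TdnnComponent : Component {
  std::string Type() const override { return "TdnnComponent"; }
};

struct BayesTdnnV2Component : Component {
  std::string Type() const override { return "BayesTdnnV2Component"; }
};

// Mirrors the else-if chain of a string-keyed factory: the type name read
// from a file selects the class to construct. An unregistered name falls
// through every branch and yields an empty pointer.
std::unique_ptr<Component> NewComponentOfType(const std::string &component_type) {
  std::unique_ptr<Component> ans;
  if (component_type == "TdnnComponent") {
    ans = std::make_unique<TdnnComponent>();
  } else if (component_type == "BayesTdnnV2Component") {
    // This is the branch that the registration step adds.
    ans = std::make_unique<BayesTdnnV2Component>();
  }
  return ans;
}
```

If the BayesTdnnV2Component branch is left out, reading a model that contains this component fails at lookup time, even though the class itself compiles.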

Step 2:

Train the standard factored TDNN baseline with the following commands:

cd kaldi/egs/swbd/s5c
bash local/chain/tuning/run_tdnn_7q.sh

Step 3:

This step builds on the standard TDNN system trained in Step 2 (run_tdnn_7q.sh). Run the Bayesian TDNN script with two arguments:
  • the directory of the standard TDNN system;
  • the TDNN model obtained after half of the total training iterations (1200.mdl in our setup).

bash local/chain_kaldi_feats/run_btdnn_7q.sh \
exp/chain_kaldi_feats/btdnn7q_sp_4epoch \
1200.mdl
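The second argument has to be picked by hand from the baseline's experiment directory. Below is a hypothetical helper, not part of this repo or Kaldi, that derives it from the total number of training iterations. It assumes two things:
  • checkpoints are named <iter>.mdl;
  • only models whose iteration number is a multiple of preserve_interval are still on disk.
Kaldi's nnet3 training scripts keep every 100th model by default through --cleanup.preserve-model-interval, but check your own run.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: name of the kept checkpoint at or just before the
// halfway point of training. Assumes models are saved as <iter>.mdl and
// that only iterations divisible by preserve_interval survive cleanup.
std::string HalfwayModelName(int num_iters, int preserve_interval = 100) {
  if (num_iters <= 0 || preserve_interval <= 0) {
    throw std::invalid_argument("num_iters and preserve_interval must be positive");
  }
  const int half = num_iters / 2;                                     // integer halfway point
  const int kept = (half / preserve_interval) * preserve_interval;    // round down to a kept model
  return std::to_string(kept) + ".mdl";
}
```

Take, for instance, a run with 2412 iterations (an illustrative number, not taken from the paper). Its halfway point is iteration 1206, so the helper returns 1200.mdl.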

Result comparison (WER, %):

| Model | hub5'00 swbd | hub5'00 callhome | hub5'00 avg | rt03 fisher | rt03 swbd | rt03 avg |
|---|---|---|---|---|---|---|
| tdnn_7q | 9.6 | 18.0 | 13.8 | 12.3 | 20.0 | 16.3 |
| bayes_tdnn_7q | 9.4 | 17.3 | 13.4 | 11.7 | 19.3 | 15.7 |
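The helper below expresses the gains in relative terms. It is not part of the repo. It computes the relative WER reduction of bayes_tdnn_7q over tdnn_7q from the numbers in the result table.

```cpp
#include <array>
#include <cstddef>

// WERs (%) from the result table, in column order:
// hub5'00 swbd, hub5'00 callhome, hub5'00 avg, rt03 fisher, rt03 swbd, rt03 avg.
const std::array<double, 6> kTdnn7q = {9.6, 18.0, 13.8, 12.3, 20.0, 16.3};
const std::array<double, 6> kBayesTdnn7q = {9.4, 17.3, 13.4, 11.7, 19.3, 15.7};

// Relative WER reduction (%) of a system over a baseline:
// 100 * (baseline - system) / baseline.
double RelativeReduction(double baseline_wer, double system_wer) {
  return 100.0 * (baseline_wer - system_wer) / baseline_wer;
}
```

On the avg columns this gives about 2.9% relative on hub5'00 (13.8 to 13.4) and 3.7% relative on rt03 (16.3 to 15.7).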

Note that we set --trainer.optimization.num-jobs-initial 1 and --trainer.optimization.num-jobs-final 1 in our experiments because of limited computational resources.
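Fixing both job counts to 1 also changes the total iteration count, because each job processes one archive per iteration. The sketch below re-expresses in C++ the iteration and job schedule of Kaldi's steps/nnet3/chain/train.py, as understood here. It follows older releases; newer ones add a num-jobs-step option. The function names are hypothetical and the archive count is made up, so check the script in your own Kaldi checkout.

```cpp
// Total iterations: archives are shared among a number of jobs that ramps
// linearly from num_jobs_initial to num_jobs_final, so on average
// (initial + final) / 2 archives are consumed per iteration.
int NumIterations(int num_archives_to_process, int num_jobs_initial, int num_jobs_final) {
  return (num_archives_to_process * 2) / (num_jobs_initial + num_jobs_final);
}

// Number of parallel jobs at a given iteration: linear interpolation between
// the initial and final job counts, rounded to the nearest integer.
int NumJobsAtIteration(int iter, int num_iters, int num_jobs_initial, int num_jobs_final) {
  return static_cast<int>(0.5 + num_jobs_initial +
                          (num_jobs_final - num_jobs_initial) *
                              static_cast<double>(iter) / num_iters);
}
```

With 2400 archives to process, one job throughout gives 2400 single-job iterations. Ramping from 3 to 16 jobs (illustrative values) gives only 252 iterations, each running several jobs in parallel.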

Citation

If you find our code or trained models useful in your research, please consider starring our repo and citing our paper:

@article{hu2021bayesian,
  title={Bayesian Learning of LF-MMI Trained Time Delay Neural Networks for Speech Recognition},
  author={Hu, Shoukang and Xie, Xurong and Liu, Shansong and Yu, Jianwei and Ye, Zi and Geng, Mengzhe and Liu, Xunying and Meng, Helen},
  journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},
  volume={29},
  pages={1514--1529},
  year={2021},
  publisher={IEEE}
}