This repository contains the Kaldi LF-MMI implementation of the paper "Neural Architecture Search for LF-MMI Trained Time Delay Neural Networks", published in IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP).
By Shoukang Hu, Xurong Xie, Mingyu Cui*, Jiajun Deng*, Shansong Liu, Jianwei Yu, Mengzhe Geng, Xunying Liu, Helen Meng
- Install Kaldi
- Clone the repo:
git clone https://github.com/skhu101/TDNN-F_NAS.git
Step 1:
- Copy the TdnnDARTSV3Component in nnet-convolutional-component.h to kaldi/src/nnet3/nnet-convolutional-component.h
- Copy the TdnnDARTSV3Component in nnet-tdnn-component.cc to kaldi/src/nnet3/nnet-tdnn-component.cc
- Copy the OnehotFunctionComponent in nnet-simple-component.h to kaldi/src/nnet3/nnet-simple-component.h
- Copy the OnehotFunctionComponent in nnet-simple-component.cc to kaldi/src/nnet3/nnet-simple-component.cc
- Copy the CopyNComponent in nnet-simple-component.h to kaldi/src/nnet3/nnet-simple-component.h
- Copy the CopyNComponent in nnet-simple-component.cc to kaldi/src/nnet3/nnet-simple-component.cc
- Copy the GumbelSoftmaxFlopsComponent in nnet-simple-component.h to kaldi/src/nnet3/nnet-simple-component.h
- Copy the GumbelSoftmaxFlopsComponent in nnet-simple-component.cc to kaldi/src/nnet3/nnet-simple-component.cc
- Copy the SoftmaxFlopsComponent in nnet-simple-component.h to kaldi/src/nnet3/nnet-simple-component.h
- Copy the SoftmaxFlopsComponent in nnet-simple-component.cc to kaldi/src/nnet3/nnet-simple-component.cc
- Copy the BatchNormTestComponent in nnet-normalize-component.h to kaldi/src/nnet3/nnet-normalize-component.h
- Copy the BatchNormTestComponent in nnet-normalize-component.cc to kaldi/src/nnet3/nnet-normalize-component.cc
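- Copy the following lines to the corresponding locations in kaldi/src/nnet3/nnet-component-itf.cc (the cpi_type branch belongs in ComponentPrecomputedIndexes::NewComponentPrecomputedIndexesOfType, the component_type branches in Component::NewComponentOfType)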
    } else if (cpi_type == "TdnnDARTSV3ComponentPrecomputedIndexes") {
      ans = new TdnnDARTSV3Component::PrecomputedIndexes();
    } else if (component_type == "TdnnDARTSV3Component") {
      ans = new TdnnDARTSV3Component();
    } else if (component_type == "OnehotFunctionComponent") {
      ans = new OnehotFunctionComponent();
    } else if (component_type == "CopyNComponent") {
      ans = new CopyNComponent();
    } else if (component_type == "SoftmaxFlopsComponent") {
      ans = new SoftmaxFlopsComponent();
    } else if (component_type == "GumbelSoftmaxFlopsComponent") {
      ans = new GumbelSoftmaxFlopsComponent();
    } else if (component_type == "BatchNormTestComponent") {
      ans = new BatchNormTestComponent();
- Copy the following lines to the corresponding location in kaldi/src/nnet3/nnet-tdnn-component.cc
#include <iostream>
#include <stdio.h>
using namespace std;
- Copy the following lines to the corresponding location in kaldi/src/nnet3/nnet-utils.cc
    } else if (directive == "set-temperature-proportion") {
      std::string name_pattern = "*";
      // name_pattern defaults to '*' if none is given.  This pattern
      // matches names of components, not nodes.
      config_line.GetValue("name", &name_pattern);
      BaseFloat proportion = -1.0;
      if (!config_line.GetValue("proportion", &proportion)) {
        KALDI_ERR << "In edits-config, expected proportion to be set in line: "
                  << config_line.WholeLine();
      }
      int32 num_temp_proportions_set = 0;
      for (int32 c = 0; c < nnet->NumComponents(); c++) {
        if (NameMatchesPattern(nnet->GetComponentName(c).c_str(),
                               name_pattern.c_str())) {
          TdnnDARTSV3Component *tdnndartsv3component =
              dynamic_cast<TdnnDARTSV3Component*>(nnet->GetComponent(c));
          GumbelSoftmaxFlopsComponent *gumbelsoftmaxflopscomponent =
              dynamic_cast<GumbelSoftmaxFlopsComponent*>(nnet->GetComponent(c));
          if (tdnndartsv3component != NULL) {
            tdnndartsv3component->SetTempProportion(proportion);
            num_temp_proportions_set++;
          } else if (gumbelsoftmaxflopscomponent != NULL) {
            gumbelsoftmaxflopscomponent->SetTempProportion(proportion);
            num_temp_proportions_set++;
          }
        }
      }
      KALDI_LOG << "Set temp proportions for "
                << num_temp_proportions_set << " components.";
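Once this branch is compiled in, the directive can presumably be passed like any other nnet3 edit directive, for example through the `--edits` option of `nnet3-copy`. The sketch below only builds the directive string. The helper name `temperature_edit`, the value 0.5, and the model file names are illustrative assumptions and are not taken from this repository's scripts.

```shell
#!/usr/bin/env bash
# Build a "set-temperature-proportion" edit directive.
# $1: component-name pattern (the C++ code defaults to '*' when no name is given)
# $2: proportion (required by the C++ code; 0.5 below is only a placeholder)
temperature_edit() {
  local name_pattern="${1:-*}"
  local proportion="$2"
  if [ -z "$proportion" ]; then
    echo "temperature_edit: proportion is required" >&2
    return 1
  fi
  printf 'set-temperature-proportion name=%s proportion=%s\n' \
    "$name_pattern" "$proportion"
}

# Hypothetical usage (needs a Kaldi build that includes the branch above):
#   nnet3-copy --edits="$(temperature_edit '*' 0.5)" in.raw out.raw
```

The `name` pattern matches component names, not node names, as noted in the C++ comment, so a narrower pattern would restrict the update to a subset of components.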
- To locate the specific code to copy, search for each component name with the following commands:
cd src/nnet3/
grep -r "TdnnDARTSV3Component" .
grep -r "OnehotFunctionComponent" .
grep -r "CopyNComponent" .
grep -r "SoftmaxFlopsComponent" .
grep -r "GumbelSoftmaxFlopsComponent" .
- Compile the new source files
cd kaldi/src/nnet3/
make -j 20
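Before compiling, it can help to confirm that every component named in Step 1 actually appears in the target tree. The following is a minimal sketch, not part of the repository. The function name `missing_components` is hypothetical, and the component list is taken from the copy instructions above.

```shell
#!/usr/bin/env bash
# Report which of the Step 1 components are missing from a source tree.
# $1: directory to search, e.g. kaldi/src/nnet3
COMPONENTS="TdnnDARTSV3Component OnehotFunctionComponent CopyNComponent
SoftmaxFlopsComponent GumbelSoftmaxFlopsComponent BatchNormTestComponent"

missing_components() {
  local src_dir="$1" name
  for name in $COMPONENTS; do
    # -w matches whole words, so "SoftmaxFlopsComponent" does not also
    # match "GumbelSoftmaxFlopsComponent"; -q only sets the exit status.
    if ! grep -rqw -- "$name" "$src_dir"; then
      echo "$name"
    fi
  done
}

# Hypothetical usage:
#   missing_components kaldi/src/nnet3
```

An empty output means every name was found at least once. It does not check that both the header and the implementation were copied.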
Step 2:
- Copy the files in steps to kaldi/egs/swbd/s5c/steps, and the files in local/chain_NAS to kaldi/egs/swbd/s5c/local/chain_NAS. If any files are missing, please refer to steps and utils.
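The copy in Step 2 can be sketched as a small helper. This is an illustration under assumptions: the function name `install_recipe_files` and its two arguments (the root of the cloned repository and the root of the Kaldi checkout) are hypothetical, and files with the same name in the destination are overwritten.

```shell
#!/usr/bin/env bash
# Copy the recipe files from this repository into a Kaldi checkout.
# $1: root of the cloned TDNN-F_NAS repository
# $2: root of the Kaldi checkout
install_recipe_files() {
  local repo_root="$1" kaldi_root="$2"
  local recipe_dir="$kaldi_root/egs/swbd/s5c"
  mkdir -p "$recipe_dir/steps" "$recipe_dir/local/chain_NAS"
  # The trailing "/." copies directory contents, including nested folders.
  cp -r "$repo_root/steps/." "$recipe_dir/steps/"
  cp -r "$repo_root/local/chain_NAS/." "$recipe_dir/local/chain_NAS/"
}

# Hypothetical usage:
#   install_recipe_files TDNN-F_NAS kaldi
```

In a stock Kaldi checkout, egs/swbd/s5c/steps is usually a symlink to the shared wsj/s5/steps directory, so the copied files would land there and be visible to other recipes as well.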
Step 3:
- run the factored TDNN model using the following command
cd kaldi/egs/swbd/s5c
bash run.sh
bash local/chain/tuning/run_tdnn_7q.sh
Step 4:
- split the training data in a 95:5 ratio using the script src/nnet3/Prepare_NAS_data.sh
bash Prepare_NAS_data.sh
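For intuition, a 95:5 split of an utterance list can be sketched as below. This is not the content of Prepare_NAS_data.sh, whose exact behaviour is not described here. The function name `split_95_5` and the one-ID-per-line input format are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Split a list (one utterance or speaker ID per line) into 95% / 5% parts.
# $1: input list, $2: output file for the 95% part, $3: output for the 5% part
split_95_5() {
  local list="$1" train_out="$2" cv_out="$3"
  local total n_train shuffled
  total=$(( $(wc -l < "$list") ))
  n_train=$(( total * 95 / 100 ))   # integer division rounds down
  shuffled="$(mktemp)"
  shuf "$list" > "$shuffled"        # random order; not reproducible across runs
  head -n "$n_train" "$shuffled" > "$train_out"
  tail -n "+$(( n_train + 1 ))" "$shuffled" > "$cv_out"
  rm -f "$shuffled"
}

# Hypothetical usage:
#   split_95_5 all_utts.txt train_95.txt cv_5.txt
```

The two output lists are disjoint and together cover the input, which is the property the 95% pretrain and 5% cv update stages rely on.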
Step 5:
bash local/chain_NAS/run_tdnn_7q_fbk_40_manual.sh --offset 6 --bottleneckdim 160
Step 6:
- 95% pretrain
bash local/chain_NAS/run_TDNN_DARTSV3_fbk_stride_pretrain.sh 7
- 5% cv update
bash local/chain_NAS/run_TDNN_DARTSV3_fbk_stride_cvupdate.sh offset-len parent-path use-gumbel
For example:
# gumbel 5% cv update
bash local/chain_NAS/run_TDNN_DARTSV3_fbk_stride_cvupdate.sh --offset-len 7 --parent-path exp/chain_NAS/tdnn_DARTSV3_context_offset7_95peronehotpretrain_fbk_40_iv_7q_sp --use-gumbel true
# softmax 5% cv update
bash local/chain_NAS/run_TDNN_DARTSV3_fbk_stride_cvupdate.sh --offset-len 7 --parent-path exp/chain_NAS/tdnn_DARTSV3_context_offset7_95peronehotpretrain_fbk_40_iv_7q_sp --use-gumbel false
- train the top1 model in the context offset (6) search
bash local/chain_NAS/run_TDNN_DARTS_Child_mod_fbk.sh parent_path top top_id offset-len gpu_id
For example:
bash local/chain_NAS/run_TDNN_DARTS_Child_mod_fbk.sh exp/chain_NAS/tdnn_DARTSV3_offset7_fbk_40_iv_7q_sp_95onehotpretrain_cvupdate_gumbel top 1 7 1
Step 7:
- 95% pretrain
bash local/chain_NAS/run_TDNNf_DARTS_mod_fbk_bottleneckCBshare_95onehottrain.sh offset-type gpu-id
For example:
bash local/chain_NAS/run_TDNNf_DARTS_mod_fbk_bottleneckCBshare_95onehottrain.sh 4 0
- 5% cv update
bash local/chain_NAS/run_TDNNf_DARTS_mod_fbk_bottleneckCBshare_cvupdate_flopsconstraint.sh parent-path use-gumbel flops-coef
For example:
# pipeline gumbel 5% cv update
bash local/chain_NAS/run_TDNNf_DARTS_mod_fbk_bottleneckCBshare_cvupdate_flopsconstraint.sh --parent-path exp/chain_NAS/tdnn_DARTS_bottleneckCBshare_95onehottrain_25_50_80_100_120_160_200_240_fbk_40_iv_7q_sp --use-gumbel true --flops-coef 1e-3
# pipeline softmax 5% cv update
bash local/chain_NAS/run_TDNNf_DARTS_mod_fbk_bottleneckCBshare_cvupdate_flopsconstraint.sh --parent-path exp/chain_NAS/tdnn_DARTS_bottleneckCBshare_95onehottrain_25_50_80_100_120_160_200_240_fbk_40_iv_7q_sp --use-gumbel false --flops-coef 0
- train the top1 model in bottleneck dim search (25,50,80,100,120,160,200,240)
bash local/chain_NAS/run_TDNN_DARTS_bottleneckdim_Child_mod_fbk.sh parent_path flops_coef child_type top_id gpu_id
For example:
bash local/chain_NAS/run_TDNN_DARTS_bottleneckdim_Child_mod_fbk.sh exp/chain_NAS/tdnn_DARTS_bottleneckCBshare_95onehotpretrain_cvupdate_gumbel_flopsconstraint_1e-3 1e-3 top 1 1
- calculate top model parameter size
nnet3-am-copy --binary=false model_dir/final.mdl model_dir/final_txt.mdl
python local/chain_NAS/scripts/bottleneckdim_search_top_model_size.py model_dir top network-type
For example:
nnet3-am-copy --binary=false exp/chain_NAS/tdnn_DARTS_bottleneckCBshare_95onehotpretrain_cvupdate_gumbel_flopsconstraint_1e-3/final.mdl exp/chain_NAS/tdnn_DARTS_bottleneckCBshare_95onehotpretrain_cvupdate_gumbel_flopsconstraint_1e-3/final_txt.mdl
python local/chain_NAS/scripts/bottleneckdim_search_top_model_size.py exp/chain_NAS/tdnn_DARTS_bottleneckCBshare_95onehotpretrain_cvupdate_gumbel_flopsconstraint_1e-3 top 'tdnn'
Step 8:
- 95% pretrain
bash local/chain_NAS/run_TDNNf_DARTS_mod_fbk_optimal_context_offset_bottleneckCBshare_95onehottrain.sh offset-type gpu-id offset0 offset1 ... offset13
For example:
# pipeline gumbel 95% pretrain
bash local/chain_NAS/run_TDNNf_DARTS_mod_fbk_optimal_context_offset_bottleneckCBshare_95onehottrain.sh pipegumbel_context_offset6_top1 1 -2 2 -2 4 -5 5 -6 6 -6 5 -6 6 -6 6 -6 6 -6 6 -6 6 -6 6 -6 6 -6 6 -6 6
# pipeline softmax 95% pretrain
bash local/chain_NAS/run_TDNNf_DARTS_mod_fbk_optimal_context_offset_bottleneckCBshare_95onehottrain.sh pipesoftmax_context_offset6_top1 0 -1 2 -2 2 -2 5 -3 6 -4 5 -5 6 -6 6 -6 6 -6 5 -6 6 -6 6 -6 6 -6 6 -6 6
- 5% cv update
For example:
# pipeline gumbel 5% cv update
bash local/chain_NAS/run_TDNNf_DARTS_mod_fbk_optimal_context_offset_bottleneckCBshare_cvupdate_flopsconstraint.sh --offset-type pipegumbel_context_offset6_top1 --parent-path exp/chain_NAS/tdnn_DARTS_pipegumbel_context_offset6_top1_bottleneckCBshare_95onehottrain_25_50_80_100_120_160_200_240_fbk_40_iv_7q_sp --use-gumbel true --flops-coef 1e-1
# pipeline softmax 5% cv update
bash local/chain_NAS/run_TDNNf_DARTS_mod_fbk_optimal_context_offset_bottleneckCBshare_cvupdate_flopsconstraint.sh --offset-type pipesoftmax_context_offset6_top1 --parent-path exp/chain_NAS/tdnn_DARTS_pipesoftmax_context_offset6_top1_bottleneckCBshare_95onehottrain_25_50_80_100_120_160_200_240_fbk_40_iv_7q_sp --use-gumbel false --flops-coef 1e-1
- train the top1 model in bottleneck dim search (25,50,80,100,120,160,200,240), based on the model with the optimal context offsets found by the pipeline gumbel 5% cv update
bash local/chain_NAS/run_TDNN_DARTS_optimal_context_offset_bottleneckdim_Child_mod_fbk.sh parent_path flops_coef child_type top_id gpu_id offset0 ... offset27 offset_type egs_dir
For example:
bash local/chain_NAS/run_TDNN_DARTS_optimal_context_offset_bottleneckdim_Child_mod_fbk.sh exp/chain_NAS/tdnn_DARTS_pipegumbel_context_offset6_top1_bottleneckCBshare_95onehotpretrain_cvupdate_gumbel_flopsconstraint_1e-1 1e-1 top 1 0 -2 2 -2 4 -5 5 -6 6 -6 5 -6 6 -6 6 -6 6 -6 6 -6 6 -6 6 -6 6 -6 6 -6 6 pipegumbel_context_offset6_top1 exp/chain_NAS/tdnn_DARTS_context_offset7_95onehotpretrain_cvupdate_gumbel_Child_Top1_fbk_40_iv_7q_sp/egs
bash local/chain_NAS/run_TDNN_DARTS_optimal_context_offset_bottleneckdim_Child_mod_fbk.sh exp/chain_NAS/tdnn_DARTS_pipesoftmax_context_offset6_top1_bottleneckCBshare_95onehotpretrain_cvupdate_softmax_flopsconstraint_1e-1 1e-1 top 1 0 -1 2 -2 2 -2 5 -3 6 -4 5 -5 6 -6 6 -6 6 -6 5 -6 6 -6 6 -6 6 -6 6 -6 6 pipesoftmax_context_offset6_top1 exp/chain_NAS/tdnn_DARTS_context_offset7_95onehotpretrain_cvupdate_softmax_Child_Top1_fbk_40_iv_7q_sp/egs
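The long offset lists in Step 8 are easier to check when printed two per line. The sketch below assumes, without confirmation from the scripts, that the 28 values form 14 consecutive pairs such as `-2 2` and `-2 4`. The helper name `print_offset_pairs` is hypothetical.

```shell
#!/usr/bin/env bash
# Print a flat list of context offsets as consecutive pairs.
# Assumption: values come in (first, second) pairs, e.g. "-2 2" then "-2 4".
print_offset_pairs() {
  if [ $(( $# % 2 )) -ne 0 ]; then
    echo "expected an even number of offsets, got $#" >&2
    return 1
  fi
  local index=0
  while [ "$#" -gt 0 ]; do
    echo "pair $index: $1 $2"
    shift 2
    index=$(( index + 1 ))
  done
}

# Hypothetical usage with the first four values of the gumbel example:
#   print_offset_pairs -2 2 -2 4
```

Running it on a full offset list before launching a job is a cheap way to catch a dropped or duplicated value, since an odd count is rejected and the number of printed pairs can be compared with the expected 14.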
Step 9: For RNNLM training and rescoring, please refer to kaldi/egs/swbd/s5c/rnnlm. For LHUC and BLHUC speaker adaptation, please refer to BLHUC. For large RNNLM training and rescoring, please refer to local/rnnlm/run_tdnn_lstm_fbk40_mod_hasfisher_large_drop_e40.sh.
I have also compiled the NAS codes in the following Kaldi directory.
If you find our code or trained models useful in your research, please consider starring our repo and citing our paper:
@article{hu2022neural,
title={Neural architecture search for LF-MMI trained time delay neural networks},
author={Hu, Shoukang and Xie, Xurong and Cui, Mingyu and Deng, Jiajun and Liu, Shansong and Yu, Jianwei and Geng, Mengzhe and Liu, Xunying and Meng, Helen M},
journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},
year={2022},
publisher={IEEE}
}