Cannot reproduce LSTM result on TIMIT
arvoelke opened this issue · 7 comments
I followed the instructions in the README and was able to run an experiment with the configuration located at `cfg/TIMIT_baselines/TIMIT_LSTM_fmllr_cudnn.cfg`. However, instead of obtaining the reported 14.5% WER, the LSTM got 14.9% in one environment and 15.1% in another.
TIMIT_LSTM_fmllr_cudnn.cfg
[cfg_proto]
cfg_proto = proto/global.proto
cfg_proto_chunk = proto/global_chunk.proto
[exp]
cmd =
run_nn_script = run_nn
out_folder = exp/TIMIT_LSTM_fmllr_cudnn
seed = 2234
use_cuda = True
multi_gpu = False
save_gpumem = False
n_epochs_tr = 24
[dataset1]
data_name = TIMIT_tr
fea = fea_name=mfcc
fea_lst=/home/arvoelke/git/kaldi/egs/timit/s5/data/train/feats.scp
fea_opts=apply-cmvn --utt2spk=ark:/home/arvoelke/git/kaldi/egs/timit/s5/data/train/utt2spk ark:/home/arvoelke/git/kaldi/egs/timit/s5/mfcc/cmvn_train.ark ark:- ark:- | add-deltas --delta-order=2 ark:- ark:- |
cw_left=0
cw_right=0
fea_name=fbank
fea_lst=/home/arvoelke/git/kaldi/egs/timit/s5/data-fbank/train/feats.scp
fea_opts=apply-cmvn --utt2spk=ark:/home/arvoelke/git/kaldi/egs/timit/s5/data/train/utt2spk ark:/home/arvoelke/git/kaldi/egs/timit/s5/fbank/cmvn_train.ark ark:- ark:- | add-deltas --delta-order=0 ark:- ark:- |
cw_left=0
cw_right=0
fea_name=fmllr
fea_lst=/home/arvoelke/git/kaldi/egs/timit/s5/data-fmllr-tri3/train/feats.scp
fea_opts=apply-cmvn --utt2spk=ark:/home/arvoelke/git/kaldi/egs/timit/s5/data/train/utt2spk ark:/home/arvoelke/git/kaldi/egs/timit/s5/data-fmllr-tri3/cmvn_train.ark ark:- ark:- | add-deltas --delta-order=0 ark:- ark:- |
cw_left=0
cw_right=0
lab = lab_name=lab_cd
lab_folder=/home/arvoelke/git/kaldi/egs/timit/s5/exp/dnn4_pretrain-dbn_dnn_ali
lab_opts=ali-to-pdf
lab_count_file=auto
lab_data_folder=/home/arvoelke/git/kaldi/egs/timit/s5/data/train/
lab_graph=/home/arvoelke/git/kaldi/egs/timit/s5/exp/tri3/graph
lab_name=lab_mono
lab_folder=/home/arvoelke/git/kaldi/egs/timit/s5/exp/dnn4_pretrain-dbn_dnn_ali
lab_opts=ali-to-phones --per-frame=true
lab_count_file=none
lab_data_folder=/home/arvoelke/git/kaldi/egs/timit/s5/data/train/
lab_graph=/home/arvoelke/git/kaldi/egs/timit/s5/exp/tri3/graph
n_chunks = 5
[dataset2]
data_name = TIMIT_dev
fea = fea_name=mfcc
fea_lst=/home/arvoelke/git/kaldi/egs/timit/s5/data/dev/feats.scp
fea_opts=apply-cmvn --utt2spk=ark:/home/arvoelke/git/kaldi/egs/timit/s5/data/dev/utt2spk ark:/home/arvoelke/git/kaldi/egs/timit/s5/mfcc/cmvn_dev.ark ark:- ark:- | add-deltas --delta-order=2 ark:- ark:- |
cw_left=0
cw_right=0
fea_name=fbank
fea_lst=/home/arvoelke/git/kaldi/egs/timit/s5/data-fbank/dev/feats.scp
fea_opts=apply-cmvn --utt2spk=ark:/home/arvoelke/git/kaldi/egs/timit/s5/data/dev/utt2spk ark:/home/arvoelke/git/kaldi/egs/timit/s5/fbank/cmvn_dev.ark ark:- ark:- | add-deltas --delta-order=0 ark:- ark:- |
cw_left=0
cw_right=0
fea_name=fmllr
fea_lst=/home/arvoelke/git/kaldi/egs/timit/s5/data-fmllr-tri3/dev/feats.scp
fea_opts=apply-cmvn --utt2spk=ark:/home/arvoelke/git/kaldi/egs/timit/s5/data/dev/utt2spk ark:/home/arvoelke/git/kaldi/egs/timit/s5/data-fmllr-tri3/cmvn_dev.ark ark:- ark:- | add-deltas --delta-order=0 ark:- ark:- |
cw_left=0
cw_right=0
lab = lab_name=lab_cd
lab_folder=/home/arvoelke/git/kaldi/egs/timit/s5/exp/dnn4_pretrain-dbn_dnn_ali_dev
lab_opts=ali-to-pdf
lab_count_file=auto
lab_data_folder=/home/arvoelke/git/kaldi/egs/timit/s5/data/dev/
lab_graph=/home/arvoelke/git/kaldi/egs/timit/s5/exp/tri3/graph
lab_name=lab_mono
lab_folder=/home/arvoelke/git/kaldi/egs/timit/s5/exp/dnn4_pretrain-dbn_dnn_ali_dev
lab_opts=ali-to-phones --per-frame=true
lab_count_file=none
lab_data_folder=/home/arvoelke/git/kaldi/egs/timit/s5/data/dev/
lab_graph=/home/arvoelke/git/kaldi/egs/timit/s5/exp/tri3/graph
n_chunks = 1
[dataset3]
data_name = TIMIT_test
fea = fea_name=mfcc
fea_lst=/home/arvoelke/git/kaldi/egs/timit/s5/data/test/feats.scp
fea_opts=apply-cmvn --utt2spk=ark:/home/arvoelke/git/kaldi/egs/timit/s5/data/test/utt2spk ark:/home/arvoelke/git/kaldi/egs/timit/s5/mfcc/cmvn_test.ark ark:- ark:- | add-deltas --delta-order=2 ark:- ark:- |
cw_left=0
cw_right=0
fea_name=fbank
fea_lst=/home/arvoelke/git/kaldi/egs/timit/s5/data-fbank/test/feats.scp
fea_opts=apply-cmvn --utt2spk=ark:/home/arvoelke/git/kaldi/egs/timit/s5/data/test/utt2spk ark:/home/arvoelke/git/kaldi/egs/timit/s5/fbank/cmvn_test.ark ark:- ark:- | add-deltas --delta-order=0 ark:- ark:- |
cw_left=0
cw_right=0
fea_name=fmllr
fea_lst=/home/arvoelke/git/kaldi/egs/timit/s5/data-fmllr-tri3/test/feats.scp
fea_opts=apply-cmvn --utt2spk=ark:/home/arvoelke/git/kaldi/egs/timit/s5/data/test/utt2spk ark:/home/arvoelke/git/kaldi/egs/timit/s5/data-fmllr-tri3/cmvn_test.ark ark:- ark:- | add-deltas --delta-order=0 ark:- ark:- |
cw_left=0
cw_right=0
lab = lab_name=lab_cd
lab_folder=/home/arvoelke/git/kaldi/egs/timit/s5/exp/dnn4_pretrain-dbn_dnn_ali_test
lab_opts=ali-to-pdf
lab_count_file=auto
lab_data_folder=/home/arvoelke/git/kaldi/egs/timit/s5/data/test/
lab_graph=/home/arvoelke/git/kaldi/egs/timit/s5/exp/tri3/graph
lab_name=lab_mono
lab_folder=/home/arvoelke/git/kaldi/egs/timit/s5/exp/dnn4_pretrain-dbn_dnn_ali_test
lab_opts=ali-to-phones --per-frame=true
lab_count_file=none
lab_data_folder=/home/arvoelke/git/kaldi/egs/timit/s5/data/test/
lab_graph=/home/arvoelke/git/kaldi/egs/timit/s5/exp/tri3/graph
n_chunks = 1
[data_use]
train_with = TIMIT_tr
valid_with = TIMIT_dev
forward_with = TIMIT_test
[batches]
batch_size_train = 8
max_seq_length_train = 1000
increase_seq_length_train = True
start_seq_len_train = 100
multply_factor_seq_len_train = 2
batch_size_valid = 8
max_seq_length_valid = 1000
[architecture1]
arch_name = LSTM_cudnn_layers
arch_proto = proto/LSTM_cudnn.proto
arch_library = neural_networks
arch_class = LSTM_cudnn
arch_pretrain_file = none
arch_freeze = False
arch_seq_model = True
hidden_size=550
num_layers=4
bias=True
batch_first=True
dropout=0.2
bidirectional=True
arch_lr = 0.0016
arch_halving_factor = 0.5
arch_improvement_threshold = 0.001
arch_opt = rmsprop
opt_momentum = 0.0
opt_alpha = 0.95
opt_eps = 1e-8
opt_centered = False
opt_weight_decay = 0.0
[architecture2]
arch_name = MLP_layers
arch_proto = proto/MLP.proto
arch_library = neural_networks
arch_class = MLP
arch_pretrain_file = none
arch_freeze = False
arch_seq_model = False
dnn_lay = N_out_lab_cd
dnn_drop = 0.0
dnn_use_laynorm_inp = False
dnn_use_batchnorm_inp = False
dnn_use_batchnorm = False
dnn_use_laynorm = False
dnn_act = softmax
arch_lr = 0.0016
arch_halving_factor = 0.5
arch_improvement_threshold = 0.001
arch_opt = rmsprop
opt_momentum = 0.0
opt_alpha = 0.95
opt_eps = 1e-8
opt_centered = False
opt_weight_decay = 0.0
[architecture3]
arch_name = MLP_layers2
arch_proto = proto/MLP.proto
arch_library = neural_networks
arch_class = MLP
arch_pretrain_file = none
arch_freeze = False
arch_seq_model = False
dnn_lay = N_out_lab_mono
dnn_drop = 0.0
dnn_use_laynorm_inp = False
dnn_use_batchnorm_inp = False
dnn_use_batchnorm = False
dnn_use_laynorm = False
dnn_act = softmax
arch_lr = 0.0004
arch_halving_factor = 0.5
arch_improvement_threshold = 0.001
arch_opt = rmsprop
opt_momentum = 0.0
opt_alpha = 0.95
opt_eps = 1e-8
opt_centered = False
opt_weight_decay = 0.0
[model]
model_proto = proto/model.proto
model = out_dnn1=compute(LSTM_cudnn_layers,fmllr)
out_dnn2=compute(MLP_layers,out_dnn1)
out_dnn3=compute(MLP_layers2,out_dnn1)
loss_mono=cost_nll(out_dnn3,lab_mono)
loss_mono_w=mult_constant(loss_mono,1.0)
loss_cd=cost_nll(out_dnn2,lab_cd)
loss_final=sum(loss_cd,loss_mono_w)
err_final=cost_err(out_dnn2,lab_cd)
[forward]
forward_out = out_dnn2
normalize_posteriors = True
normalize_with_counts_from = lab_cd
save_out_file = False
require_decoding = True
[decoding]
decoding_script_folder = kaldi_decoding_scripts/
decoding_script = decode_dnn.sh
decoding_proto = proto/decoding.proto
min_active = 200
max_active = 7000
max_mem = 50000000
beam = 13.0
latbeam = 8.0
acwt = 0.2
max_arcs = -1
skip_scoring = false
scoring_script = local/score.sh
scoring_opts = "--min-lmwt 1 --max-lmwt 10"
norm_vars = False
(15.1%) res.res
ep=00 tr=['TIMIT_tr'] loss=5.633 err=0.731 valid=TIMIT_dev loss=2.810 err=0.506 lr_architecture1=0.0016 lr_architecture2=0.0016 lr_architecture3=0.0004 time(s)=355
ep=01 tr=['TIMIT_tr'] loss=2.268 err=0.425 valid=TIMIT_dev loss=2.105 err=0.399 lr_architecture1=0.0016 lr_architecture2=0.0016 lr_architecture3=0.0004 time(s)=367
ep=02 tr=['TIMIT_tr'] loss=1.661 err=0.329 valid=TIMIT_dev loss=1.956 err=0.374 lr_architecture1=0.0016 lr_architecture2=0.0016 lr_architecture3=0.0004 time(s)=349
ep=03 tr=['TIMIT_tr'] loss=1.375 err=0.282 valid=TIMIT_dev loss=1.937 err=0.368 lr_architecture1=0.0016 lr_architecture2=0.0016 lr_architecture3=0.0004 time(s)=355
ep=04 tr=['TIMIT_tr'] loss=1.155 err=0.244 valid=TIMIT_dev loss=1.946 err=0.365 lr_architecture1=0.0016 lr_architecture2=0.0016 lr_architecture3=0.0004 time(s)=354
ep=05 tr=['TIMIT_tr'] loss=0.983 err=0.214 valid=TIMIT_dev loss=1.942 err=0.353 lr_architecture1=0.0016 lr_architecture2=0.0016 lr_architecture3=0.0004 time(s)=282
ep=06 tr=['TIMIT_tr'] loss=0.838 err=0.187 valid=TIMIT_dev loss=1.992 err=0.356 lr_architecture1=0.0016 lr_architecture2=0.0016 lr_architecture3=0.0004 time(s)=260
ep=07 tr=['TIMIT_tr'] loss=0.567 err=0.128 valid=TIMIT_dev loss=1.975 err=0.335 lr_architecture1=0.0008 lr_architecture2=0.0008 lr_architecture3=0.0002 time(s)=259
ep=08 tr=['TIMIT_tr'] loss=0.440 err=0.100 valid=TIMIT_dev loss=2.053 err=0.335 lr_architecture1=0.0008 lr_architecture2=0.0008 lr_architecture3=0.0002 time(s)=259
ep=09 tr=['TIMIT_tr'] loss=0.315 err=0.068 valid=TIMIT_dev loss=2.105 err=0.330 lr_architecture1=0.0004 lr_architecture2=0.0004 lr_architecture3=0.0001 time(s)=261
ep=10 tr=['TIMIT_tr'] loss=0.256 err=0.054 valid=TIMIT_dev loss=2.176 err=0.329 lr_architecture1=0.0004 lr_architecture2=0.0004 lr_architecture3=0.0001 time(s)=257
ep=11 tr=['TIMIT_tr'] loss=0.217 err=0.045 valid=TIMIT_dev loss=2.252 err=0.331 lr_architecture1=0.0004 lr_architecture2=0.0004 lr_architecture3=0.0001 time(s)=260
ep=12 tr=['TIMIT_tr'] loss=0.171 err=0.032 valid=TIMIT_dev loss=2.299 err=0.327 lr_architecture1=0.0002 lr_architecture2=0.0002 lr_architecture3=5e-05 time(s)=258
ep=13 tr=['TIMIT_tr'] loss=0.149 err=0.026 valid=TIMIT_dev loss=2.353 err=0.329 lr_architecture1=0.0002 lr_architecture2=0.0002 lr_architecture3=5e-05 time(s)=259
ep=14 tr=['TIMIT_tr'] loss=0.129 err=0.020 valid=TIMIT_dev loss=2.388 err=0.327 lr_architecture1=0.0001 lr_architecture2=0.0001 lr_architecture3=2.5e-05 time(s)=257
ep=15 tr=['TIMIT_tr'] loss=0.119 err=0.018 valid=TIMIT_dev loss=2.414 err=0.327 lr_architecture1=0.0001 lr_architecture2=0.0001 lr_architecture3=2.5e-05 time(s)=260
ep=16 tr=['TIMIT_tr'] loss=0.110 err=0.015 valid=TIMIT_dev loss=2.435 err=0.327 lr_architecture1=5e-05 lr_architecture2=5e-05 lr_architecture3=1.25e-05 time(s)=260
ep=17 tr=['TIMIT_tr'] loss=0.105 err=0.014 valid=TIMIT_dev loss=2.446 err=0.327 lr_architecture1=2.5e-05 lr_architecture2=2.5e-05 lr_architecture3=6.25e-06 time(s)=258
ep=18 tr=['TIMIT_tr'] loss=0.103 err=0.013 valid=TIMIT_dev loss=2.451 err=0.327 lr_architecture1=1.25e-05 lr_architecture2=1.25e-05 lr_architecture3=3.125e-06 time(s)=260
ep=19 tr=['TIMIT_tr'] loss=0.101 err=0.013 valid=TIMIT_dev loss=2.453 err=0.327 lr_architecture1=6.25e-06 lr_architecture2=6.25e-06 lr_architecture3=1.5625e-06 time(s)=258
ep=20 tr=['TIMIT_tr'] loss=0.100 err=0.013 valid=TIMIT_dev loss=2.454 err=0.327 lr_architecture1=3.125e-06 lr_architecture2=3.125e-06 lr_architecture3=7.8125e-07 time(s)=260
ep=21 tr=['TIMIT_tr'] loss=0.100 err=0.013 valid=TIMIT_dev loss=2.456 err=0.327 lr_architecture1=1.5625e-06 lr_architecture2=1.5625e-06 lr_architecture3=3.90625e-07 time(s)=258
ep=22 tr=['TIMIT_tr'] loss=0.100 err=0.013 valid=TIMIT_dev loss=2.455 err=0.327 lr_architecture1=7.8125e-07 lr_architecture2=7.8125e-07 lr_architecture3=1.953125e-07 time(s)=210
ep=23 tr=['TIMIT_tr'] loss=0.099 err=0.012 valid=TIMIT_dev loss=2.456 err=0.327 lr_architecture1=3.90625e-07 lr_architecture2=3.90625e-07 lr_architecture3=9.765625e-08 time(s)=157
%WER 15.1 | 192 7215 | 87.2 9.8 3.0 2.4 15.1 98.4 | -1.925 | /home/arvoelke/git/pytorch-kaldi/exp/lstm0/decode_TIMIT_test_out_dnn2/score_6/ctm_39phn.filt.sys
(14.9%) res.res
ep=00 tr=['TIMIT_tr'] loss=6.826 err=0.801 valid=TIMIT_dev loss=3.100 err=0.544 lr_architecture1=0.0016 lr_architecture2=0.0016 lr_architecture3=0.0004 time(s)=346
ep=01 tr=['TIMIT_tr'] loss=2.439 err=0.449 valid=TIMIT_dev loss=2.164 err=0.408 lr_architecture1=0.0016 lr_architecture2=0.0016 lr_architecture3=0.0004 time(s)=339
ep=02 tr=['TIMIT_tr'] loss=1.752 err=0.344 valid=TIMIT_dev loss=1.971 err=0.376 lr_architecture1=0.0016 lr_architecture2=0.0016 lr_architecture3=0.0004 time(s)=322
ep=03 tr=['TIMIT_tr'] loss=1.452 err=0.296 valid=TIMIT_dev loss=1.957 err=0.371 lr_architecture1=0.0016 lr_architecture2=0.0016 lr_architecture3=0.0004 time(s)=330
ep=04 tr=['TIMIT_tr'] loss=1.224 err=0.257 valid=TIMIT_dev loss=1.927 err=0.363 lr_architecture1=0.0016 lr_architecture2=0.0016 lr_architecture3=0.0004 time(s)=331
ep=05 tr=['TIMIT_tr'] loss=1.045 err=0.227 valid=TIMIT_dev loss=1.931 err=0.355 lr_architecture1=0.0016 lr_architecture2=0.0016 lr_architecture3=0.0004 time(s)=293
ep=06 tr=['TIMIT_tr'] loss=0.890 err=0.198 valid=TIMIT_dev loss=1.966 err=0.351 lr_architecture1=0.0016 lr_architecture2=0.0016 lr_architecture3=0.0004 time(s)=237
ep=07 tr=['TIMIT_tr'] loss=0.763 err=0.174 valid=TIMIT_dev loss=2.061 err=0.352 lr_architecture1=0.0016 lr_architecture2=0.0016 lr_architecture3=0.0004 time(s)=241
ep=08 tr=['TIMIT_tr'] loss=0.513 err=0.118 valid=TIMIT_dev loss=2.057 err=0.338 lr_architecture1=0.0008 lr_architecture2=0.0008 lr_architecture3=0.0002 time(s)=240
ep=09 tr=['TIMIT_tr'] loss=0.396 err=0.092 valid=TIMIT_dev loss=2.139 err=0.337 lr_architecture1=0.0008 lr_architecture2=0.0008 lr_architecture3=0.0002 time(s)=241
ep=10 tr=['TIMIT_tr'] loss=0.326 err=0.076 valid=TIMIT_dev loss=2.222 err=0.334 lr_architecture1=0.0008 lr_architecture2=0.0008 lr_architecture3=0.0002 time(s)=241
ep=11 tr=['TIMIT_tr'] loss=0.274 err=0.064 valid=TIMIT_dev loss=2.341 err=0.335 lr_architecture1=0.0008 lr_architecture2=0.0008 lr_architecture3=0.0002 time(s)=242
ep=12 tr=['TIMIT_tr'] loss=0.196 err=0.043 valid=TIMIT_dev loss=2.389 err=0.329 lr_architecture1=0.0004 lr_architecture2=0.0004 lr_architecture3=0.0001 time(s)=242
ep=13 tr=['TIMIT_tr'] loss=0.155 err=0.032 valid=TIMIT_dev loss=2.461 err=0.329 lr_architecture1=0.0004 lr_architecture2=0.0004 lr_architecture3=0.0001 time(s)=240
ep=14 tr=['TIMIT_tr'] loss=0.122 err=0.022 valid=TIMIT_dev loss=2.525 err=0.326 lr_architecture1=0.0002 lr_architecture2=0.0002 lr_architecture3=5e-05 time(s)=241
ep=15 tr=['TIMIT_tr'] loss=0.105 err=0.018 valid=TIMIT_dev loss=2.577 err=0.327 lr_architecture1=0.0002 lr_architecture2=0.0002 lr_architecture3=5e-05 time(s)=240
ep=16 tr=['TIMIT_tr'] loss=0.090 err=0.014 valid=TIMIT_dev loss=2.608 err=0.326 lr_architecture1=0.0001 lr_architecture2=0.0001 lr_architecture3=2.5e-05 time(s)=242
ep=17 tr=['TIMIT_tr'] loss=0.083 err=0.012 valid=TIMIT_dev loss=2.645 err=0.325 lr_architecture1=0.0001 lr_architecture2=0.0001 lr_architecture3=2.5e-05 time(s)=242
ep=18 tr=['TIMIT_tr'] loss=0.078 err=0.011 valid=TIMIT_dev loss=2.666 err=0.326 lr_architecture1=0.0001 lr_architecture2=0.0001 lr_architecture3=2.5e-05 time(s)=239
ep=19 tr=['TIMIT_tr'] loss=0.072 err=0.009 valid=TIMIT_dev loss=2.688 err=0.326 lr_architecture1=5e-05 lr_architecture2=5e-05 lr_architecture3=1.25e-05 time(s)=242
ep=20 tr=['TIMIT_tr'] loss=0.069 err=0.008 valid=TIMIT_dev loss=2.696 err=0.326 lr_architecture1=2.5e-05 lr_architecture2=2.5e-05 lr_architecture3=6.25e-06 time(s)=239
ep=21 tr=['TIMIT_tr'] loss=0.067 err=0.008 valid=TIMIT_dev loss=2.707 err=0.326 lr_architecture1=1.25e-05 lr_architecture2=1.25e-05 lr_architecture3=3.125e-06 time(s)=242
ep=22 tr=['TIMIT_tr'] loss=0.066 err=0.007 valid=TIMIT_dev loss=2.707 err=0.326 lr_architecture1=6.25e-06 lr_architecture2=6.25e-06 lr_architecture3=1.5625e-06 time(s)=241
ep=23 tr=['TIMIT_tr'] loss=0.065 err=0.007 valid=TIMIT_dev loss=2.708 err=0.326 lr_architecture1=3.125e-06 lr_architecture2=3.125e-06 lr_architecture3=7.8125e-07 time(s)=240
%WER 14.9 | 192 7215 | 87.1 9.8 3.0 2.1 14.9 98.4 | -1.878 | /home/arvoelke/git/pytorch-kaldi/exp/lstm1/decode_TIMIT_test_out_dnn2/score_8/ctm_39phn.filt.sys
I reran each configuration 2-3 times and obtained the same result each time (respectively).
Environment:
- Ubuntu 18.04
- Python 3.7.6
- master branch of mravanelli/pytorch-kaldi
- commit kaldi-asr/kaldi@23868d5 of kaldi-asr/kaldi
- conda

I can include the yml environment files for the two conda environments, but the biggest difference that jumps out at me is: `pip install torch==1.4.0` plus `cudnn=7.6.5=cuda10.0_0`, versus the conda pytorch channel with `pytorch=1.4.0=py3.7_cuda10.1.243_cudnn7.6.3_0`.
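For reference, a quick way to confirm what each environment actually loads at runtime (standard PyTorch introspection calls, nothing pytorch-kaldi-specific):

```python
import torch

# Report the CUDA / cuDNN build that this environment's PyTorch is using,
# to confirm the pip-wheel vs. conda-channel difference described above.
print(torch.__version__)               # e.g. 1.4.0 in both environments
print(torch.version.cuda)              # e.g. "10.0" (pip wheel) vs "10.1" (conda channel)
print(torch.backends.cudnn.version())  # e.g. 7605 vs 7603
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # GPU model, in case the machines also differ
```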
Hi! For some reason, the cuDNN version obtains worse performance... Could you try the non-cuDNN version? Thanks.
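(For context: as far as I can tell from the config keys, the `LSTM_cudnn` architecture forwards `hidden_size`, `num_layers`, `dropout`, and `bidirectional` straight to `torch.nn.LSTM`, whose GPU path runs cuDNN kernels, so its numerics depend on the installed cuDNN build, while the non-cuDNN `LSTM` class is a custom implementation. A minimal illustrative sketch, not pytorch-kaldi's actual code:)

```python
import torch
import torch.nn as nn

# Roughly what the LSTM_cudnn architecture wraps: torch.nn.LSTM dispatches to
# cuDNN kernels on a CUDA device, so results can shift with the cuDNN version.
lstm = nn.LSTM(input_size=40, hidden_size=550, num_layers=4, bias=True,
               batch_first=True, dropout=0.2, bidirectional=True)
x = torch.randn(8, 100, 40)  # (batch, time, feat); 40 is just an illustrative feature dim
out, (h, c) = lstm(x)
print(out.shape)             # torch.Size([8, 100, 1100]) -- 2 * hidden_size (bidirectional)
```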
Thanks for the help. That did slightly better (14.8%) using the conda environment with the pytorch channel. To run the config file, all I modified were the `[dataset*]` sections.
TIMIT_LSTM_fmllr.cfg
[cfg_proto]
cfg_proto = proto/global.proto
cfg_proto_chunk = proto/global_chunk.proto
[exp]
cmd =
run_nn_script = run_nn
out_folder = exp/TIMIT_LSTM_fmllr
seed = 2234
use_cuda = True
multi_gpu = False
save_gpumem = False
n_epochs_tr = 24
[dataset1]
data_name = TIMIT_tr
fea = fea_name=mfcc
fea_lst=/home/arvoelke/git/kaldi/egs/timit/s5/data/train/feats.scp
fea_opts=apply-cmvn --utt2spk=ark:/home/arvoelke/git/kaldi/egs/timit/s5/data/train/utt2spk ark:/home/arvoelke/git/kaldi/egs/timit/s5/mfcc/cmvn_train.ark ark:- ark:- | add-deltas --delta-order=2 ark:- ark:- |
cw_left=0
cw_right=0
fea_name=fbank
fea_lst=/home/arvoelke/git/kaldi/egs/timit/s5/data-fbank/train/feats.scp
fea_opts=apply-cmvn --utt2spk=ark:/home/arvoelke/git/kaldi/egs/timit/s5/data/train/utt2spk ark:/home/arvoelke/git/kaldi/egs/timit/s5/fbank/cmvn_train.ark ark:- ark:- | add-deltas --delta-order=0 ark:- ark:- |
cw_left=0
cw_right=0
fea_name=fmllr
fea_lst=/home/arvoelke/git/kaldi/egs/timit/s5/data-fmllr-tri3/train/feats.scp
fea_opts=apply-cmvn --utt2spk=ark:/home/arvoelke/git/kaldi/egs/timit/s5/data/train/utt2spk ark:/home/arvoelke/git/kaldi/egs/timit/s5/data-fmllr-tri3/cmvn_train.ark ark:- ark:- | add-deltas --delta-order=0 ark:- ark:- |
cw_left=0
cw_right=0
lab = lab_name=lab_cd
lab_folder=/home/arvoelke/git/kaldi/egs/timit/s5/exp/dnn4_pretrain-dbn_dnn_ali
lab_opts=ali-to-pdf
lab_count_file=auto
lab_data_folder=/home/arvoelke/git/kaldi/egs/timit/s5/data/train/
lab_graph=/home/arvoelke/git/kaldi/egs/timit/s5/exp/tri3/graph
lab_name=lab_mono
lab_folder=/home/arvoelke/git/kaldi/egs/timit/s5/exp/dnn4_pretrain-dbn_dnn_ali
lab_opts=ali-to-phones --per-frame=true
lab_count_file=none
lab_data_folder=/home/arvoelke/git/kaldi/egs/timit/s5/data/train/
lab_graph=/home/arvoelke/git/kaldi/egs/timit/s5/exp/tri3/graph
n_chunks = 5
[dataset2]
data_name = TIMIT_dev
fea = fea_name=mfcc
fea_lst=/home/arvoelke/git/kaldi/egs/timit/s5/data/dev/feats.scp
fea_opts=apply-cmvn --utt2spk=ark:/home/arvoelke/git/kaldi/egs/timit/s5/data/dev/utt2spk ark:/home/arvoelke/git/kaldi/egs/timit/s5/mfcc/cmvn_dev.ark ark:- ark:- | add-deltas --delta-order=2 ark:- ark:- |
cw_left=0
cw_right=0
fea_name=fbank
fea_lst=/home/arvoelke/git/kaldi/egs/timit/s5/data-fbank/dev/feats.scp
fea_opts=apply-cmvn --utt2spk=ark:/home/arvoelke/git/kaldi/egs/timit/s5/data/dev/utt2spk ark:/home/arvoelke/git/kaldi/egs/timit/s5/fbank/cmvn_dev.ark ark:- ark:- | add-deltas --delta-order=0 ark:- ark:- |
cw_left=0
cw_right=0
fea_name=fmllr
fea_lst=/home/arvoelke/git/kaldi/egs/timit/s5/data-fmllr-tri3/dev/feats.scp
fea_opts=apply-cmvn --utt2spk=ark:/home/arvoelke/git/kaldi/egs/timit/s5/data/dev/utt2spk ark:/home/arvoelke/git/kaldi/egs/timit/s5/data-fmllr-tri3/cmvn_dev.ark ark:- ark:- | add-deltas --delta-order=0 ark:- ark:- |
cw_left=0
cw_right=0
lab = lab_name=lab_cd
lab_folder=/home/arvoelke/git/kaldi/egs/timit/s5/exp/dnn4_pretrain-dbn_dnn_ali_dev
lab_opts=ali-to-pdf
lab_count_file=auto
lab_data_folder=/home/arvoelke/git/kaldi/egs/timit/s5/data/dev/
lab_graph=/home/arvoelke/git/kaldi/egs/timit/s5/exp/tri3/graph
lab_name=lab_mono
lab_folder=/home/arvoelke/git/kaldi/egs/timit/s5/exp/dnn4_pretrain-dbn_dnn_ali_dev
lab_opts=ali-to-phones --per-frame=true
lab_count_file=none
lab_data_folder=/home/arvoelke/git/kaldi/egs/timit/s5/data/dev/
lab_graph=/home/arvoelke/git/kaldi/egs/timit/s5/exp/tri3/graph
n_chunks = 1
[dataset3]
data_name = TIMIT_test
fea = fea_name=mfcc
fea_lst=/home/arvoelke/git/kaldi/egs/timit/s5/data/test/feats.scp
fea_opts=apply-cmvn --utt2spk=ark:/home/arvoelke/git/kaldi/egs/timit/s5/data/test/utt2spk ark:/home/arvoelke/git/kaldi/egs/timit/s5/mfcc/cmvn_test.ark ark:- ark:- | add-deltas --delta-order=2 ark:- ark:- |
cw_left=0
cw_right=0
fea_name=fbank
fea_lst=/home/arvoelke/git/kaldi/egs/timit/s5/data-fbank/test/feats.scp
fea_opts=apply-cmvn --utt2spk=ark:/home/arvoelke/git/kaldi/egs/timit/s5/data/test/utt2spk ark:/home/arvoelke/git/kaldi/egs/timit/s5/fbank/cmvn_test.ark ark:- ark:- | add-deltas --delta-order=0 ark:- ark:- |
cw_left=0
cw_right=0
fea_name=fmllr
fea_lst=/home/arvoelke/git/kaldi/egs/timit/s5/data-fmllr-tri3/test/feats.scp
fea_opts=apply-cmvn --utt2spk=ark:/home/arvoelke/git/kaldi/egs/timit/s5/data/test/utt2spk ark:/home/arvoelke/git/kaldi/egs/timit/s5/data-fmllr-tri3/cmvn_test.ark ark:- ark:- | add-deltas --delta-order=0 ark:- ark:- |
cw_left=0
cw_right=0
lab = lab_name=lab_cd
lab_folder=/home/arvoelke/git/kaldi/egs/timit/s5/exp/dnn4_pretrain-dbn_dnn_ali_test
lab_opts=ali-to-pdf
lab_count_file=auto
lab_data_folder=/home/arvoelke/git/kaldi/egs/timit/s5/data/test/
lab_graph=/home/arvoelke/git/kaldi/egs/timit/s5/exp/tri3/graph
lab_name=lab_mono
lab_folder=/home/arvoelke/git/kaldi/egs/timit/s5/exp/dnn4_pretrain-dbn_dnn_ali_test
lab_opts=ali-to-phones --per-frame=true
lab_count_file=none
lab_data_folder=/home/arvoelke/git/kaldi/egs/timit/s5/data/test/
lab_graph=/home/arvoelke/git/kaldi/egs/timit/s5/exp/tri3/graph
n_chunks = 1
[data_use]
train_with = TIMIT_tr
valid_with = TIMIT_dev
forward_with = TIMIT_test
[batches]
batch_size_train = 8
max_seq_length_train = 1000
increase_seq_length_train = True
start_seq_len_train = 100
multply_factor_seq_len_train = 2
batch_size_valid = 8
max_seq_length_valid = 1000
[architecture1]
arch_name = LSTM_cudnn_layers
arch_proto = proto/LSTM.proto
arch_library = neural_networks
arch_class = LSTM
arch_pretrain_file = none
arch_freeze = False
arch_seq_model = True
lstm_lay = 550,550,550,550
lstm_drop = 0.2,0.2,0.2,0.2
lstm_use_laynorm_inp = False
lstm_use_batchnorm_inp = False
lstm_use_laynorm = False,False,False,False
lstm_use_batchnorm = True,True,True,True
lstm_bidir = True
lstm_act = tanh,tanh,tanh,tanh
lstm_orthinit = True
arch_lr = 0.0016
arch_halving_factor = 0.5
arch_improvement_threshold = 0.001
arch_opt = rmsprop
opt_momentum = 0.0
opt_alpha = 0.95
opt_eps = 1e-8
opt_centered = False
opt_weight_decay = 0.0
[architecture2]
arch_name = MLP_layers
arch_proto = proto/MLP.proto
arch_library = neural_networks
arch_class = MLP
arch_pretrain_file = none
arch_freeze = False
arch_seq_model = False
dnn_lay = N_out_lab_cd
dnn_drop = 0.0
dnn_use_laynorm_inp = False
dnn_use_batchnorm_inp = False
dnn_use_batchnorm = False
dnn_use_laynorm = False
dnn_act = softmax
arch_lr = 0.0016
arch_halving_factor = 0.5
arch_improvement_threshold = 0.001
arch_opt = rmsprop
opt_momentum = 0.0
opt_alpha = 0.95
opt_eps = 1e-8
opt_centered = False
opt_weight_decay = 0.0
[architecture3]
arch_name = MLP_layers2
arch_proto = proto/MLP.proto
arch_library = neural_networks
arch_class = MLP
arch_pretrain_file = none
arch_freeze = False
arch_seq_model = False
dnn_lay = N_out_lab_mono
dnn_drop = 0.0
dnn_use_laynorm_inp = False
dnn_use_batchnorm_inp = False
dnn_use_batchnorm = False
dnn_use_laynorm = False
dnn_act = softmax
arch_lr = 0.0004
arch_halving_factor = 0.5
arch_improvement_threshold = 0.001
arch_opt = rmsprop
opt_momentum = 0.0
opt_alpha = 0.95
opt_eps = 1e-8
opt_centered = False
opt_weight_decay = 0.0
[model]
model_proto = proto/model.proto
model = out_dnn1=compute(LSTM_cudnn_layers,fmllr)
out_dnn2=compute(MLP_layers,out_dnn1)
out_dnn3=compute(MLP_layers2,out_dnn1)
loss_mono=cost_nll(out_dnn3,lab_mono)
loss_mono_w=mult_constant(loss_mono,1.0)
loss_cd=cost_nll(out_dnn2,lab_cd)
loss_final=sum(loss_cd,loss_mono_w)
err_final=cost_err(out_dnn2,lab_cd)
[forward]
forward_out = out_dnn2
normalize_posteriors = True
normalize_with_counts_from = lab_cd
save_out_file = False
require_decoding = True
[decoding]
decoding_script_folder = kaldi_decoding_scripts/
decoding_script = decode_dnn.sh
decoding_proto = proto/decoding.proto
min_active = 200
max_active = 7000
max_mem = 50000000
beam = 13.0
latbeam = 8.0
acwt = 0.2
max_arcs = -1
skip_scoring = false
scoring_script = local/score.sh
scoring_opts = "--min-lmwt 1 --max-lmwt 10"
norm_vars = False
(14.8%) res.res
ep=00 tr=['TIMIT_tr'] loss=4.186 err=0.637 valid=TIMIT_dev loss=2.686 err=0.481 lr_architecture1=0.0016 lr_architecture2=0.0016 lr_architecture3=0.0004 time(s)=1140
ep=01 tr=['TIMIT_tr'] loss=2.390 err=0.441 valid=TIMIT_dev loss=2.188 err=0.403 lr_architecture1=0.0016 lr_architecture2=0.0016 lr_architecture3=0.0004 time(s)=1259
ep=02 tr=['TIMIT_tr'] loss=1.862 err=0.361 valid=TIMIT_dev loss=1.990 err=0.361 lr_architecture1=0.0016 lr_architecture2=0.0016 lr_architecture3=0.0004 time(s)=1469
ep=03 tr=['TIMIT_tr'] loss=1.632 err=0.325 valid=TIMIT_dev loss=1.966 err=0.358 lr_architecture1=0.0016 lr_architecture2=0.0016 lr_architecture3=0.0004 time(s)=1339
ep=04 tr=['TIMIT_tr'] loss=1.464 err=0.299 valid=TIMIT_dev loss=1.957 err=0.349 lr_architecture1=0.0016 lr_architecture2=0.0016 lr_architecture3=0.0004 time(s)=1403
ep=05 tr=['TIMIT_tr'] loss=1.342 err=0.280 valid=TIMIT_dev loss=1.974 err=0.345 lr_architecture1=0.0016 lr_architecture2=0.0016 lr_architecture3=0.0004 time(s)=1438
ep=06 tr=['TIMIT_tr'] loss=1.236 err=0.262 valid=TIMIT_dev loss=2.000 err=0.344 lr_architecture1=0.0016 lr_architecture2=0.0016 lr_architecture3=0.0004 time(s)=1434
ep=07 tr=['TIMIT_tr'] loss=1.151 err=0.247 valid=TIMIT_dev loss=1.999 err=0.339 lr_architecture1=0.0016 lr_architecture2=0.0016 lr_architecture3=0.0004 time(s)=1459
ep=08 tr=['TIMIT_tr'] loss=1.066 err=0.232 valid=TIMIT_dev loss=2.021 err=0.338 lr_architecture1=0.0016 lr_architecture2=0.0016 lr_architecture3=0.0004 time(s)=1462
ep=09 tr=['TIMIT_tr'] loss=1.005 err=0.222 valid=TIMIT_dev loss=2.072 err=0.340 lr_architecture1=0.0016 lr_architecture2=0.0016 lr_architecture3=0.0004 time(s)=1531
ep=10 tr=['TIMIT_tr'] loss=0.818 err=0.185 valid=TIMIT_dev loss=2.042 err=0.326 lr_architecture1=0.0008 lr_architecture2=0.0008 lr_architecture3=0.0002 time(s)=1551
ep=11 tr=['TIMIT_tr'] loss=0.744 err=0.171 valid=TIMIT_dev loss=2.077 err=0.325 lr_architecture1=0.0008 lr_architecture2=0.0008 lr_architecture3=0.0002 time(s)=1489
ep=12 tr=['TIMIT_tr'] loss=0.693 err=0.161 valid=TIMIT_dev loss=2.154 err=0.327 lr_architecture1=0.0008 lr_architecture2=0.0008 lr_architecture3=0.0002 time(s)=1542
ep=13 tr=['TIMIT_tr'] loss=0.613 err=0.144 valid=TIMIT_dev loss=2.145 err=0.319 lr_architecture1=0.0004 lr_architecture2=0.0004 lr_architecture3=0.0001 time(s)=1558
ep=14 tr=['TIMIT_tr'] loss=0.580 err=0.138 valid=TIMIT_dev loss=2.180 err=0.319 lr_architecture1=0.0004 lr_architecture2=0.0004 lr_architecture3=0.0001 time(s)=1434
ep=15 tr=['TIMIT_tr'] loss=0.538 err=0.129 valid=TIMIT_dev loss=2.208 err=0.316 lr_architecture1=0.0002 lr_architecture2=0.0002 lr_architecture3=5e-05 time(s)=1393
ep=16 tr=['TIMIT_tr'] loss=0.520 err=0.125 valid=TIMIT_dev loss=2.237 err=0.318 lr_architecture1=0.0002 lr_architecture2=0.0002 lr_architecture3=5e-05 time(s)=1541
ep=17 tr=['TIMIT_tr'] loss=0.500 err=0.120 valid=TIMIT_dev loss=2.241 err=0.316 lr_architecture1=0.0001 lr_architecture2=0.0001 lr_architecture3=2.5e-05 time(s)=1561
ep=18 tr=['TIMIT_tr'] loss=0.494 err=0.119 valid=TIMIT_dev loss=2.247 err=0.315 lr_architecture1=0.0001 lr_architecture2=0.0001 lr_architecture3=2.5e-05 time(s)=1490
ep=19 tr=['TIMIT_tr'] loss=0.485 err=0.117 valid=TIMIT_dev loss=2.254 err=0.315 lr_architecture1=0.0001 lr_architecture2=0.0001 lr_architecture3=2.5e-05 time(s)=1419
ep=20 tr=['TIMIT_tr'] loss=0.477 err=0.115 valid=TIMIT_dev loss=2.269 err=0.314 lr_architecture1=5e-05 lr_architecture2=5e-05 lr_architecture3=1.25e-05 time(s)=1490
ep=21 tr=['TIMIT_tr'] loss=0.471 err=0.114 valid=TIMIT_dev loss=2.277 err=0.314 lr_architecture1=5e-05 lr_architecture2=5e-05 lr_architecture3=1.25e-05 time(s)=1407
ep=22 tr=['TIMIT_tr'] loss=0.466 err=0.113 valid=TIMIT_dev loss=2.279 err=0.315 lr_architecture1=2.5e-05 lr_architecture2=2.5e-05 lr_architecture3=6.25e-06 time(s)=1518
ep=23 tr=['TIMIT_tr'] loss=0.462 err=0.112 valid=TIMIT_dev loss=2.282 err=0.314 lr_architecture1=1.25e-05 lr_architecture2=1.25e-05 lr_architecture3=3.125e-06 time(s)=1500
%WER 14.8 | 192 7215 | 87.4 9.8 2.8 2.2 14.8 99.5 | -2.030 | /home/arvoelke/git/pytorch-kaldi/exp/TIMIT_LSTM_fmllr/decode_TIMIT_test_out_dnn2/score_6/ctm_39phn.filt.sys
Could there be issues with using a different CUDA version, or random variability due to different environments? Would it help to run with several different seeds and take the average test result?
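For reference, these are the knobs I would look at for pinning down run-to-run variance (standard PyTorch settings; the experiment seed itself comes from the `seed` field in the cfg, and I don't know whether pytorch-kaldi sets the cuDNN flags):

```python
import random
import numpy as np
import torch

def make_deterministic(seed: int) -> None:
    # Standard PyTorch reproducibility settings; only `seed` is something the
    # pytorch-kaldi cfg exposes directly, the cuDNN flags are extra.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

# Averaging over seeds would then reduce to something like:
wers = [15.1, 14.9, 14.8]  # WERs from the runs reported in this thread
print(f"mean={np.mean(wers):.2f}%  std={np.std(wers):.2f}%")
```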
Not that much; the variability is about 0.2% on TIMIT. Could you please post your `res.res` file? I'm suspecting an old bug...
The `res.res` file is in my previous post (click the triangle/name to toggle).
I see, sorry. I was suspecting an LR bug, but apparently there is no bug. Hmm, you could try multiple runs to see whether you can reach our 14.5%. @mravanelli I'm asking, but I believe the answer is no: have we changed the original configuration files for the TIMIT recipes?
Meanwhile, I'll try on my side. Could you please try to replicate the FBANK or MFCC results, to see whether this is a general problem or specific to fmllr?
Thanks for getting back. I will look into trying older versions as well as the other features when I can. In doing that, should I rerun the kaldi-asr scripts to regenerate the datasets each time I switch the PyTorch/CUDA versions? Or do you know whether the difference would be isolated to the training (so that I could reuse the same datasets)?