How to decode the separated audios
Hello. I have trained the ASR model with sms_wsj.train_baseline_asr. I would like to know whether there is any guidance on how to get the WER of the separated audio using the trained ASR model. It seems sms_wsj/kaldi/get_kaldi_wer.py can do that, but I don't know how to prepare my separated results (e.g. directory structure or other data) to meet the requirements of this script.
Thanks in advance.
Hello,
at the beginning of the file sms_wsj/kaldi/get_kaldi_wer.py there are several examples of how it can be used, e.g.:
python -m sms_wsj.kaldi.get_kaldi_wer -F /EXP/DIR decode with kaldi_data_dir=/KALDI/DATA/DIR model_egs_dir=/MODEL/EGS/DIR dataset=test_eval92
where /EXP/DIR is the working/output dir, /KALDI/DATA/DIR is a dir in Kaldi data style, /MODEL/EGS/DIR is the path to the trained model, and test_eval92 is the dataset, i.e. a folder in /KALDI/DATA/DIR.
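For readers unfamiliar with the Kaldi data-dir style: such a directory contains at least wav.scp, text and utt2spk, each with one "<utterance-id> <value>" line per utterance, sorted by utterance ID (the fix_data_dir.sh "not in sorted order" messages in the logs of this thread come from unsorted files); spk2utt can then be generated with Kaldi's utils/utt2spk_to_spk2utt.pl. The following is a minimal sketch, not code from sms_wsj; the helper names and the example utterances are hypothetical.

```python
from pathlib import Path


def kaldi_data_dir_files(utterances):
    """Build the contents of a minimal Kaldi data dir.

    `utterances` maps utterance ID -> (wav path, speaker ID, transcript).
    Returns a dict: file name -> file content. Kaldi expects each file to be
    sorted by utterance ID, so lines are emitted in sorted order.
    """
    files = {"wav.scp": [], "text": [], "utt2spk": []}
    for utt_id in sorted(utterances):
        wav_path, speaker, transcript = utterances[utt_id]
        files["wav.scp"].append(f"{utt_id} {wav_path}")
        files["text"].append(f"{utt_id} {transcript}")
        files["utt2spk"].append(f"{utt_id} {speaker}")
    return {name: "\n".join(lines) + "\n" for name, lines in files.items()}


def write_kaldi_data_dir(data_dir, utterances):
    """Write the files produced by kaldi_data_dir_files() into data_dir."""
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    for name, content in kaldi_data_dir_files(utterances).items():
        (data_dir / name).write_text(content)
```

Producing the content separately from writing it makes the format easy to inspect before touching the file system.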
It seems the dataset parameter is not valid, as it is neither defined nor used in get_kaldi_wer.py.
And if I follow the command python -m sms_wsj.kaldi.get_kaldi_wer -F /EXP/DIR with audio_dir=/AUDIO/DIR json_path=/JSON/PATH model_egs_dir=/MODEL/EGS/DIR, which JSON file should I provide?
Sorry, the signature was changed and nobody checked the examples at the beginning.
For decode, kaldi_data_dir and dataset are replaced by dataset_dir.
The json_path is the path to the sms_wsj.json. In {audio_dir}/[cv_dev93|test_eval92], the code will search for e.g. {id}_{spk}.wav, where id is the example_id and spk is 0 or 1 (this can be changed with id_to_file_name, but requires proper escaping for the shell).
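A minimal sketch of the file layout described above, which can be used to check that all separated files are in place before starting the decoding. The helper names are hypothetical, and the assumption that sms_wsj.json stores its examples under database["datasets"][dataset], keyed by example_id, should be verified against your own copy of the JSON.

```python
import json
from pathlib import Path


def load_example_ids(json_path, dataset):
    # Assumption: sms_wsj.json keeps its examples under
    # database["datasets"][dataset], keyed by example_id.
    with open(json_path) as f:
        database = json.load(f)
    return sorted(database["datasets"][dataset])


def expected_wav_paths(audio_dir, dataset, example_ids, num_speakers=2):
    """Paths following the {audio_dir}/{dataset}/{example_id}_{spk}.wav layout."""
    audio_dir = Path(audio_dir)
    return [
        audio_dir / dataset / f"{example_id}_{spk}.wav"
        for example_id in example_ids
        for spk in range(num_speakers)  # spk is 0 or 1 for two speakers
    ]


def missing_wav_files(paths):
    """Return the expected files that do not exist (yet)."""
    return [path for path in paths if not path.exists()]
```

For example, missing_wav_files(expected_wav_paths("/AUDIO/DIR", "test_eval92", load_example_ids("/JSON/PATH", "test_eval92"))) should return an empty list once every example has both separated outputs.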
@boeddeker Thank you for your answer. I will try. ^^
Hello, I'm back. I tried python -m sms_wsj.kaldi.get_kaldi_wer -F exp with audio_dir=/data/quancs/datasets/sms_wsj/early/test_eval92/ json_path=/data/quancs/datasets/sms_wsj/sms_wsj.json model_egs_dir=$KALDI_ROOT/egs/sms_single_speaker/s5/, but it reported the error below. I don't know whether I am missing something.
root@36b3ff17d8c5:~/projects/sms_wsj# python -m sms_wsj.kaldi.get_kaldi_wer -F exp with audio_dir=/data/quancs/datasets/sms_wsj/early/test_eval92/ json_path=/data/quancs/datasets/sms_wsj/sms_wsj.json model_egs_dir=$KALDI_ROOT/egs/sms_single_speaker/s5/
INFO - Kaldi array - Running command 'run'
INFO - Kaldi array - Started run with ID "1"
Create /root/projects/sms_wsj/exp directory
Create data dir for sms_enh/test_eval92 data
utils/fix_data_dir.sh: file /root/projects/sms_wsj/exp/data/sms_enh/test_eval92/utt2spk is not in sorted order or not unique, sorting it
utils/fix_data_dir.sh: file /root/projects/sms_wsj/exp/data/sms_enh/test_eval92/text is not in sorted order or not unique, sorting it
utils/fix_data_dir.sh: file /root/projects/sms_wsj/exp/data/sms_enh/test_eval92/wav.scp is not in sorted order or not unique, sorting it
utils/fix_data_dir.sh: file /root/projects/sms_wsj/exp/data/sms_enh/test_eval92/spk2gender is not in sorted order or not unique, sorting it
utils/fix_data_dir.sh: file /root/projects/sms_wsj/exp/data/sms_enh/test_eval92/utt2dur is not in sorted order or not unique, sorting it
utils/fix_data_dir.sh: file /root/projects/sms_wsj/exp/data/sms_enh/test_eval92/reco2dur is not in sorted order or not unique, sorting it
fix_data_dir.sh: kept all 2664 utterances.
fix_data_dir.sh: old files are kept in /root/projects/sms_wsj/exp/data/sms_enh/test_eval92/.backup
steps/make_mfcc.sh --nj 48 --mfcc-config /root/projects/sms_wsj/exp/conf/mfcc_hires.conf --cmd run.pl /root/projects/sms_wsj/exp/data/sms_enh/test_eval92 /root/projects/sms_wsj/exp/data/sms_enh/test_eval92/make_mfcc /root/projects/sms_wsj/exp/data/sms_enh/test_eval92/mfcc
utils/validate_data_dir.sh: Successfully validated data-directory /root/projects/sms_wsj/exp/data/sms_enh/test_eval92
steps/make_mfcc.sh: [info]: no segments file exists: assuming wav.scp indexed by utterance.
steps/make_mfcc.sh: Succeeded creating MFCC features for test_eval92
steps/compute_cmvn_stats.sh /root/projects/sms_wsj/exp/data/sms_enh/test_eval92 /root/projects/sms_wsj/exp/data/sms_enh/test_eval92/make_mfcc /root/projects/sms_wsj/exp/data/sms_enh/test_eval92/mfcc
Succeeded creating CMVN stats for test_eval92
fix_data_dir.sh: kept all 2664 utterances.
fix_data_dir.sh: old files are kept in /root/projects/sms_wsj/exp/data/sms_enh/test_eval92/.backup
Directory /root/projects/sms_wsj/exp/exp/sms_single_speaker/nnet3/ivectors_sms_enh_test_eval92 not found, estimating ivectors
steps/online/nnet2/extract_ivectors_online.sh --cmd run.pl --nj 48 /root/projects/sms_wsj/exp/data/sms_enh/test_eval92 /root/projects/kaldi/egs/sms_single_speaker/s5/exp/sms_single_speaker/nnet3/extractor /root/projects/sms_wsj/exp/exp/sms_single_speaker/nnet3/ivectors_sms_enh_test_eval92
utils/split_scp.pl: Refusing to split data because number of speakers 8 is less than the number of output .scp files 48
ERROR - Kaldi array - Failed after 0:00:15!
Traceback (most recent calls WITHOUT Sacred internals):
File "/root/projects/sms_wsj/sms_wsj/kaldi/get_kaldi_wer.py", line 324, in run
decode(
File "/root/projects/sms_wsj/sms_wsj/kaldi/get_kaldi_wer.py", line 145, in decode
ivector_dir = calculate_ivectors(
File "/root/projects/sms_wsj/sms_wsj/kaldi/utils.py", line 372, in calculate_ivectors
run_process([
File "/root/projects/sms_wsj/sms_wsj/kaldi/utils.py", line 413, in run_process
subprocess.run(
File "/root/miniconda3/lib/python3.9/subprocess.py", line 528, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['steps/online/nnet2/extract_ivectors_online.sh', '--cmd', 'run.pl', '--nj', '48', '/root/projects/sms_wsj/exp/data/sms_enh/test_eval92', '/root/projects/kaldi/egs/sms_single_speaker/s5/exp/sms_single_speaker/nnet3/extractor', '/root/projects/sms_wsj/exp/exp/sms_single_speaker/nnet3/ivectors_sms_enh_test_eval92']' returned non-zero exit status 1.
Kaldi complains that --nj 48 (the number of jobs/workers) is too high. Kaldi cannot split the data by speaker and fails if nj is larger than the number of speakers (here 8, see the split_scp.pl message).
In #24 I pushed a fix, so the new default is min(8, os.cpu_count()) instead of os.cpu_count().
Alternatively, you can also change the number of jobs on the command line with num_jobs=8.
Sorry, we didn't notice this, because we ran the code on machines with 8 cores.
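The constraint can be illustrated with a small sketch that caps the job count by the number of speakers in utt2spk. This is a hypothetical helper, not the fix from #24 (which uses min(8, os.cpu_count())); the extra speaker cap is only suggested by the split_scp.pl message in the log, where test_eval92 has 8 speakers for 2664 utterances.

```python
import os


def count_speakers(utt2spk_lines):
    """Number of distinct speakers in Kaldi utt2spk lines ("<utt-id> <spk-id>")."""
    return len({line.split()[1] for line in utt2spk_lines if line.strip()})


def safe_num_jobs(utt2spk_lines, max_jobs=8, cpu_count=None):
    """Choose a job count that Kaldi's speaker-based splitting will accept.

    utils/split_scp.pl refuses to split into more parts than there are
    speakers, so the job count is capped by the speaker count in addition
    to min(max_jobs, cpu_count).
    """
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1
    return max(1, min(max_jobs, cpu_count, count_speakers(utt2spk_lines)))
```

On a 48-core machine with the 8-speaker test set, this yields 8 jobs even if max_jobs is raised to 48.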
Nice! Decoding now. Many thanks for your patient help.
Great.
Decoding is over now. The WER is printed (7.02 for early/test_eval92). Is this WER correct, since it doesn't match any of the numbers in your paper? It also reported an error (maybe it doesn't matter).
root@36b3ff17d8c5:~/projects/sms_wsj# python -m sms_wsj.kaldi.get_kaldi_wer -F exp3 with audio_dir=/data/quancs/datasets/sms_wsj/early/test_eval92/ json_path=/data/quancs/datasets/sms_wsj/sms_wsj.json model_egs_dir=$KALDI_ROOT/egs/sms_single_speaker/s5/
INFO - Kaldi array - Running command 'run'
INFO - Kaldi array - Started run with ID "1"
Create /root/projects/sms_wsj/exp3 directory
Create data dir for sms_enh/test_eval92 data
utils/fix_data_dir.sh: file /root/projects/sms_wsj/exp3/data/sms_enh/test_eval92/utt2spk is not in sorted order or not unique, sorting it
utils/fix_data_dir.sh: file /root/projects/sms_wsj/exp3/data/sms_enh/test_eval92/text is not in sorted order or not unique, sorting it
utils/fix_data_dir.sh: file /root/projects/sms_wsj/exp3/data/sms_enh/test_eval92/wav.scp is not in sorted order or not unique, sorting it
utils/fix_data_dir.sh: file /root/projects/sms_wsj/exp3/data/sms_enh/test_eval92/spk2gender is not in sorted order or not unique, sorting it
utils/fix_data_dir.sh: file /root/projects/sms_wsj/exp3/data/sms_enh/test_eval92/utt2dur is not in sorted order or not unique, sorting it
utils/fix_data_dir.sh: file /root/projects/sms_wsj/exp3/data/sms_enh/test_eval92/reco2dur is not in sorted order or not unique, sorting it
fix_data_dir.sh: kept all 2664 utterances.
fix_data_dir.sh: old files are kept in /root/projects/sms_wsj/exp3/data/sms_enh/test_eval92/.backup
steps/make_mfcc.sh --nj 8 --mfcc-config /root/projects/sms_wsj/exp3/conf/mfcc_hires.conf --cmd run.pl /root/projects/sms_wsj/exp3/data/sms_enh/test_eval92 /root/projects/sms_wsj/exp3/data/sms_enh/test_eval92/make_mfcc /root/projects/sms_wsj/exp3/data/sms_enh/test_eval92/mfcc
utils/validate_data_dir.sh: Successfully validated data-directory /root/projects/sms_wsj/exp3/data/sms_enh/test_eval92
steps/make_mfcc.sh: [info]: no segments file exists: assuming wav.scp indexed by utterance.
steps/make_mfcc.sh: Succeeded creating MFCC features for test_eval92
steps/compute_cmvn_stats.sh /root/projects/sms_wsj/exp3/data/sms_enh/test_eval92 /root/projects/sms_wsj/exp3/data/sms_enh/test_eval92/make_mfcc /root/projects/sms_wsj/exp3/data/sms_enh/test_eval92/mfcc
Succeeded creating CMVN stats for test_eval92
fix_data_dir.sh: kept all 2664 utterances.
fix_data_dir.sh: old files are kept in /root/projects/sms_wsj/exp3/data/sms_enh/test_eval92/.backup
Directory /root/projects/sms_wsj/exp3/exp/sms_single_speaker/nnet3/ivectors_sms_enh_test_eval92 not found, estimating ivectors
steps/online/nnet2/extract_ivectors_online.sh --cmd run.pl --nj 8 /root/projects/sms_wsj/exp3/data/sms_enh/test_eval92 /root/projects/kaldi/egs/sms_single_speaker/s5/exp/sms_single_speaker/nnet3/extractor /root/projects/sms_wsj/exp3/exp/sms_single_speaker/nnet3/ivectors_sms_enh_test_eval92
steps/online/nnet2/extract_ivectors_online.sh: extracting iVectors
steps/online/nnet2/extract_ivectors_online.sh: combining iVectors across jobs
steps/online/nnet2/extract_ivectors_online.sh: done extracting (online) iVectors to /root/projects/sms_wsj/exp3/exp/sms_single_speaker/nnet3/ivectors_sms_enh_test_eval92 using the extractor in /root/projects/kaldi/egs/sms_single_speaker/s5/exp/sms_single_speaker/nnet3/extractor.
steps/nnet3/decode.sh --acwt 1.0 --post-decode-acwt 10.0 --extra-left-context 0 --extra-right-context 0 --extra-left-context-initial 0 --extra-right-context-final 0 --frames-per-chunk 140 --nj 8 --cmd run.pl --online-ivector-dir /root/projects/sms_wsj/exp3/exp/sms_single_speaker/nnet3/ivectors_sms_enh_test_eval92 /root/projects/kaldi/egs/sms_single_speaker/s5/exp/sms_single_speaker/chain/tree_a_sp/graph_tgpr /root/projects/sms_wsj/exp3/data/sms_enh/test_eval92 /root/projects/sms_wsj/exp3/exp/sms_single_speaker/tdnn1a_sp/decode_sms_enh_test_eval92
steps/nnet3/decode.sh: feature type is raw
steps/diagnostic/analyze_lats.sh --cmd run.pl --iter final /root/projects/kaldi/egs/sms_single_speaker/s5/exp/sms_single_speaker/chain/tree_a_sp/graph_tgpr /root/projects/sms_wsj/exp3/exp/sms_single_speaker/tdnn1a_sp/decode_sms_enh_test_eval92
steps/diagnostic/analyze_lats.sh: see stats in /root/projects/sms_wsj/exp3/exp/sms_single_speaker/tdnn1a_sp/decode_sms_enh_test_eval92/log/analyze_alignments.log
Overall, lattice depth (10,50,90-percentile)=(1,3,19) and mean=8.9
steps/diagnostic/analyze_lats.sh: see stats in /root/projects/sms_wsj/exp3/exp/sms_single_speaker/tdnn1a_sp/decode_sms_enh_test_eval92/log/analyze_lattice_depth_stats.log
score best paths
local/score.sh --cmd run.pl /root/projects/sms_wsj/exp3/data/sms_enh/test_eval92 /root/projects/kaldi/egs/sms_single_speaker/s5/exp/sms_single_speaker/chain/tree_a_sp/graph_tgpr /root/projects/sms_wsj/exp3/exp/sms_single_speaker/tdnn1a_sp/decode_sms_enh_test_eval92
local/score.sh: scoring with word insertion penalty=0.0,0.5,1.0
score confidence and timing with sclite
Decoding done.
%WER 7.02 [ 3168 / 45144, 494 ins, 379 del, 2295 sub ] /root/projects/sms_wsj/exp3/exp/sms_single_speaker/tdnn1a_sp/decode_sms_enh_test_eval92/wer_9_1.0
Create data dir for sms_enh/cv_dev93 data
ERROR - Kaldi array - Failed after 0:05:40!
Traceback (most recent calls WITHOUT Sacred internals):
File "/root/projects/sms_wsj/sms_wsj/kaldi/utils.py", line 105, in _get_wav_command_for_audio_dir
assert audio_path.exists(), audio_path
AssertionError: /data/quancs/datasets/sms_wsj/early/test_eval92/cv_dev93/0_4k6c0303_4k4c0319_0.wav
During handling of the above exception, another exception occurred:
Traceback (most recent calls WITHOUT Sacred internals):
File "/root/projects/sms_wsj/sms_wsj/kaldi/get_kaldi_wer.py", line 303, in run
create_dir(
File "/root/projects/sms_wsj/sms_wsj/kaldi/get_kaldi_wer.py", line 66, in create_dir
create_data_dir_from_audio_dir(
File "/root/projects/sms_wsj/sms_wsj/kaldi/utils.py", line 170, in create_data_dir_from_audio_dir
_create_data_dir(
File "/root/projects/sms_wsj/sms_wsj/kaldi/utils.py", line 246, in _create_data_dir
example_id_to_wav[example_id] = get_wav_command_fn(
File "/root/projects/sms_wsj/sms_wsj/kaldi/utils.py", line 108, in _get_wav_command_for_audio_dir
assert audio_path.exists(), audio_path
AssertionError: /data/quancs/datasets/sms_wsj/early/test_eval92/0_4k6c0303_4k4c0319_0.wav
Also, I used the Kaldi version mentioned in the README ("The script has been tested with the KALDI Git hash 7637de77e0a77bf280bef9bf484e4f37c4eb9475").
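For reference, the %WER line in the log above is the total number of word errors (insertions + deletions + substitutions) divided by the number of reference words: (494 + 379 + 2295) / 45144 = 3168 / 45144 ≈ 7.02 %. A small sketch (a hypothetical helper, not part of sms_wsj or Kaldi) that parses such a line and recomputes the value, which can help when comparing several runs:

```python
import re

# Matches e.g. "%WER 7.02 [ 3168 / 45144, 494 ins, 379 del, 2295 sub ] ..."
WER_LINE = re.compile(
    r"%WER\s+(?P<wer>[\d.]+)\s+\[\s*(?P<errors>\d+)\s*/\s*(?P<words>\d+),\s*"
    r"(?P<ins>\d+)\s+ins,\s*(?P<dels>\d+)\s+del,\s*(?P<subs>\d+)\s+sub\s*\]"
)


def parse_kaldi_wer(line):
    """Parse a Kaldi '%WER ...' summary line and recompute the WER in percent."""
    match = WER_LINE.search(line)
    if match is None:
        raise ValueError(f"not a Kaldi %WER line: {line!r}")
    ins, dels, subs = (int(match[key]) for key in ("ins", "dels", "subs"))
    words = int(match["words"])
    errors = ins + dels + subs
    # The bracketed error count should equal insertions + deletions + substitutions.
    if errors != int(match["errors"]):
        raise ValueError("inconsistent error count in %WER line")
    return {
        "reported": float(match["wer"]),
        "recomputed": 100 * errors / words,
        "errors": errors,
        "words": words,
    }
```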
I don't know the variance of the performance when training the ASR model; I never trained it myself.
Maybe you were lucky and got a good seed. The reference number in the paper is 7.34 % WER.
But the difference looks a bit too large to me.
The error happened because the code also tried to decode "cv_dev93", for which no separated audio was found in audio_dir.
I will check the commands in the file and fix them. Strangely, nobody has reported that they don't work.
WER = 6.85 for speech_source/test_eval92/. This is slightly worse than the one reported in the paper, i.e. 6.8.
I guess the ASR model was trained with the configuration in train_baseline_asr.py.
The train_baseline_asr.py script has the option train_data_type. The default, sms_single_speaker, is x_d+n_d. You can change it to speech_source or original_source, where original_source is the original WSJ file and speech_source is the padded WSJ file.
When you change train_data_type, you should also change ali_data_type to the same value. For original_source this is required; for speech_source it is recommended.
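The rule for ali_data_type can be written down as a small validation helper. This is a hypothetical sketch for illustration only; train_baseline_asr.py may not perform such a check itself, and the function name is made up. It treats a mismatch as an error for original_source and as a warning otherwise.

```python
import warnings


def check_asr_data_types(train_data_type, ali_data_type):
    """Check that ali_data_type matches train_data_type.

    A mismatch is an error for 'original_source' (required) and a warning
    for any other value, e.g. 'speech_source' (recommended).
    """
    if train_data_type == ali_data_type:
        return
    if train_data_type == "original_source":
        raise ValueError(
            "ali_data_type must be 'original_source' when "
            "train_data_type is 'original_source'"
        )
    warnings.warn(
        f"ali_data_type={ali_data_type!r} differs from "
        f"train_data_type={train_data_type!r}; using the same value is recommended"
    )
```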
OK. Thank you again for your help and the creation of this dataset! ^_^