How to decode the separated audios

Question

Closed this issue 2 years ago · 15 comments

Hello. I have trained the ASR model with sms_wsj.train_baseline_asr. Is there any guidance on how to get the WER of separated audio using the trained ASR model? It seems that sms_wsj/kaldi/get_kaldi_wer.py can do that, but I don't know how to prepare my separated results (e.g. the directory structure or other data) to meet the requirements of this script.

Thanks in advance.

boeddeker commented 2 years ago

Great.

Answer 1 · 2023-02-06T13:49:11.000Z

Hello,

The beginning of the file sms_wsj/kaldi/get_kaldi_wer.py contains several examples of how it can be used, e.g.:

python -m sms_wsj.kaldi.get_kaldi_wer -F /EXP/DIR decode with kaldi_data_dir=/KALDI/DATA/DIR model_egs_dir=/MODEL/EGS/DIR dataset=test_eval92

where /EXP/DIR is the working/output directory, /KALDI/DATA/DIR is a directory in Kaldi data style, /MODEL/EGS/DIR is the path to the trained model, and test_eval92 is the dataset, i.e. a folder in /KALDI/DATA/DIR.

Answer 2 · 2023-02-06T14:19:09.000Z

It seems the dataset parameter is not valid: it is neither defined nor used in get_kaldi_wer.py.

Answer 3 · 2023-02-06T14:42:07.000Z

And if I choose to follow the command python -m sms_wsj.kaldi.get_kaldi_wer -F /EXP/DIR with audio_dir=/AUDIO/DIR json_path=/JSON/PATH model_egs_dir=/MODEL/EGS/DIR, which json file should I provide?

Answer 4 · 2023-02-06T14:53:22.000Z

> Seems the dataset parameter is not valid as it is not defined and used in get_kaldi_wer.py

Sorry, the signature was changed and no one checked the examples at the beginning of the file.
For decode, kaldi_data_dir and dataset have been replaced by dataset_dir.

> And if I choose to follow the command python -m sms_wsj.kaldi.get_kaldi_wer -F /EXP/DIR with audio_dir=/AUDIO/DIR json_path=/JSON/PATH model_egs_dir=/MODEL/EGS/DIR, which json file should I provide?

The json_path is the path to sms_wsj.json. In {audio_dir}/[cv_dev93|test_eval92], the code searches for files named e.g. {id}_{spk}.wav, where id is the example_id and spk is 0 or 1. (The naming can be changed with id_to_file_name, but that requires proper escaping for the shell.)
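
The layout described above can be checked before starting a decode. The following is a minimal sketch and not part of sms_wsj: the helper names expected_wav_paths and missing_wavs are hypothetical, and it assumes the default {id}_{spk}.wav naming with two speakers. How the example ids are read from sms_wsj.json is left out, because the JSON structure is not described in this thread.

```python
from pathlib import Path


def expected_wav_paths(audio_dir, dataset, example_ids, num_speakers=2):
    """Build {audio_dir}/{dataset}/{id}_{spk}.wav for every id and speaker index."""
    root = Path(audio_dir) / dataset
    return [
        root / f"{example_id}_{spk}.wav"
        for example_id in example_ids
        for spk in range(num_speakers)
    ]


def missing_wavs(audio_dir, dataset, example_ids, num_speakers=2):
    """Return the expected files that do not exist on disk."""
    return [
        path
        for path in expected_wav_paths(audio_dir, dataset, example_ids, num_speakers)
        if not path.exists()
    ]
```

For example, missing_wavs("/AUDIO/DIR", "test_eval92", ["0_4k6c0303_4k4c0319"]) would report /AUDIO/DIR/test_eval92/0_4k6c0303_4k4c0319_0.wav and the matching _1.wav file if they are absent.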

Answer 5 · 2023-02-06T14:59:45.000Z

@boeddeker Thank you for your answer. I will try it. ^^

Answer 6 · 2023-02-06T15:12:57.000Z

Hello, I'm back. I tried: python -m sms_wsj.kaldi.get_kaldi_wer -F exp with audio_dir=/data/quancs/datasets/sms_wsj/early/test_eval92/ json_path=/data/quancs/datasets/sms_wsj/sms_wsj.json model_egs_dir=$KALDI_ROOT/egs/sms_single_speaker/s5/, but it reported an error (below). I don't know if I am missing something.

root@36b3ff17d8c5:~/projects/sms_wsj# python -m sms_wsj.kaldi.get_kaldi_wer -F exp with audio_dir=/data/quancs/datasets/sms_wsj/early/test_eval92/ json_path=/data/quancs/datasets/sms_wsj/sms_wsj.json model_egs_dir=$KALDI_ROOT/egs/sms_single_speaker/s5/
INFO - Kaldi array - Running command 'run'
INFO - Kaldi array - Started run with ID "1"
Create /root/projects/sms_wsj/exp directory
Create data dir for sms_enh/test_eval92 data
utils/fix_data_dir.sh: file /root/projects/sms_wsj/exp/data/sms_enh/test_eval92/utt2spk is not in sorted order or not unique, sorting it
utils/fix_data_dir.sh: file /root/projects/sms_wsj/exp/data/sms_enh/test_eval92/text is not in sorted order or not unique, sorting it
utils/fix_data_dir.sh: file /root/projects/sms_wsj/exp/data/sms_enh/test_eval92/wav.scp is not in sorted order or not unique, sorting it
utils/fix_data_dir.sh: file /root/projects/sms_wsj/exp/data/sms_enh/test_eval92/spk2gender is not in sorted order or not unique, sorting it
utils/fix_data_dir.sh: file /root/projects/sms_wsj/exp/data/sms_enh/test_eval92/utt2dur is not in sorted order or not unique, sorting it
utils/fix_data_dir.sh: file /root/projects/sms_wsj/exp/data/sms_enh/test_eval92/reco2dur is not in sorted order or not unique, sorting it
fix_data_dir.sh: kept all 2664 utterances.
fix_data_dir.sh: old files are kept in /root/projects/sms_wsj/exp/data/sms_enh/test_eval92/.backup
steps/make_mfcc.sh --nj 48 --mfcc-config /root/projects/sms_wsj/exp/conf/mfcc_hires.conf --cmd run.pl /root/projects/sms_wsj/exp/data/sms_enh/test_eval92 /root/projects/sms_wsj/exp/data/sms_enh/test_eval92/make_mfcc /root/projects/sms_wsj/exp/data/sms_enh/test_eval92/mfcc
utils/validate_data_dir.sh: Successfully validated data-directory /root/projects/sms_wsj/exp/data/sms_enh/test_eval92
steps/make_mfcc.sh: [info]: no segments file exists: assuming wav.scp indexed by utterance.
steps/make_mfcc.sh: Succeeded creating MFCC features for test_eval92
steps/compute_cmvn_stats.sh /root/projects/sms_wsj/exp/data/sms_enh/test_eval92 /root/projects/sms_wsj/exp/data/sms_enh/test_eval92/make_mfcc /root/projects/sms_wsj/exp/data/sms_enh/test_eval92/mfcc
Succeeded creating CMVN stats for test_eval92
fix_data_dir.sh: kept all 2664 utterances.
fix_data_dir.sh: old files are kept in /root/projects/sms_wsj/exp/data/sms_enh/test_eval92/.backup
Directory /root/projects/sms_wsj/exp/exp/sms_single_speaker/nnet3/ivectors_sms_enh_test_eval92 not found, estimating ivectors
steps/online/nnet2/extract_ivectors_online.sh --cmd run.pl --nj 48 /root/projects/sms_wsj/exp/data/sms_enh/test_eval92 /root/projects/kaldi/egs/sms_single_speaker/s5/exp/sms_single_speaker/nnet3/extractor /root/projects/sms_wsj/exp/exp/sms_single_speaker/nnet3/ivectors_sms_enh_test_eval92
utils/split_scp.pl: Refusing to split data because number of speakers 8 is less than the number of output .scp files 48
ERROR - Kaldi array - Failed after 0:00:15!
Traceback (most recent calls WITHOUT Sacred internals):
  File "/root/projects/sms_wsj/sms_wsj/kaldi/get_kaldi_wer.py", line 324, in run
    decode(
  File "/root/projects/sms_wsj/sms_wsj/kaldi/get_kaldi_wer.py", line 145, in decode
    ivector_dir = calculate_ivectors(
  File "/root/projects/sms_wsj/sms_wsj/kaldi/utils.py", line 372, in calculate_ivectors
    run_process([
  File "/root/projects/sms_wsj/sms_wsj/kaldi/utils.py", line 413, in run_process
    subprocess.run(
  File "/root/miniconda3/lib/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['steps/online/nnet2/extract_ivectors_online.sh', '--cmd', 'run.pl', '--nj', '48', '/root/projects/sms_wsj/exp/data/sms_enh/test_eval92', '/root/projects/kaldi/egs/sms_single_speaker/s5/exp/sms_single_speaker/nnet3/extractor', '/root/projects/sms_wsj/exp/exp/sms_single_speaker/nnet3/ivectors_sms_enh_test_eval92']' returned non-zero exit status 1.

Answer 7 · 2023-02-06T15:39:38.000Z

Kaldi complains that --nj 48 (the number of jobs/workers) is too high. Kaldi does not split the data of a single speaker across jobs, so it fails if nj is larger than the number of speakers (8 here, see the split_scp.pl message in the log).
In #24 I pushed a fix, so the new default is min(8, os.cpu_count()) instead of os.cpu_count().

Alternatively, you can change the number of jobs on the command line with num_jobs=8.

Sorry, we didn't notice this because we ran the code on machines with 8 cores.
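
As an illustration of the constraint, here is a small sketch of how a job count could be chosen. It is not the code from #24: the fix described above only uses min(8, os.cpu_count()), whereas the hypothetical helper safe_num_jobs below additionally caps the value by the number of speakers, which is an assumption added for illustration.

```python
import os


def safe_num_jobs(num_speakers, limit=8, cpu_count=None):
    """Choose a job count that exceeds neither the CPU count, the limit, nor the speaker count."""
    if cpu_count is None:
        # os.cpu_count() may return None if the count cannot be determined.
        cpu_count = os.cpu_count() or 1
    return max(1, min(limit, cpu_count, num_speakers))
```

With the values from the log (48 cores, 8 speakers) this yields 8, the same value as passing num_jobs=8 on the command line.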

Answer 8 · 2023-02-06T15:52:13.000Z

Nice! Decoding now. Great thanks for your patient help. 😀

Answer 9 · 2023-02-06T16:07:49.000Z

Decoding has finished and the WER is printed (7.02 for early/test_eval92). Is this WER correct? It doesn't match any number in your paper. The run also reported an error (maybe it doesn't matter).

root@36b3ff17d8c5:~/projects/sms_wsj# python -m sms_wsj.kaldi.get_kaldi_wer -F exp3 with audio_dir=/data/quancs/datasets/sms_wsj/early/test_eval92/ json_path=/data/quancs/datasets/sms_wsj/sms_wsj.json model_egs_dir=$KALDI_ROOT/egs/sms_single_speaker/s5/
INFO - Kaldi array - Running command 'run'
INFO - Kaldi array - Started run with ID "1"
Create /root/projects/sms_wsj/exp3 directory
Create data dir for sms_enh/test_eval92 data
utils/fix_data_dir.sh: file /root/projects/sms_wsj/exp3/data/sms_enh/test_eval92/utt2spk is not in sorted order or not unique, sorting it
utils/fix_data_dir.sh: file /root/projects/sms_wsj/exp3/data/sms_enh/test_eval92/text is not in sorted order or not unique, sorting it
utils/fix_data_dir.sh: file /root/projects/sms_wsj/exp3/data/sms_enh/test_eval92/wav.scp is not in sorted order or not unique, sorting it
utils/fix_data_dir.sh: file /root/projects/sms_wsj/exp3/data/sms_enh/test_eval92/spk2gender is not in sorted order or not unique, sorting it
utils/fix_data_dir.sh: file /root/projects/sms_wsj/exp3/data/sms_enh/test_eval92/utt2dur is not in sorted order or not unique, sorting it
utils/fix_data_dir.sh: file /root/projects/sms_wsj/exp3/data/sms_enh/test_eval92/reco2dur is not in sorted order or not unique, sorting it
fix_data_dir.sh: kept all 2664 utterances.
fix_data_dir.sh: old files are kept in /root/projects/sms_wsj/exp3/data/sms_enh/test_eval92/.backup
steps/make_mfcc.sh --nj 8 --mfcc-config /root/projects/sms_wsj/exp3/conf/mfcc_hires.conf --cmd run.pl /root/projects/sms_wsj/exp3/data/sms_enh/test_eval92 /root/projects/sms_wsj/exp3/data/sms_enh/test_eval92/make_mfcc /root/projects/sms_wsj/exp3/data/sms_enh/test_eval92/mfcc
utils/validate_data_dir.sh: Successfully validated data-directory /root/projects/sms_wsj/exp3/data/sms_enh/test_eval92
steps/make_mfcc.sh: [info]: no segments file exists: assuming wav.scp indexed by utterance.
steps/make_mfcc.sh: Succeeded creating MFCC features for test_eval92
steps/compute_cmvn_stats.sh /root/projects/sms_wsj/exp3/data/sms_enh/test_eval92 /root/projects/sms_wsj/exp3/data/sms_enh/test_eval92/make_mfcc /root/projects/sms_wsj/exp3/data/sms_enh/test_eval92/mfcc
Succeeded creating CMVN stats for test_eval92
fix_data_dir.sh: kept all 2664 utterances.
fix_data_dir.sh: old files are kept in /root/projects/sms_wsj/exp3/data/sms_enh/test_eval92/.backup
Directory /root/projects/sms_wsj/exp3/exp/sms_single_speaker/nnet3/ivectors_sms_enh_test_eval92 not found, estimating ivectors
steps/online/nnet2/extract_ivectors_online.sh --cmd run.pl --nj 8 /root/projects/sms_wsj/exp3/data/sms_enh/test_eval92 /root/projects/kaldi/egs/sms_single_speaker/s5/exp/sms_single_speaker/nnet3/extractor /root/projects/sms_wsj/exp3/exp/sms_single_speaker/nnet3/ivectors_sms_enh_test_eval92
steps/online/nnet2/extract_ivectors_online.sh: extracting iVectors
steps/online/nnet2/extract_ivectors_online.sh: combining iVectors across jobs
steps/online/nnet2/extract_ivectors_online.sh: done extracting (online) iVectors to /root/projects/sms_wsj/exp3/exp/sms_single_speaker/nnet3/ivectors_sms_enh_test_eval92 using the extractor in /root/projects/kaldi/egs/sms_single_speaker/s5/exp/sms_single_speaker/nnet3/extractor.
steps/nnet3/decode.sh --acwt 1.0 --post-decode-acwt 10.0 --extra-left-context 0 --extra-right-context 0 --extra-left-context-initial 0 --extra-right-context-final 0 --frames-per-chunk 140 --nj 8 --cmd run.pl --online-ivector-dir /root/projects/sms_wsj/exp3/exp/sms_single_speaker/nnet3/ivectors_sms_enh_test_eval92 /root/projects/kaldi/egs/sms_single_speaker/s5/exp/sms_single_speaker/chain/tree_a_sp/graph_tgpr /root/projects/sms_wsj/exp3/data/sms_enh/test_eval92 /root/projects/sms_wsj/exp3/exp/sms_single_speaker/tdnn1a_sp/decode_sms_enh_test_eval92
steps/nnet3/decode.sh: feature type is raw
steps/diagnostic/analyze_lats.sh --cmd run.pl --iter final /root/projects/kaldi/egs/sms_single_speaker/s5/exp/sms_single_speaker/chain/tree_a_sp/graph_tgpr /root/projects/sms_wsj/exp3/exp/sms_single_speaker/tdnn1a_sp/decode_sms_enh_test_eval92
steps/diagnostic/analyze_lats.sh: see stats in /root/projects/sms_wsj/exp3/exp/sms_single_speaker/tdnn1a_sp/decode_sms_enh_test_eval92/log/analyze_alignments.log
Overall, lattice depth (10,50,90-percentile)=(1,3,19) and mean=8.9
steps/diagnostic/analyze_lats.sh: see stats in /root/projects/sms_wsj/exp3/exp/sms_single_speaker/tdnn1a_sp/decode_sms_enh_test_eval92/log/analyze_lattice_depth_stats.log
score best paths
local/score.sh --cmd run.pl /root/projects/sms_wsj/exp3/data/sms_enh/test_eval92 /root/projects/kaldi/egs/sms_single_speaker/s5/exp/sms_single_speaker/chain/tree_a_sp/graph_tgpr /root/projects/sms_wsj/exp3/exp/sms_single_speaker/tdnn1a_sp/decode_sms_enh_test_eval92
local/score.sh: scoring with word insertion penalty=0.0,0.5,1.0
score confidence and timing with sclite
Decoding done.
%WER 7.02 [ 3168 / 45144, 494 ins, 379 del, 2295 sub ] /root/projects/sms_wsj/exp3/exp/sms_single_speaker/tdnn1a_sp/decode_sms_enh_test_eval92/wer_9_1.0

Create data dir for sms_enh/cv_dev93 data
ERROR - Kaldi array - Failed after 0:05:40!
Traceback (most recent calls WITHOUT Sacred internals):
  File "/root/projects/sms_wsj/sms_wsj/kaldi/utils.py", line 105, in _get_wav_command_for_audio_dir
    assert audio_path.exists(), audio_path
AssertionError: /data/quancs/datasets/sms_wsj/early/test_eval92/cv_dev93/0_4k6c0303_4k4c0319_0.wav

During handling of the above exception, another exception occurred:

Traceback (most recent calls WITHOUT Sacred internals):
  File "/root/projects/sms_wsj/sms_wsj/kaldi/get_kaldi_wer.py", line 303, in run
    create_dir(
  File "/root/projects/sms_wsj/sms_wsj/kaldi/get_kaldi_wer.py", line 66, in create_dir
    create_data_dir_from_audio_dir(
  File "/root/projects/sms_wsj/sms_wsj/kaldi/utils.py", line 170, in create_data_dir_from_audio_dir
    _create_data_dir(
  File "/root/projects/sms_wsj/sms_wsj/kaldi/utils.py", line 246, in _create_data_dir
    example_id_to_wav[example_id] = get_wav_command_fn(
  File "/root/projects/sms_wsj/sms_wsj/kaldi/utils.py", line 108, in _get_wav_command_for_audio_dir
    assert audio_path.exists(), audio_path
AssertionError: /data/quancs/datasets/sms_wsj/early/test_eval92/0_4k6c0303_4k4c0319_0.wav
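
The %WER line in the log above can be checked by hand: the error count is the sum of insertions, deletions and substitutions, divided by the number of reference words. The following sketch parses such a line and recomputes the value. The function name parse_wer_line is hypothetical, and the regular expression only assumes the line format shown in this log.

```python
import re

# Matches lines such as: %WER 7.02 [ 3168 / 45144, 494 ins, 379 del, 2295 sub ] <path>
WER_LINE = re.compile(
    r"%WER\s+(?P<wer>[\d.]+)\s+\[\s*(?P<errors>\d+)\s*/\s*(?P<words>\d+),"
    r"\s*(?P<ins>\d+)\s+ins,\s*(?P<dels>\d+)\s+del,\s*(?P<sub>\d+)\s+sub\s*\]"
)


def parse_wer_line(line):
    """Extract the counts from a Kaldi-style %WER line and recompute the WER in percent."""
    match = WER_LINE.search(line)
    if match is None:
        raise ValueError(f"Not a %WER line: {line!r}")
    result = {key: int(match[key]) for key in ("errors", "words", "ins", "dels", "sub")}
    result["reported_wer"] = float(match["wer"])
    result["recomputed_wer"] = 100 * result["errors"] / result["words"]
    return result
```

For the line above, 494 + 379 + 2295 = 3168 errors over 45144 words, which gives roughly 7.02 % and agrees with the reported value.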

Answer 10 · 2023-02-06T16:28:12.000Z

Also, I used the Kaldi version mentioned in the README: The script has been tested with the KALDI Git hash "7637de77e0a77bf280bef9bf484e4f37c4eb9475".

Answer 11 · 2023-02-06T17:53:56.000Z

> Decoding is over now. The WER is printed (7.02 for early/test_eval92). Is this WER correct as it doesn't match any one in your paper? [...] And, I used KALDI version mentioned in the README [...]

I don't know how much the performance varies when training the ASR model; I never trained it myself.
Maybe you were lucky and got a good seed. The reference number in the paper is 7.34 % WER.
But the difference looks a bit too large.

> And, it also reported an error (maybe it doesn't matter).

The error occurred because the code also tried to decode "cv_dev93".
I will check the commands in the file and fix them. Strangely, nobody has reported that they don't work.

Answer 12 · 2023-02-06T18:55:14.000Z

> I don't know the variance of the performance, when training the ASR model. I never trained it myself. Maybe you were lucky and got a good seed. The reference number is 7.34 % WER in the paper. But it looks a bit too much.

The WER is 6.85 for speech_source/test_eval92/. This is slightly worse than the one reported in the paper, i.e. 6.8.
I guess the ASR model is trained on $x_d+n_d$ by default. So I have another question: if I want to train the ASR model on direct-path signals (i.e. $s$ in the paper), which parameters should I pass to train_baseline_asr.py?

Answer 13 · 2023-02-07T08:51:50.000Z

The train_baseline_asr.py script has the option train_data_type. The default, sms_single_speaker, is x_d+n_d. You can change it to speech_source or original_source, where original_source is the original WSJ file and speech_source is the padded WSJ file.
When you change train_data_type, you should also change ali_data_type to the same value: for original_source the change is required, and for speech_source it is recommended.
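
To keep the two options in sync, the command line can be assembled programmatically. This is only a sketch under assumptions: the helper build_train_command is hypothetical, it assumes that train_baseline_asr accepts Sacred-style "with key=value" overrides like get_kaldi_wer does, and it omits any other options (such as paths) that the script may need.

```python
# Data types named in this thread; whether the script accepts others is not known here.
KNOWN_DATA_TYPES = {"sms_single_speaker", "speech_source", "original_source"}


def build_train_command(train_data_type, ali_data_type=None):
    """Return an argument list that sets train_data_type and a matching ali_data_type."""
    if ali_data_type is None:
        # Follow the advice above: use the same value for both options.
        ali_data_type = train_data_type
    for value in (train_data_type, ali_data_type):
        if value not in KNOWN_DATA_TYPES:
            raise ValueError(f"Unknown data type: {value!r}")
    return [
        "python", "-m", "sms_wsj.train_baseline_asr", "with",
        f"train_data_type={train_data_type}",
        f"ali_data_type={ali_data_type}",
    ]
```

Joining the result of build_train_command("speech_source") with spaces gives a command that sets both options to speech_source.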

Answer 14 · 2023-02-07T13:23:00.000Z

OK. Thank you again for your help and the creation of this dataset! ^_^