Thanks to Keisuke Kinoshita for helping me solve problems.
A PyTorch implementation of "TasNet: Surpassing Ideal Time-Frequency Masking for Speech Separation" and "Improving noise robust automatic speech recognition with single-channel time-domain enhancement network".
See requirements.txt for the required packages.
./nnet/separate.py /path/to/checkpoint --input /path/to/mix.scp --gpu 0 > separate.log 2>&1 &
- evaluate
./nnet/compute_si_snr.py /path/to/ref_spk1.scp,/path/to/ref_spk2.scp /path/to/inf_spk1.scp,/path/to/inf_spk2.scp
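For readers unfamiliar with the metric, the following is a minimal pure-Python sketch of the commonly used SI-SNR (scale-invariant signal-to-noise ratio) definition. It is not the code in `compute_si_snr.py`; the function name `si_snr` and the `eps` stabiliser are assumptions made for illustration, and the real script may differ (for example in how it handles speaker permutations).

```python
import math


def si_snr(estimate, reference, eps=1e-8):
    """Scale-invariant SNR in dB between two equal-length sample sequences.

    Hypothetical helper for illustration; `eps` only guards against
    division by zero and log of zero.
    """
    if len(estimate) != len(reference):
        raise ValueError("estimate and reference must have the same length")
    n = len(reference)

    # Remove the mean so that a DC offset does not affect the score.
    est_mean = sum(estimate) / n
    ref_mean = sum(reference) / n
    est = [x - est_mean for x in estimate]
    ref = [s - ref_mean for s in reference]

    # Project the estimate onto the reference to get the target component.
    dot = sum(x * s for x, s in zip(est, ref))
    ref_energy = sum(s * s for s in ref) + eps
    scale = dot / ref_energy
    target = [scale * s for s in ref]

    # Everything that is not explained by the reference counts as noise.
    noise = [x - t for x, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    noise_energy = sum(e * e for e in noise)
    return 10 * math.log10((target_energy + eps) / (noise_energy + eps))


# Toy signals: a sine wave and a slightly corrupted copy of it.
reference = [math.sin(0.1 * i) for i in range(100)]
estimate = [r + 0.1 * (-1) ** i for i, r in enumerate(reference)]
score = si_snr(estimate, reference)
rescaled_score = si_snr([3 * x for x in estimate], reference)
```

Because the estimate is projected onto the reference before the ratio is taken, rescaling the estimate leaves the score essentially unchanged, which is why `score` and `rescaled_score` agree up to the effect of `eps`.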
- file format
A ".scp" file is a Kaldi script file. Each line contains a UUID and a file path, like this:
uuid1 /path/to/file1
uuid2 /path/to/file2
mix.scp: mixtures of the speech of multiple speakers taken from spk1.scp, spk2.scp, ... and spk$N.scp.
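As an illustration of the format described above, here is a minimal sketch of reading such a file into a dictionary. The function name `parse_scp` is hypothetical and not part of this repository, and the sketch assumes one whitespace-separated `uuid path` pair per line with unique UUIDs.

```python
def parse_scp(lines):
    """Map each UUID to its file path from 'uuid path' lines.

    Hypothetical helper for illustration; accepts any iterable of
    strings, such as an open file object.
    """
    table = {}
    for line_no, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first run of whitespace only, so the path is kept whole.
        parts = line.split(maxsplit=1)
        if len(parts) != 2:
            raise ValueError(f"line {line_no}: expected 'uuid path', got {line!r}")
        key, path = parts
        if key in table:
            raise ValueError(f"line {line_no}: duplicate UUID {key!r}")
        table[key] = path
    return table


example = ["uuid1 /path/to/file1", "uuid2 /path/to/file2"]
entries = parse_scp(example)

# With a real file, pass the open file object instead:
# with open("/path/to/mix.scp") as f:
#     entries = parse_scp(f)
```

Keying the entries by UUID makes it straightforward to look up the same ID across the mixture list and the per-speaker lists, assuming they share IDs.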
...
Yi Luo, Nima Mesgarani. TasNet: Surpassing Ideal Time-Frequency Masking for Speech Separation. arXiv preprint arXiv:1809.07454, 2018.
Keisuke Kinoshita, Tsubasa Ochiai, Marc Delcroix, Tomohiro Nakatani. Improving noise robust automatic speech recognition with single-channel time-domain enhancement network. arXiv preprint arXiv:2003.03998, 2020.