/conv-tasnet

A PyTorch implementation of "Improving noise robust automatic speech recognition with single-channel time-domain enhancement network"


Thanks to Keisuke Kinoshita for helping me solve problems.

ConvTasNet

A PyTorch implementation of "TasNet: Surpassing Ideal Time-Frequency Masking for Speech Separation" and "Improving noise robust automatic speech recognition with single-channel time-domain enhancement network".

Requirements

See requirements.txt.

Usage

./nnet/separate.py /path/to/checkpoint --input /path/to/mix.scp --gpu 0 > separate.log 2>&1 &
  • evaluate
./nnet/compute_si_snr.py /path/to/ref_spk1.scp,/path/to/ref_spk2.scp /path/to/inf_spk1.scp,/path/to/inf_spk2.scp
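
The script name refers to the scale-invariant signal-to-noise ratio (SI-SNR). As a rough illustration of that metric, and not the code in compute_si_snr.py itself, here is a minimal NumPy sketch. The function name `si_snr` and the `eps` parameter are assumptions made for this example, and it scores a single estimate/reference pair without resolving speaker permutations.

```python
import numpy as np


def si_snr(estimate, reference, eps=1e-8):
    """Scale-invariant SNR in dB between two 1-D signals of equal length.

    Hypothetical helper for illustration; not taken from this repository.
    """
    estimate = np.asarray(estimate, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)
    if estimate.shape != reference.shape:
        raise ValueError("estimate and reference must have the same shape")

    # Remove the mean so that a DC offset does not affect the score.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()

    # Project the estimate onto the reference to get the target component.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference

    # Whatever is left after removing the target counts as noise.
    noise = estimate - target

    ratio = (np.dot(target, target) + eps) / (np.dot(noise, noise) + eps)
    return 10.0 * np.log10(ratio)
```

Because the estimate is projected onto the reference before the ratio is taken, multiplying the estimate by a constant gain leaves the score essentially unchanged, which is what makes the measure scale-invariant.
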
  • file format

A ".scp" file is a Kaldi script file. Each line contains a UUID and a file path, like this:

uuid1 /path/to/file1
uuid2 /path/to/file2

mix.scp: mixtures of multi-speaker speech built from spk1.scp, spk2.scp ... and spk$N.scp. ...
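
To make the two-column layout concrete, the following is a small sketch of how such a file could be read into a dictionary. The helper name `read_scp` is hypothetical and this is not the loader used by the scripts in this repository. It assumes one entry per line, with the UUID separated from the path by whitespace.

```python
def read_scp(path):
    """Parse a Kaldi-style .scp file into {uuid: file_path}.

    Hypothetical helper for illustration; not part of this repository.
    """
    entries = {}
    with open(path, encoding="utf-8") as handle:
        for line_no, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # skip blank lines
            # Split on the first run of whitespace only, so the
            # second field is kept intact even if it contains spaces.
            parts = line.split(maxsplit=1)
            if len(parts) != 2:
                raise ValueError(f"{path}:{line_no}: expected 'uuid path'")
            key, value = parts
            if key in entries:
                raise ValueError(f"{path}:{line_no}: duplicate key {key!r}")
            entries[key] = value
    return entries
```

With this helper, `read_scp("/path/to/mix.scp")` would return a mapping whose keys can be matched against those of the per-speaker files to pair each mixture with its references.
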

Reference

Yi Luo, Nima Mesgarani. TasNet: Surpassing Ideal Time-Frequency Masking for Speech Separation. arXiv preprint arXiv:1809.07454, 2018.

Keisuke Kinoshita, Tsubasa Ochiai, Marc Delcroix, Tomohiro Nakatani. Improving noise robust automatic speech recognition with single-channel time-domain enhancement network. arXiv preprint arXiv:2003.03998, 2020.