A PyTorch implementation of Google's VoiceFilter system
- Data simulation

      ./nnet/data_simulate.py --dump-dir simu/train /path/to/librispeech/train.scp asset/train_tuples.csv
      ./nnet/data_simulate.py --dump-dir simu/dev /path/to/librispeech/dev.scp asset/dev_tuples.csv
- Speaker embedding (I used a public x-vector from here)
- Data preparation: prepare the data as `{mix,ref,emb}.scp`. The scp files follow the format used in Kaldi's recipes, e.g., one `<key> <path>` pair on each line.
- Configure `nnet/conf.py` and train the model (see `train.sh` for details).
- Use `nnet/separate.py` for inference.
- I used the Si-SNR loss instead of the MSE of the spectrogram, which could achieve better performance.
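
The `<key> <path>` layout of the scp files mentioned in the data preparation step can be read with a few lines of Python. This is only a minimal sketch, not the loader used in `nnet/`: the function name `read_scp` is hypothetical, and it assumes that the key and the path are separated by whitespace and that keys are unique within a file.

```python
def read_scp(scp_path):
    """Parse a Kaldi-style scp file into a {key: path} dict.

    Assumption: every non-empty line holds "<key> <path>", with the key
    separated from the path by whitespace.
    """
    table = {}
    with open(scp_path, "r", encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            # Split only once so that a path containing spaces stays intact.
            parts = line.split(maxsplit=1)
            if len(parts) != 2:
                raise ValueError(
                    f"{scp_path}:{lineno}: expected '<key> <path>', got {line!r}"
                )
            key, path = parts
            if key in table:
                raise ValueError(f"{scp_path}:{lineno}: duplicate key {key!r}")
            table[key] = path
    return table
```

For example, `read_scp("mix.scp")` would return a dict that maps each utterance key to its file path, which makes it easy to check that `mix.scp`, `ref.scp` and `emb.scp` list the keys you expect before training.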
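
For readers unfamiliar with Si-SNR (scale-invariant signal-to-noise ratio), the sketch below shows the commonly used definition: both signals are made zero-mean, the estimate is projected onto the target, and the ratio of the projected energy to the residual energy is reported in dB. It is written with plain Python lists so that it runs without dependencies; it is not the code from this repository, the names `si_snr`, `si_snr_loss` and `eps` are hypothetical, and a training loss would express the same steps with PyTorch tensor operations.

```python
import math


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB between two equal-length 1-D signals."""
    if len(estimate) != len(target):
        raise ValueError("estimate and target must have the same length")
    n = len(target)
    # Remove the mean so that a DC offset does not affect the score.
    est_mean = sum(estimate) / n
    tgt_mean = sum(target) / n
    est = [x - est_mean for x in estimate]
    tgt = [x - tgt_mean for x in target]
    # Project the estimate onto the target:
    # s_target = <est, tgt> / ||tgt||^2 * tgt
    dot = sum(e * t for e, t in zip(est, tgt))
    scale = dot / (sum(t * t for t in tgt) + eps)
    s_target = [scale * t for t in tgt]
    # Whatever the target does not explain is counted as noise.
    e_noise = [e - s for e, s in zip(est, s_target)]
    target_energy = sum(s * s for s in s_target)
    noise_energy = sum(x * x for x in e_noise)
    return 10.0 * math.log10((target_energy + eps) / (noise_energy + eps))


def si_snr_loss(estimate, target):
    """Negative Si-SNR, so that minimizing the loss maximizes Si-SNR."""
    return -si_snr(estimate, target)
```

Because of the projection step, rescaling the estimate leaves the score essentially unchanged, so the loss does not penalize a separated signal merely for being louder or quieter than the reference.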