Unofficial implementation of Dual-Path Transformer Network (DPTNet) for speech separation (Interspeech 2020)
- Data pre-processing
- Training
- Inference
- Separate
-- in-dir: path to your WSJ0-2mix dataset directory (it must contain tr/cv/tt folders)
-- out-dir: directory where the JSON files with file information are saved (recommended to keep the default)
$ python preprocess.py --in-dir /data/min --out-dir data --sample-rate 8000
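As a sketch of what the pre-processing step produces, assuming the common Conv-TasNet-style format in which each JSON file holds a list of [wav_path, num_samples] pairs (the paths below are hypothetical and the exact layout may differ in this repo):

```python
import json

# Hypothetical entries for data/tr/mix.json: each item is
# [path_to_wav, number_of_samples] (assumed format, not verified
# against this repository).
entries = [
    ["/data/min/tr/mix/011a0101_0.wav", 32000],
    ["/data/min/tr/mix/011a0102_1.wav", 28160],
]

# Write the file information, then read it back.
with open("mix.json", "w") as f:
    json.dump(entries, f, indent=2)

with open("mix.json") as f:
    loaded = json.load(f)

# Total duration in seconds at the 8 kHz sample rate used above.
total_sec = sum(n for _, n in loaded) / 8000
print(total_sec)  # 7.52
```

Dataset loaders can then open these JSON files instead of re-scanning the wav directories on every run.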
$ python train.py
If you change the --out-dir option, you must also set --train_dir '{your_directory}/tr' --valid_dir '{your_directory}/cv':
$ python train.py --train_dir 'data/tr' --valid_dir 'data/cv'
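The dual-path idea at the heart of DPTNet splits the encoded sequence into overlapping chunks, so that transformers can alternate between intra-chunk (local) and inter-chunk (global) modeling. A minimal NumPy sketch of the 50%-overlap segmentation step (the chunk size K and padding scheme here are illustrative, not this repo's exact settings):

```python
import numpy as np

def segment(x, K):
    """Split a 1-D sequence into 50%-overlapping chunks of length K.

    Returns an array of shape (num_chunks, K). The input is
    zero-padded so chunking covers the whole sequence, mirroring
    the DPRNN/DPTNet-style layout (illustrative sketch only).
    """
    P = K // 2                      # hop size (50% overlap)
    x = np.asarray(x, dtype=float)
    # Pad to a multiple of the hop, plus a leading/trailing
    # half-chunk so edge samples are also covered twice.
    rest = (-len(x)) % P
    x = np.concatenate([np.zeros(P), x, np.zeros(rest + P)])
    chunks = [x[i:i + K] for i in range(0, len(x) - K + 1, P)]
    return np.stack(chunks)

seq = np.arange(1, 9)               # toy "encoded" sequence of length 8
chunks = segment(seq, K=4)
print(chunks.shape)                 # (5, 4)
```

After segmentation, an intra-chunk transformer runs along each row and an inter-chunk transformer runs across rows, which is what lets the model handle very long sequences efficiently.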
$ python evaluate.py --model_path 'exp/temp/temp_best.pth.tar'
If you change the --out-dir option, you must also set --data_dir '{your_directory}/tt':
$ python evaluate.py --data_dir 'data/tt' --model_path 'exp/temp/temp_best.pth.tar'
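Evaluation reports the SI-SNR improvement (SI-SNRi) over the unprocessed mixture. As a reference, here is a minimal NumPy sketch of the standard scale-invariant SNR definition and the improvement computation (this is the textbook metric, not code taken from this repo):

```python
import numpy as np

def si_snr(est, ref, eps=1e-8):
    """Scale-invariant SNR in dB between an estimate and a reference."""
    est = est - est.mean()
    ref = ref - ref.mean()
    # Project the estimate onto the reference; scaling the estimate
    # does not change the result (hence "scale-invariant").
    s_target = (est @ ref) / (ref @ ref + eps) * ref
    e_noise = est - s_target
    return 10 * np.log10((s_target @ s_target) / (e_noise @ e_noise + eps))

rng = np.random.default_rng(0)
s1 = rng.standard_normal(8000)      # target source
s2 = rng.standard_normal(8000)      # interfering source
mix = s1 + s2                       # SI-SNR of mix vs s1 is roughly 0 dB

# SI-SNRi: SI-SNR of the separated estimate minus SI-SNR of the raw
# mixture, both measured against the reference source. A perfect
# (up to scale) estimate gives a very large SI-SNR.
improvement = si_snr(2.0 * s1, s1) - si_snr(mix, s1)
```

The 19.84 dB figure quoted below is this improvement, averaged over the test-set mixtures with the best permutation of estimated sources.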
$ python separate.py --model_path 'exp/temp/temp_best.pth.tar'
If you change the --out-dir option, you must also set --mix_json '{your_directory}/tt/mix.json':
$ python separate.py --mix_json 'data/tt/mix.json' --model_path 'exp/temp/temp_best.pth.tar'
We achieve an SI-SNRi of 19.84 dB with L=4 (encoder kernel length).
The separated audio samples are saved in the result directory.