A PyTorch implementation of DNN-based source separation.
- v0.7.0
  - Add new models (MMDenseLSTM, X-UMX, HRNet, SepFormer).
  - Add pretrained models.
| Module | Reference | Done |
| --- | --- | --- |
| Depthwise-separable convolution | | ✔ |
| Gated Linear Units (GLU) | | ✔ |
| Feature-wise Linear Modulation (FiLM) | FiLM: Visual Reasoning with a General Conditioning Layer | ✔ |
| Point-wise Convolutional Modulation (PoCM) | LaSAFT: Latent Source Attentive Frequency Transformation for Conditioned Source Separation | ✔ |
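To illustrate one of the modules above: a GLU splits its input in half along a chosen axis and gates one half with the sigmoid of the other. This is a minimal NumPy sketch of the idea, not the repository's implementation:

```python
import numpy as np

def sigmoid(z):
    """Numerically standard logistic function."""
    return 1.0 / (1.0 + np.exp(-z))

def glu(x, axis=-1):
    """Gated Linear Unit: split x into halves (a, b) along `axis`
    and return a * sigmoid(b). Halves the size of that axis."""
    a, b = np.split(x, 2, axis=axis)
    return a * sigmoid(b)
```

In the repository the gating is applied to convolution outputs along the channel dimension; the principle is the same.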
| Method | Reference | Done |
| --- | --- | --- |
| Permutation invariant training (PIT) | Multi-talker Speech Separation with Utterance-level Permutation Invariant Training of Deep Recurrent Neural Networks | ✔ |
| One-and-rest PIT | Recursive Speech Separation for Unknown Number of Speakers | ✔ |
| Probabilistic PIT | Probabilistic Permutation Invariant Training for Speech Separation | |
| Sinkhorn PIT | Towards Listening to 10 People Simultaneously: An Efficient Permutation Invariant Training of Audio Source Separation Using Sinkhorn's Algorithm | ✔ |
| Combination Loss | All for One and One for All: Improving Music Separation by Bridging Networks | ✔ |
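The core idea behind utterance-level PIT is to evaluate the training criterion under every assignment of estimates to targets and back-propagate the minimum. A minimal NumPy sketch with MSE as the pairwise loss (the repository's training scripts use their own criteria):

```python
import itertools
import numpy as np

def pit_mse(estimates, targets):
    """Minimum MSE over all source permutations.

    estimates, targets: arrays of shape (n_sources, n_samples).
    Returns (best_loss, best_permutation).
    """
    n_sources = estimates.shape[0]
    best_loss, best_perm = np.inf, None
    for perm in itertools.permutations(range(n_sources)):
        # Reorder the estimates according to this candidate assignment.
        loss = np.mean((estimates[list(perm)] - targets) ** 2)
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm
```

The exhaustive search costs O(n_sources!), which is why approximations such as Sinkhorn PIT (in the table above) matter for many sources.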
LibriSpeech example using Conv-TasNet
You can check other tutorials in `<REPOSITORY_ROOT>/egs/tutorials/`.
Prepare the LibriSpeech dataset:

```sh
cd <REPOSITORY_ROOT>/egs/tutorials/common/
. ./prepare_librispeech.sh --dataset_root <DATASET_DIR> --n_sources <#SPEAKERS>
```
Train the model:

```sh
cd <REPOSITORY_ROOT>/egs/tutorials/conv-tasnet/
. ./train.sh --exp_dir <OUTPUT_DIR>
```

If you want to resume training:

```sh
. ./train.sh --exp_dir <OUTPUT_DIR> --continue_from <MODEL_PATH>
```
Evaluate the trained model:

```sh
cd <REPOSITORY_ROOT>/egs/tutorials/conv-tasnet/
. ./test.sh --exp_dir <OUTPUT_DIR>
```
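Evaluation on the WSJ0-mix and LibriSpeech benchmarks is commonly reported as scale-invariant SDR (SI-SDR). As background, here is a self-contained NumPy sketch of that metric; it is for illustration only and is not necessarily what `test.sh` computes:

```python
import numpy as np

def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio in dB."""
    # Zero-mean both signals so the metric ignores DC offsets.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference (optimal scaling),
    # making the metric invariant to the estimate's gain.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    noise = estimate - target
    return 10.0 * np.log10((np.sum(target ** 2) + eps) / (np.sum(noise ** 2) + eps))
```

Because of the projection step, rescaling the estimate (e.g. halving its amplitude) leaves the score unchanged, unlike plain SNR.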
Run the demo:

```sh
cd <REPOSITORY_ROOT>/egs/tutorials/conv-tasnet/
. ./demo.sh
```
- MMDenseLSTM: See `egs/tutorials/mm-dense-lstm/separate_music.ipynb`.
- Conv-TasNet: See `egs/tutorials/conv-tasnet/separate_music.ipynb`.
- UMX: See `egs/tutorials/umx/separate_music.ipynb`.
- X-UMX: See `egs/tutorials/x-umx/separate_music.ipynb`.
- D3Net: See `egs/tutorials/d3net/separate_music.ipynb`.
You can load pretrained models like this:

```python
from models.conv_tasnet import ConvTasNet

model = ConvTasNet.build_from_pretrained(task="musdb18", sample_rate=44100, target="vocals")
```
| Model | Dataset | Example |
| --- | --- | --- |
| LSTM-TasNet | WSJ0-2mix | `model = LSTMTasNet.build_from_pretrained(task="wsj0-mix", sample_rate=8000, n_sources=2)` |
| Conv-TasNet | WSJ0-2mix | `model = ConvTasNet.build_from_pretrained(task="wsj0-mix", sample_rate=8000, n_sources=2)` |
| Conv-TasNet | WSJ0-3mix | `model = ConvTasNet.build_from_pretrained(task="wsj0-mix", sample_rate=8000, n_sources=3)` |
| Conv-TasNet | MUSDB18 | `model = ConvTasNet.build_from_pretrained(task="musdb18", sample_rate=44100)` |
| Conv-TasNet | WHAM | `model = ConvTasNet.build_from_pretrained(task="wham/separate-noisy", sample_rate=8000)` |
| Conv-TasNet | WHAM | `model = ConvTasNet.build_from_pretrained(task="wham/enhance-single", sample_rate=8000)` |
| Conv-TasNet | WHAM | `model = ConvTasNet.build_from_pretrained(task="wham/enhance-both", sample_rate=8000)` |
| Conv-TasNet | LibriSpeech | `model = ConvTasNet.build_from_pretrained(task="librispeech", sample_rate=16000, n_sources=2)` |
| DPRNN-TasNet | WSJ0-2mix | `model = DPRNNTasNet.build_from_pretrained(task="wsj0-mix", sample_rate=8000, n_sources=2)` |
| DPRNN-TasNet | WSJ0-3mix | `model = DPRNNTasNet.build_from_pretrained(task="wsj0-mix", sample_rate=8000, n_sources=3)` |
| DPRNN-TasNet | LibriSpeech | `model = DPRNNTasNet.build_from_pretrained(task="librispeech", sample_rate=16000, n_sources=2)` |
| MMDenseLSTM | MUSDB18 | `model = MMDenseLSTM.build_from_pretrained(task="musdb18", sample_rate=44100, target="vocals")` |
| Open-Unmix | MUSDB18 | `model = OpenUnmix.build_from_pretrained(task="musdb18", sample_rate=44100, target="vocals")` |
| Open-Unmix | MUSDB18-HQ | `model = OpenUnmix.build_from_pretrained(task="musdb18hq", sample_rate=44100, target="vocals")` |
| DPTNet | WSJ0-2mix | `model = DPTNet.build_from_pretrained(task="wsj0-mix", sample_rate=8000, n_sources=2)` |
| CrossNet-Open-Unmix | MUSDB18 | `model = CrossNetOpenUnmix.build_from_pretrained(task="musdb18", sample_rate=44100)` |
| D3Net | MUSDB18 | `model = D3Net.build_from_pretrained(task="musdb18", sample_rate=44100, target="vocals")` |