Sound Source Separation
Tools for multi-channel sound source separation and dereverberation.
News
- Ver2.1 is released. Source separation methods are now implemented with PyTorch (numpy and cupy are not necessary).
- Other methods implemented in ver1.0, such as MNMF-DP and FastMNMF-DP, will be added in the future.
Method list
Source separation
- FastMNMF1
- FastMNMF2
- ILRMA
- MNMF (the PyTorch version is much slower than the cupy version on GPU)
Joint source separation and dereverberation
- AR-FastMNMF2 (the PyTorch version is not ready yet)
Requirements
- Tested on Python 3.8
- Requirements for the numpy and cupy version in src are listed below:
numpy (1.19.2 was tested)
librosa
pysoundfile
tqdm
# optional packages
cupy # for GPU acceleration (9.4.0 was tested)
h5py # for saving the estimated parameters
You can install all the packages above with pip install -r src/requirements.txt
- Requirements for the PyTorch version in src_torch are listed below:
torch
torchaudio
tqdm
# optional packages
h5py # for saving the estimated parameters
You can install all the packages above with pip install -r src_torch/requirements.txt
Usage
python3 FastMNMF2.py [input_filename] --gpu [gpu_id]
- The input is a file containing the multichannel observed signal.
- If gpu_id < 0, the CPU is used and cupy is not required.
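The gpu_id convention above can be illustrated with a minimal sketch. This is not the code in FastMNMF2.py: the function names `build_parser` and `select_device`, the default of -1 for `--gpu`, and the PyTorch-style device strings are all assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser that mirrors the usage line; not the script's actual code.
    parser = argparse.ArgumentParser(description="Sketch of the command-line interface")
    parser.add_argument("input_filename", help="file with the multichannel observed signal")
    # The default of -1 (CPU) is an assumption made for this sketch.
    parser.add_argument("--gpu", type=int, default=-1, help="GPU id; a negative value selects the CPU")
    return parser


def select_device(gpu_id: int) -> str:
    """Map a --gpu value to a PyTorch-style device string."""
    return "cpu" if gpu_id < 0 else f"cuda:{gpu_id}"


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(["mixture.wav", "--gpu", "-1"])
print(select_device(args.gpu))
```

A negative id falls back to the CPU, so the same command line can be used on machines without a GPU.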
Citation
If you use the code of FastMNMF1 or FastMNMF2 in your research project, please cite the following papers:
- Kouhei Sekiguchi, Aditya Arie Nugraha, Yoshiaki Bando, Kazuyoshi Yoshii:
Fast Multichannel Source Separation Based on Jointly Diagonalizable Spatial Covariance Matrices,
European Signal Processing Conference (EUSIPCO), 2019.
- Kouhei Sekiguchi, Yoshiaki Bando, Aditya Arie Nugraha, Kazuyoshi Yoshii, Tatsuya Kawahara:
Fast Multichannel Nonnegative Matrix Factorization with Directivity-Aware Jointly-Diagonalizable Spatial Covariance Matrices for Blind Source Separation,
IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2020.
If you use the code of AR-FastMNMF2 in your research project, please cite the following paper:
- Kouhei Sekiguchi, Yoshiaki Bando, Aditya Arie Nugraha, Mathieu Fontaine, Kazuyoshi Yoshii: Autoregressive Fast Multichannel Nonnegative Matrix Factorization for Joint Blind Source Separation and Dereverberation, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021.
Detail
- The "n_bit" argument sets the numerical precision and must be 32 or 64 (32 -> float32 and complex64, 64 -> float64 and complex128; the default is 64). n_bit=32 reduces computational cost and memory usage at the expense of separation performance. The performance is especially likely to degrade when the number of microphones (or the tap length in AR-based methods such as AR-FastMNMF2) is large. In MNMF.py, only n_bit=64 is available.
- When you use simulated signals without reverberation, the mixture spatial covariance matrix (SCM) is likely to be rank-deficient, so please add a small amount of noise to the simulated signals.
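The n_bit mapping described above can be written as a small lookup. This is a sketch only: the helper name `dtype_names` and the table `DTYPE_NAMES` are hypothetical and are not taken from the repository.

```python
from typing import Tuple

# Hypothetical lookup table: n_bit -> (real dtype name, complex dtype name).
DTYPE_NAMES = {
    32: ("float32", "complex64"),
    64: ("float64", "complex128"),
}


def dtype_names(n_bit: int = 64) -> Tuple[str, str]:
    """Return the real and complex dtype names for an n_bit value (default 64)."""
    try:
        return DTYPE_NAMES[n_bit]
    except KeyError:
        # Only 32 and 64 are documented as valid values.
        raise ValueError(f"n_bit must be 32 or 64, got {n_bit}") from None


real_name, complex_name = dtype_names(32)
print(real_name, complex_name)
```

The returned names can be turned into concrete dtypes with, for example, `numpy.dtype(real_name)` or `getattr(torch, real_name)`, depending on which backend is in use.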