
Code for multi-channel source separation and dereverberation methods such as FastMNMF1, FastMNMF2, and AR-FastMNMF2.

Primary language: Python. License: Other (NOASSERTION).

Sound Source Separation

Tools for multi-channel sound source separation and dereverberation.

News

Method list

Source separation

  • FastMNMF1
  • FastMNMF2
  • FastMNMF2_DP (DNN speech model + NMF noise model)
  • FastBSS2 (Frequency invariant / NMF / DNN speech model + NMF / Time invariant noise model)
    • This method covers FastMNMF2, FastMNMF2_DP, and related configurations
  • ILRMA
  • MNMF (the PyTorch version is much slower than the CuPy version on GPU)

Joint source separation and dereverberation

  • AR-FastMNMF2 (the PyTorch version is not yet available)

Requirements

  • Tested on Python 3.8
  • Requirements for the NumPy/CuPy version in src are listed below:
numpy (1.19.2 was tested)
librosa
pysoundfile
tqdm

# optional packages
cupy # for GPU acceleration (9.4.0 was tested)
h5py # for saving the estimated parameters
torch # for using DNN source model in FastBSS2.py or FastMNMF2_DP.py

You can install all the packages above with pip install -r src/requirements.txt
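The NumPy/CuPy scripts run on the CPU when gpu_id < 0, in which case CuPy is not needed (see Usage). One common way to support both backends is to choose an array module once and write the rest of the code against it. The following is a minimal sketch of that pattern, assuming this is roughly how the switch works; `select_backend` is a hypothetical helper for illustration, not a function from this repository.

```python
import numpy as np


def select_backend(gpu_id):
    """Return NumPy for gpu_id < 0, otherwise CuPy bound to the given GPU.

    Hypothetical helper illustrating a NumPy/CuPy switch; not part of this repository.
    """
    if gpu_id < 0:
        return np  # CPU path: CuPy is never imported
    import cupy as cp  # optional dependency, only needed for GPU runs
    cp.cuda.Device(gpu_id).use()  # make gpu_id the current CUDA device
    return cp


xp = select_backend(-1)
x = xp.ones((2, 3), dtype=xp.complex128)
print(type(x))  # numpy.ndarray on the CPU path
```

Because CuPy mirrors much of the NumPy API, code written against `xp` usually runs unchanged on either backend; results on the GPU path need `cupy.asnumpy` before they are saved or plotted with CPU-only libraries.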

  • Requirements for the PyTorch version in src_torch are listed below:
torch
torchaudio
tqdm

# optional packages
h5py # for saving the estimated parameters

You can install all the packages above with pip install -r src_torch/requirements.txt

Usage

python3 FastMNMF2.py [input_filename] --gpu [gpu_id]
  • The input is a file containing the multichannel observed (mixture) signals.
  • If gpu_id < 0, the CPU is used and CuPy is not required.
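Multichannel separation methods of this kind operate on the short-time Fourier transform (STFT) of the observed signals. As an illustration of that front end, here is a plain-NumPy sketch of a multichannel STFT. The window length, hop size, Hann window, and the (frequency, frame, microphone) output layout are assumptions; the repository's scripts may use librosa and different settings, and `multichannel_stft` is a hypothetical name.

```python
import numpy as np


def multichannel_stft(x, n_fft=1024, hop=256):
    """STFT of a real multichannel signal (hypothetical helper).

    x: array of shape (n_samples, n_mics).
    Returns a complex array of shape (n_fft // 2 + 1, n_frames, n_mics).
    The (F, T, M) layout is an assumption for illustration.
    """
    window = np.hanning(n_fft)
    n_samples, n_mics = x.shape
    # Zero-pad the end so every sample is covered by at least one frame
    n_frames = 1 + int(np.ceil(max(n_samples - n_fft, 0) / hop))
    pad = (n_frames - 1) * hop + n_fft - n_samples
    x = np.pad(x, ((0, pad), (0, 0)))
    # Slice overlapping frames: shape (T, n_fft, M)
    frames = np.stack([x[t * hop:t * hop + n_fft] for t in range(n_frames)])
    # Window each frame and take the one-sided FFT along the time axis: (T, F, M)
    spec = np.fft.rfft(frames * window[None, :, None], axis=1)
    return spec.transpose(1, 0, 2)  # reorder to (F, T, M)


rng = np.random.default_rng(0)
x = rng.standard_normal((16000, 2))  # stand-in for a 2-channel recording
X = multichannel_stft(x)
print(X.shape)  # (513, 60, 2)
```

For a real recording, `soundfile.read(path)` returns an array of shape (n_samples, n_channels) for multichannel files, which can be passed directly as `x`.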

Citation

If you use the code of FastMNMF1 or FastMNMF2 in your research project, please cite the following paper:

If you use the code of AR-FastMNMF2 in your research project, please cite the following paper:

Details

  • The "n_bit" argument sets the numerical precision and is either 32 or 64 (32 -> float32 and complex64, 64 -> float64 and complex128; the default is 64). n_bit=32 reduces computational cost and memory usage at the expense of separation performance. The degradation is especially likely when the number of microphones (or the tap length in AR-based methods such as AR-FastMNMF2) is large. In MNMF.py, only n_bit=64 is available.
  • When using simulated signals without reverberation, the mixture spatial covariance matrix (SCM) is likely to be rank-deficient, so please add small noise to the simulated signals.
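The sketch below illustrates both points above with NumPy: a mapping from n_bit to dtypes, and why a simulation without reverberation can yield a rank-deficient mixture SCM. It assumes a single frequency bin with instantaneous mixing of fewer sources (2) than microphones (4), which simplifies a real simulated room. The helper names are hypothetical and not part of this repository, and the 1e-3 relative noise level is an arbitrary illustrative choice.

```python
import numpy as np


def dtypes_for_n_bit(n_bit):
    """Map n_bit to (real dtype, complex dtype) as described above (hypothetical helper)."""
    if n_bit == 32:
        return np.float32, np.complex64
    if n_bit == 64:
        return np.float64, np.complex128
    raise ValueError(f"n_bit must be 32 or 64, got {n_bit}")


def mixture_scm(x):
    """Sample spatial covariance matrix of one frequency bin; x has shape (M, T)."""
    return x @ x.conj().T / x.shape[1]


def effective_rank(scm, rel_tol=1e-10):
    """Count eigenvalues larger than rel_tol times the largest eigenvalue."""
    eig = np.linalg.eigvalsh(scm)  # SCM is Hermitian, so eigvalsh applies
    return int(np.sum(eig > rel_tol * eig.max()))


def add_small_noise(x, rng, rel_level=1e-3):
    """Add complex white noise whose RMS is rel_level times the RMS of x."""
    scale = rel_level * np.sqrt(np.mean(np.abs(x) ** 2))
    noise = (rng.standard_normal(x.shape) + 1j * rng.standard_normal(x.shape)) / np.sqrt(2)
    return x + scale * noise


rng = np.random.default_rng(0)
M, N, T = 4, 2, 1000  # 4 microphones, 2 sources, 1000 frames
A = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))  # mixing at one bin
S = rng.standard_normal((N, T)) + 1j * rng.standard_normal((N, T))  # source signals
X = A @ S  # simulated mixture: no reverberation, no noise

print(effective_rank(mixture_scm(X)))                       # 2: rank-deficient (< M)
print(effective_rank(mixture_scm(add_small_noise(X, rng))))  # 4: full rank
```

Without noise, the two-source mixture spans only a 2-dimensional subspace of the 4-dimensional microphone space, so the SCM has two numerically zero eigenvalues, which can break steps that invert or eigendecompose it; the small added noise lifts those eigenvalues above zero.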