Music source separation is the task of separating an audio recording into its individual sources. This repository is a PyTorch implementation of music source separation. Users can separate their favorite songs into different sources by installing this repository, and can also train their own music source separation systems with it. The repository additionally supports related tasks such as speech enhancement and instrument separation.
Vocals and accompaniment separation: https://www.youtube.com/watch?v=WH4m5HYzHsg
pip install bytesep
Users can easily separate their favorite audio recordings into vocals and accompaniment using the pretrained checkpoints. The checkpoints were trained on only the training subset (100 songs) of the MUSDB18 dataset.
python3 separate_scripts/separate.py \
--audio_path="./resources/vocals_accompaniment_10s.mp3" \
--source_type="accompaniment" # "vocals" | "accompaniment"
Download checkpoints.
./separate_scripts/download_checkpoints.sh
Do separation.
./separate_scripts/separate_vocals.sh "resources/vocals_accompaniment_10s.mp3" "sep_vocals.mp3"
./separate_scripts/separate_accompaniment.sh "resources/vocals_accompaniment_10s.mp3" "sep_accompaniment.mp3"
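The two scripts above process one file at a time. A hedged sketch of batch separation over a folder of mp3 files follows; it is a dry run that only prints the commands it would execute (remove the `echo` to actually run them, and adjust the input folder to your own):

```shell
# Dry-run batch separation: print one vocals and one accompaniment command
# per .mp3 file found in resources/. Paths here are illustrative.
mkdir -p sep
for f in resources/*.mp3; do
  [ -e "$f" ] || continue  # skip if the folder contains no .mp3 files
  name="$(basename "$f" .mp3)"
  echo ./separate_scripts/separate_vocals.sh "$f" "sep/${name}_vocals.mp3"
  echo ./separate_scripts/separate_accompaniment.sh "$f" "sep/${name}_accompaniment.mp3"
done
```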
We use the MUSDB18 dataset to train music source separation systems. The trained systems can be used to separate vocals, accompaniment, bass, and other sources. Execute the following script to download and decompress the MUSDB18 dataset:
./scripts/0_download_datasets/musdb18.sh
The dataset looks like:
./datasets/musdb18
├── train (100 files)
│   ├── 'A Classic Education - NightOwl.stem.mp4'
│   └── ...
├── test (50 files)
│   ├── 'Al James - Schoolboy Facination.stem.mp4'
│   └── ...
└── README.md
We pack audio waveforms into hdf5 files to speed up training.
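The idea behind the packing step can be sketched as follows. This is a minimal illustration using `h5py` and a random waveform, not the repository's exact packing code: once a song is stored as an HDF5 dataset, training can random-access fixed-length segments without decoding the audio file each time.

```python
# Sketch: pack a stereo waveform into an HDF5 file, then read a 1 s segment.
import numpy as np
import h5py

sr = 44100  # sample rate assumed by the sr=44100,chn=2 packing script
waveform = np.random.uniform(-1, 1, (2, sr * 3)).astype(np.float32)  # 3 s stereo

with h5py.File("song.h5", "w") as hf:
    hf.create_dataset("waveform", data=waveform, dtype=np.float32)
    hf.attrs["sample_rate"] = sr

# Random-access a 1 s training segment starting at t = 1 s.
with h5py.File("song.h5", "r") as hf:
    segment = hf["waveform"][:, sr : 2 * sr]

print(segment.shape)  # (2, 44100)
```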
./scripts/1_pack_audios_to_hdf5s/musdb18/sr=44100,chn=2.sh
Create indexes for training.
./scripts/2_create_indexes/musdb18/create_indexes.sh
Create evaluation audios.
./scripts/3_create_evaluation_audios/musdb18/create_evaluation_audios.sh
Train.
./scripts/4_train/musdb18/train.sh
Inference.
./scripts/5_inference/musdb18/inference.sh
## Results
Model | Size (MB) | SDR (dB) | Time to process 1 s of audio on GPU, Tesla V100 (s) | Time to process 1 s of audio on CPU, Core i7 (s)
---|---|---|---|---
ResUNet143 vocals | 461 | 8.9 | 0.036 | 2.513
ResUNet143 Subband vocals | 414 | 8.8 | 0.012 | 0.614
ResUNet143 accompaniment | 461 | 16.8 | 0.036 | 2.513
ResUNet143 Subband accompaniment | 414 | 16.4 | 0.012 | 0.614
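The SDR (signal-to-distortion ratio) figures above are computed with the standard BSS-eval methodology; in its simplest form the metric compares a reference source with an estimate. A minimal sketch (not the exact evaluation code used for the table):

```python
# SDR in dB: 10*log10(||s||^2 / ||s - s_hat||^2), higher is better.
import numpy as np

def sdr(reference: np.ndarray, estimate: np.ndarray) -> float:
    noise = reference - estimate
    return 10.0 * np.log10(np.sum(reference ** 2) / np.sum(noise ** 2))

rng = np.random.default_rng(0)
s = rng.standard_normal(44100)                 # 1 s reference at 44.1 kHz
s_hat = s + 0.1 * rng.standard_normal(44100)   # estimate with 10% added noise

print(sdr(s, s_hat))  # close to 20 dB for this noise level
```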
[1] Qiuqiang Kong, Yin Cao, Haohe Liu, Keunwoo Choi, Yuxuan Wang, Decoupling Magnitude and Phase Estimation with Deep ResUNet for Music Source Separation, International Society for Music Information Retrieval (ISMIR), 2021.
@inproceedings{kong2021decoupling,
  title={Decoupling Magnitude and Phase Estimation with Deep ResUNet for Music Source Separation},
  author={Kong, Qiuqiang and Cao, Yin and Liu, Haohe and Choi, Keunwoo and Wang, Yuxuan},
  booktitle={ISMIR},
  year={2021}
}
Other open-source music source separation projects include, but are not limited to:
Subband ResUNet: https://github.com/haoheliu/Subband-Music-Separation
Demucs: https://github.com/facebookresearch/demucs
Spleeter: https://github.com/deezer/spleeter
Asteroid: https://github.com/asteroid-team/asteroid
Open-Unmix: https://github.com/sigsep/open-unmix-pytorch