Music Source Separation Universal Training Code

This repository provides training code for music source separation models. It is based on the kuielab code for the SDX23 challenge. The main goal is training code that is easy to modify for experiments. Brought to you by MVSep.com.

Models

The model is selected with the --model_type argument.

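Available models for training (values accepted by --model_type):

  • mdx23c
  • htdemucs
  • segm_models
  • mel_band_roformer
  • bs_roformer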

Note 1: Many different encoders are possible for segm_models. Look here.

Note 2: Thanks to @lucidrains for recreating the RoFormer models based on papers.

How to train

To train a model:

  1. Choose the model type with --model_type. Possible values: mdx23c, htdemucs, segm_models, mel_band_roformer, bs_roformer.
  2. Set the path to the model config with --config_path <config path>. Example configs are in the configs folder; those prefixed with config_musdb18_ are examples for the MUSDB18 dataset.
  3. If you have a checkpoint from the same model or a similar one, you can start from it with --start_check_point <weights path>.
  4. Set the folder where training results are stored with --results_path <results folder path>.

Example

python train.py \
    --model_type mel_band_roformer \
    --config_path configs/config_mel_band_roformer_vocals.yaml \
    --start_check_point results/model.ckpt \
    --results_path results/ \
    --data_path 'datasets/dataset1' 'datasets/dataset2' \
    --valid_path datasets/musdb18hq/test \
    --num_workers 4 \
    --device_ids 0
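
The example passes two folders to --data_path, which suggests the flag collects a list. As a minimal sketch of how such flags could be declared with argparse (this is not the repository's actual parser; build_parser, the defaults, and the choice of which flags take lists are assumptions made for illustration):

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags used in the training example."""
    parser = argparse.ArgumentParser(description="Illustrative training arguments")
    parser.add_argument(
        "--model_type",
        type=str,
        default="mdx23c",  # assumed default, not taken from the repository
        choices=["mdx23c", "htdemucs", "segm_models", "mel_band_roformer", "bs_roformer"],
    )
    parser.add_argument("--config_path", type=str, required=True)
    parser.add_argument("--start_check_point", type=str, default="")
    parser.add_argument("--results_path", type=str, required=True)
    # nargs="+" lets a single flag collect one or more dataset folders.
    parser.add_argument("--data_path", nargs="+", type=str, required=True)
    parser.add_argument("--valid_path", type=str, default="")
    parser.add_argument("--num_workers", type=int, default=0)
    # Assumed to be a list so that several GPUs could be named.
    parser.add_argument("--device_ids", nargs="+", type=int, default=[0])
    return parser


example_args = build_parser().parse_args([
    "--model_type", "mel_band_roformer",
    "--config_path", "configs/config_mel_band_roformer_vocals.yaml",
    "--results_path", "results/",
    "--data_path", "datasets/dataset1", "datasets/dataset2",
    "--device_ids", "0",
])
print(example_args.data_path)
```

With nargs="+", every value after --data_path up to the next flag ends up in one Python list, so several datasets can be mixed without repeating the flag.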

All available training parameters are listed here.

How to run inference

Example

python inference.py \
    --model_type mdx23c \
    --config_path configs/config_mdx23c_musdb18.yaml \
    --start_check_point results/last_mdx23c.ckpt \
    --input_folder input/wavs/ \
    --store_dir separation_results/
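
When the same command has to be run for several checkpoints or input folders, it can help to assemble it programmatically. The sketch below only builds the argument list from the flags shown in the example; build_inference_command is a hypothetical helper, not part of the repository.

```python
import shlex
from typing import List


def build_inference_command(
    model_type: str,
    config_path: str,
    checkpoint: str,
    input_folder: str,
    store_dir: str,
) -> List[str]:
    """Return the argv list for the inference example (hypothetical helper)."""
    return [
        "python", "inference.py",
        "--model_type", model_type,
        "--config_path", config_path,
        "--start_check_point", checkpoint,
        "--input_folder", input_folder,
        "--store_dir", store_dir,
    ]


cmd = build_inference_command(
    model_type="mdx23c",
    config_path="configs/config_mdx23c_musdb18.yaml",
    checkpoint="results/last_mdx23c.ckpt",
    input_folder="input/wavs/",
    store_dir="separation_results/",
)
# Print a shell-safe version of the command for inspection.
print(shlex.join(cmd))
```

The list can then be handed to subprocess.run(cmd, check=True) from the repository root. Passing a list rather than a single string avoids quoting problems with paths that contain spaces.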

All available inference parameters are listed here.

Useful notes

  • All batch sizes in the configs are tuned for a single NVIDIA A6000 48GB. If you have less memory, adjust training.batch_size and training.gradient_accumulation_steps in the model config accordingly.
  • It is usually better to start from existing weights, even if the shapes do not fully match. The code supports loading weights from models that are not identical (but they must have the same architecture). Training will be much faster.
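
As a rough illustration of the first note, the sketch below shrinks the batch size in proportion to available GPU memory and raises the accumulation steps so that the effective batch (batch_size × gradient_accumulation_steps) stays about the same. It assumes that gradient accumulation multiplies the effective batch in the usual way and that memory use scales roughly linearly with batch size. Neither is guaranteed for every model, and scale_batch_settings is a hypothetical helper, not part of the repository.

```python
import math
from typing import Tuple


def scale_batch_settings(
    batch_size: int,
    grad_accum_steps: int,
    reference_mem_gb: float,
    available_mem_gb: float,
) -> Tuple[int, int]:
    """Suggest (batch_size, gradient_accumulation_steps) for a smaller GPU.

    Assumption: memory grows roughly linearly with batch size. Fixed costs
    such as model weights and optimizer state are ignored, so treat the
    result as a starting point and reduce further if you run out of memory.
    """
    effective_batch = batch_size * grad_accum_steps
    # Scale the per-step batch by the memory ratio, never going below 1.
    new_batch = max(1, math.floor(batch_size * available_mem_gb / reference_mem_gb))
    # Accumulate more steps to approximate the original effective batch.
    new_accum = math.ceil(effective_batch / new_batch)
    return new_batch, new_accum


# A config tuned for 48 GB with batch_size=8, moved to a 24 GB card.
print(scale_batch_settings(8, 1, reference_mem_gb=48, available_mem_gb=24))
```

The returned pair would then be written into training.batch_size and training.gradient_accumulation_steps in the model config.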

Code description

  • configs/config_*.yaml - configuration files for models
  • models/* - set of available models for training and inference
  • dataset.py - dataset which creates new samples for training
  • inference.py - process folder with music files and separate them
  • train.py - main training code
  • utils.py - common functions used by train/valid
  • valid.py - validation of model with metrics

Pre-trained models

If you have trained good models, please share them. You can post the config and model weights in this issue.

Model Type               | Instruments        | Metrics           | Config | Checkpoint
MDX23C                   | vocals / other     | SDR vocals: 10.17 | Config | Weights
HT Demucs                | vocals / other     | SDR vocals: 8.78  | Config | Weights
Segm Models (VitLarge23) | vocals / other     | SDR vocals: 9.77  | Config | Weights
Mel Band RoFormer        | vocals (*) / other | SDR vocals: 8.42  | Config | Weights
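
The table reports SDR (signal-to-distortion ratio) in dB without stating which variant is used. As a minimal sketch, assuming the common definition 10·log10(‖reference‖² / ‖reference − estimate‖²) on a single mono signal, it could be computed as follows; sdr_db is a hypothetical helper, and the repository's validation code may aggregate or compute the metric differently.

```python
import math
from typing import Sequence


def sdr_db(reference: Sequence[float], estimate: Sequence[float], eps: float = 1e-8) -> float:
    """Signal-to-distortion ratio in dB for one mono signal (assumed definition)."""
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    # Energy of the clean target signal.
    signal_energy = sum(r * r for r in reference)
    # Energy of the difference between target and separated output.
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    # eps keeps the ratio finite for silent or perfectly matched signals.
    return 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))


reference = [1.0, 0.0, -1.0, 0.0]
estimate = [0.9, 0.0, -0.9, 0.0]
print(round(sdr_db(reference, estimate), 2))
```

Higher values mean the separated stem is closer to the reference. In this toy case the error energy is about 1% of the signal energy, so the result is close to 20 dB.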

Dataset types

Look here: Dataset types

Citation

@misc{solovyev2023benchmarks,
      title={Benchmarks and leaderboards for sound demixing tasks}, 
      author={Roman Solovyev and Alexander Stempkovskiy and Tatiana Habruseva},
      year={2023},
      eprint={2305.07489},
      archivePrefix={arXiv},
      primaryClass={cs.SD}
}