
SCNet-Pytorch

Unofficial PyTorch implementation of the paper "SCNet: Sparse Compression Network for Music Source Separation".

Architecture overview (figure from the paper).


Table of Contents

  1. Changelog & ToDo's
  2. Dependencies
  3. Training
  4. Inference
  5. Evaluation
  6. Repository structure
  7. Citing

Changelog

  • 10.02.2024
    • Model itself is finished. The train script is on its way.
  • 21.02.2024
    • Add part of the training pipeline.
  • 02.03.2024
    • Finish the training pipeline and the separator.
  • 17.03.2024
    • Finish inference.py and fill README.md
  • 13.04.2024
    • Finish evaluation pipeline and fill README.md.

ToDo's

  • Add trained model.

Dependencies

Before starting training, you need to install the requirements:

pip install -r requirements.txt

Then, download the MUSDB18HQ dataset:

wget -O /path/to/dataset/musdb18hq.zip https://zenodo.org/records/3338373/files/musdb18hq.zip
unzip /path/to/dataset/musdb18hq.zip -d /path/to/dataset
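MUSDB18HQ extracts into train and test folders, where each track folder contains a mixture.wav plus the four stems (drums, bass, other, vocals). If you want a quick sanity check of the extracted layout, a small script along these lines can flag incomplete tracks (the dataset path below is a placeholder):

from pathlib import Path

# Verify that every MUSDB18HQ track folder contains the mixture and the
# four source stems. Adjust the path to where you extracted the archive.
dataset_dir = Path("/path/to/dataset/musdb18hq")
expected = {"mixture.wav", "drums.wav", "bass.wav", "other.wav", "vocals.wav"}
for split in ("train", "test"):
    for track in sorted((dataset_dir / split).iterdir()):
        missing = expected - {wav.name for wav in track.glob("*.wav")}
        if missing:
            print(f"{track.name}: missing {sorted(missing)}")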

Next, create environment variables with paths to the audio data and to the generated metadata .pqt file:

export DATASET_DIR=/path/to/dataset/musdb18hq
export DATASET_PATH=/path/to/dataset/dataset.pqt
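The file at DATASET_PATH is a parquet table with per-track metadata produced by the data pipeline; its exact columns are defined by this repository, so the snippet below only inspects whatever is there (pandas with pyarrow or fastparquet installed is assumed):

import os
import pandas as pd

# Peek at the generated metadata table. Column names are whatever the
# repository's data pipeline writes, so print them instead of assuming any.
metadata = pd.read_parquet(os.environ["DATASET_PATH"])
print(metadata.columns.tolist())
print(metadata.head())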

Finally, export CUDA_VISIBLE_DEVICES to select which GPU is visible to the training script:

export CUDA_VISIBLE_DEVICES=0

Now, you can train the model.


Training

The training pipeline is built on PyTorch Lightning and Hydra. All configuration files are stored in the src/conf directory in Hydra-friendly format.

To start training a model with given configurations, use the following script:

python src/train.py

To configure the training process, follow the Hydra documentation. You can modify or override arguments like this:

python src/train.py +trainer.overfit_batches=10 loader.train.batch_size=16
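If you want to see the fully resolved configuration before launching a run, Hydra's compose API can build it programmatically. The config_path and config_name below are assumptions (check src/conf for the actual file names), and the snippet is assumed to live at the repository root:

from hydra import compose, initialize

# Resolve the training configuration without starting a run, so the effect
# of an override can be checked. config_path is relative to this file.
with initialize(version_base=None, config_path="src/conf"):
    cfg = compose(config_name="config",
                  overrides=["loader.train.batch_size=16"])
    print(cfg.loader.train.batch_size)  # should print 16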

Once training starts, a logging folder is created for the experiment at the following path:

logs/scnet/${now:%Y-%m-%d}_${now:%H-%M}/

This folder will have the following structure:

├── checkpoints
│   └── *.ckpt                  - Lightning model checkpoint files
├── tensorboard
│   └── tensorboard_log_file    - main TensorBoard log file
├── yamls
│   └── *.yaml                  - Hydra configuration and override files
└── train.log                   - logging file for train.py

Inference

After training a model, you can run inference using the following command:

python src/inference.py -i <INPUT_PATH> \
                        -o <OUTPUT_DIR> \
                        -c <CHECKPOINT_PATH>

This command will generate separated audio files in .wav format in the <OUTPUT_DIR> directory.

For more information about the script and its options, run python src/inference.py --help:

usage: inference.py [-h] -i INPUT_PATH -o OUTPUT_PATH -c CKPT_PATH [-d DEVICE] [-b BATCH_SIZE] [-w WINDOW_SIZE] [-s STEP_SIZE] [-p]

Argument Parser for Separator

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT_PATH, --input-path INPUT_PATH
                        Input path to .wav audio file/directory containing audio files
  -o OUTPUT_PATH, --output-path OUTPUT_PATH
                        Output directory to save separated audio files in .wav format
  -c CKPT_PATH, --ckpt-path CKPT_PATH
                        Path to the model checkpoint
  -d DEVICE, --device DEVICE
                        Device to run the model on (default: cuda)
  -b BATCH_SIZE, --batch-size BATCH_SIZE
                        Batch size for processing (default: 4)
  -w WINDOW_SIZE, --window-size WINDOW_SIZE
                        Window size (default: 11)
  -s STEP_SIZE, --step-size STEP_SIZE
                        Step size (default: 5.5)
  -p, --use-progress-bar
                        Use progress bar (default: True)

Additionally, you can run inference within Python using the following script:

import sys
sys.path.append('src/')

import torchaudio
from src.model.separator import Separator

device: str = 'cuda'

separator = Separator.load_from_checkpoint(
    path="<CHECKPOINT_PATH>",   # path to trained Lightning checkpoint
    batch_size=4,         # adjust batch size to fit into your GPU's memory
    window_size=11,       # window size of the model (do not change)
    step_size=5.5,        # a step size closer to the window size speeds up inference but degrades separation quality
    use_progress_bar=True # show progress bar per audio file
).to(device)

y, sr = torchaudio.load("<INPUT_PATH>")
y = y.to(device)

y_separated = separator.separate(y).cpu()
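To write the separated stems to disk, you can extend the snippet above. The source order below follows the usual MUSDB18 convention and is an assumption; check the Separator implementation for the actual output layout (expected here as one stem per entry along the first dimension):

# Save each separated stem as a .wav file. The source order is assumed to
# follow the MUSDB18 convention (drums, bass, other, vocals); verify it
# against the Separator implementation before relying on the file names.
source_names = ["drums", "bass", "other", "vocals"]
for name, stem in zip(source_names, y_separated):
    torchaudio.save(f"{name}.wav", stem, sr)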

Make sure to replace <INPUT_PATH>, <OUTPUT_DIR>, and <CHECKPOINT_PATH> with the appropriate paths for your setup.


Evaluation

After training a model, you can run the evaluation pipeline using the following command:

python src/evaluate.py -c <CHECKPOINT_PATH>

This script loads the checkpoint at <CHECKPOINT_PATH> and runs inference on the test tracks listed in the DATASET_PATH metadata file.

As a result, the script prints the mean SDR for each source to the console.
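The metric here is signal-to-distortion ratio (SDR), reported in dB. As a reference point only, a plain SDR between a reference stem and its estimate can be computed as below; evaluate.py in this repository may use a different variant (e.g. chunk-wise SDR), so treat this as a sketch of the standard definition:

import torch

def sdr(reference: torch.Tensor, estimate: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    # Plain SDR in dB: signal energy over residual (distortion) energy.
    signal = torch.sum(reference ** 2)
    distortion = torch.sum((reference - estimate) ** 2)
    return 10.0 * torch.log10((signal + eps) / (distortion + eps))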

For more information about the script and its options, run python src/evaluate.py --help:

usage: evaluate.py [-h] -c CKPT_PATH [-d DEVICE] [-b BATCH_SIZE] [-w WINDOW_SIZE] [-s STEP_SIZE]

Argument Parser for Separator

optional arguments:
  -h, --help            show this help message and exit
  -c CKPT_PATH, --ckpt-path CKPT_PATH
                        Path to the model checkpoint
  -d DEVICE, --device DEVICE
                        Device to run the model on (default: cuda)
  -b BATCH_SIZE, --batch-size BATCH_SIZE
                        Batch size for processing (default: 4)
  -w WINDOW_SIZE, --window-size WINDOW_SIZE
                        Window size (default: 11)
  -s STEP_SIZE, --step-size STEP_SIZE
                        Step size (default: 5.5)

Citing

To cite this paper, please use:

@misc{tong2024scnet,
      title={SCNet: Sparse Compression Network for Music Source Separation}, 
      author={Weinan Tong and Jiaxu Zhu and Jun Chen and Shiyin Kang and Tao Jiang and Yang Li and Zhiyong Wu and Helen Meng},
      year={2024},
      eprint={2401.13276},
      archivePrefix={arXiv},
      primaryClass={eess.AS}
}