A Vocoder Based Method For Singing Voice Extraction

Pritish Chandna, Merlijn Blaauw, Jordi Bonada, Emilia Gómez

Music Technology Group, Universitat Pompeu Fabra, Barcelona

This repository contains the source code for the paper of the same title. Please note that the model presented here is currently configured for the iKala dataset, as published in the corresponding paper, but it can also be applied to other commercial songs. For examples of the system's output, please visit: https://pc2752.github.io/singing_voice_sep/

Installation

To install, clone the repository and run
pip install -r requirements.txt 
to install the required packages.

The main code is in the train_tf.py file. To use it, download the model weights and place them in the log_dir_m1 directory, defined in config.py. Wave files to be tested should be placed in the wav_dir, also defined in config.py. You will also need TensorFlow installed on the machine.
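The directory names above are variables in config.py. As a rough illustration only, the kind of path settings involved might look like the sketch below; the variable names (wav_dir, log_dir_m1, val_dir) come from this README, while the example values are placeholders, not the repository's actual configuration.

```python
# Hypothetical sketch of the path variables referenced in config.py.
# Only the variable names are taken from the README; the values here
# are placeholders and should be adapted to your own setup.
wav_dir = "./wav_files/"      # input .wav files to be processed
log_dir_m1 = "./log_dir_m1/"  # pre-trained model weights go here
val_dir = "./val_dir/"        # synthesized outputs are written here
```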

Data pre-processing

Once the iKala files have been placed in the wav_dir, you can run

python prep_data_ikala.py
to carry out the data pre-processing step.
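Pre-processing starts from how iKala stores its clips: each file is a stereo wav in which one channel carries the accompaniment and the other the vocal track. The helper below is an illustrative sketch of separating the two channels, not the repository's actual pre-processing code, and the channel ordering is an assumption.

```python
import numpy as np

def split_ikala_channels(stereo):
    """Split an iKala-style stereo signal (shape: [n_samples, 2]) into
    accompaniment and vocal tracks. The channel ordering here is an
    assumption for illustration, not the repository's actual convention."""
    accompaniment = stereo[:, 0]
    vocals = stereo[:, 1]
    return accompaniment, vocals

# Synthetic 1-second, 44.1 kHz stereo clip standing in for an iKala file.
sr = 44100
t = np.arange(sr) / sr
stereo = np.stack([np.sin(2 * np.pi * 220 * t),   # "accompaniment" channel
                   np.sin(2 * np.pi * 440 * t)],  # "vocal" channel
                  axis=1)
acc, vox = split_ikala_channels(stereo)
print(acc.shape, vox.shape)  # (44100,) (44100,)
```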

Training and inference

Once set up, you can run

python main.py -t

to train the model,

python main.py -e <filename>

to synthesize the output from an hdf5 file, or

python main.py -v <filename>

to synthesize the output from a .wav file. The output will be saved in the val_dir specified in the config.py file. The plots show the ground-truth and output values for the vocoder features, as well as the f0 and the accuracy. Note that plots are only supported for iKala songs, as ground truth is available only for these songs.
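The f0 accuracy reported in the plots can be understood through a standard pitch-evaluation measure. The sketch below computes raw pitch accuracy (the fraction of voiced frames whose estimated f0 falls within a tolerance of the ground truth, here 50 cents); this is an illustrative metric, and the function and tolerance are assumptions rather than the repository's exact computation.

```python
import numpy as np

def raw_pitch_accuracy(f0_ref, f0_est, cents_tol=50.0):
    """Fraction of voiced reference frames (f0 > 0) whose estimated f0
    lies within cents_tol cents of the ground truth. An illustrative
    metric, not the repository's exact accuracy computation."""
    voiced = f0_ref > 0
    cents = 1200.0 * np.abs(np.log2(f0_est[voiced] / f0_ref[voiced]))
    return float(np.mean(cents <= cents_tol))

# Toy example: four frames, one unvoiced (f0 = 0), one badly estimated.
f0_ref = np.array([220.0, 220.0, 0.0, 440.0])
f0_est = np.array([221.0, 260.0, 0.0, 441.0])
print(raw_pitch_accuracy(f0_ref, f0_est))
```

Here two of the three voiced frames are within tolerance, so the accuracy is 2/3.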

We are currently working on future applications of this methodology; the remaining files in the repository serve that purpose and can be ignored. We will update the repository further in the coming months.

Acknowledgments

The Titan X GPU used for this research was donated by the NVIDIA Corporation. This work is partially supported by the Towards Richer Online Music Public-domain Archives (TROMPA) project.