Multi-pitch Streaming of Vocal Quartets

This repository is currently under preparation. Essential updates are reported below.

This is the accompanying repository of Chapter 5 of the PhD dissertation:

Helena Cuesta. Data-driven Pitch Content Description of Choral Singing Recordings. Submitted, 2022. Universitat Pompeu Fabra, Barcelona.

Figure 1: Example U-Net-Harm outputs.

Description

The proposed multi-pitch streaming models convert an input audio recording of a four-part vocal ensemble into four independent pitch contours, one for each melodic voice.

As described in the documentation, three main models are available. We strongly recommend U-Net-Harm (unet_harm, the default), as it achieved the best performance in our experiments.

Prediction

Here's how to call the prediction script from the command line:

python predict_on_audio.py --model unet_harm --output output_dir --audiofile input_mixture.wav

The model argument can be one of unet_harm, unet_stand, unet_harm_noskip, or unet_harm_hcqt; each refers to a different model variant, as described in the reference.
The output argument sets the directory where the results (salience functions and F0 contours) are stored. Results are written to a subfolder named after the model: one CSV file per output F0 trajectory and one NPY file per output salience function.
The audiofile argument should be the full path to the input audio file.
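Once prediction has finished, the results can be gathered from the model-named subfolder. The sketch below is a minimal, hypothetical helper (collect_outputs and read_f0_contour are not part of this repository). It assumes each CSV row holds a time stamp and an F0 value in Hz; the actual column layout is not documented here, so inspect one file before relying on it. The NPY paths are only listed, since loading them requires NumPy (np.load).

```python
import csv
from pathlib import Path


def collect_outputs(output_dir, model="unet_harm"):
    """List the result files written to <output_dir>/<model>/ (hypothetical helper)."""
    model_dir = Path(output_dir) / model
    return {
        "f0_csv": sorted(model_dir.glob("*.csv")),        # one per F0 trajectory
        "salience_npy": sorted(model_dir.glob("*.npy")),  # salience outputs
    }


def read_f0_contour(csv_path):
    """Read one F0 trajectory as two lists: times and frequencies.

    ASSUMPTION: each row is `time, frequency_hz`. Non-numeric rows
    (e.g. a header) and rows with fewer than two columns are skipped.
    """
    times, freqs = [], []
    with open(csv_path, newline="") as f:
        for row in csv.reader(f):
            if len(row) < 2:
                continue
            try:
                t, hz = float(row[0]), float(row[1])
            except ValueError:
                continue  # header or malformed row
            times.append(t)
            freqs.append(hz)
    return times, freqs
```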

The code also supports prediction on multiple files: place them all in the same folder and pass that folder with the audiofolder parameter instead of audiofile.
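When scripting many runs, it can help to assemble the command programmatically. The sketch below builds the argument list for either a single file or a folder and checks the model name against the four variants listed above. build_predict_command is a hypothetical helper, and the folder flag is assumed to be spelled --audiofolder, mirroring --audiofile.

```python
import shlex
from typing import List, Optional

# Model variants listed in this README.
MODEL_VARIANTS = ("unet_harm", "unet_stand", "unet_harm_noskip", "unet_harm_hcqt")


def build_predict_command(output_dir: str, model: str = "unet_harm",
                          audiofile: Optional[str] = None,
                          audiofolder: Optional[str] = None) -> List[str]:
    """Return the predict_on_audio.py argument list (hypothetical helper)."""
    if model not in MODEL_VARIANTS:
        raise ValueError(f"unknown model {model!r}; expected one of {MODEL_VARIANTS}")
    # Exactly one input source: a single file or a folder of files.
    if (audiofile is None) == (audiofolder is None):
        raise ValueError("pass exactly one of audiofile or audiofolder")
    cmd = ["python", "predict_on_audio.py", "--model", model, "--output", output_dir]
    if audiofile is not None:
        cmd += ["--audiofile", audiofile]
    else:
        # ASSUMPTION: flag name mirrors --audiofile.
        cmd += ["--audiofolder", audiofolder]
    return cmd


if __name__ == "__main__":
    print(shlex.join(build_predict_command("output_dir", audiofolder="my_quartets/")))
```

Run as a script, this prints `python predict_on_audio.py --model unet_harm --output output_dir --audiofolder my_quartets/`. The same list can be passed to subprocess.run(cmd, check=True) from the repository root.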

Current status of the repo

Feb 15th 2022: the models are currently not available in this repository. Until this is resolved, please download the desired model(s) from this link and place them in a folder named models/ in the repository root before running the predict_on_audio.py script.
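A quick pre-flight check can catch a missing models/ folder before prediction starts. This is a minimal sketch (models_ready is a hypothetical helper): it only verifies that models/ exists in the repository root and contains at least one file, since the expected model file names are not listed here.

```python
from pathlib import Path


def models_ready(repo_root: str = ".") -> bool:
    """True if <repo_root>/models/ exists and holds at least one file.

    Does not check specific model file names (not documented in this README).
    """
    models_dir = Path(repo_root) / "models"
    return models_dir.is_dir() and any(p.is_file() for p in models_dir.iterdir())


if __name__ == "__main__":
    if not models_ready():
        raise SystemExit("models/ is missing or empty: download the model(s) first")
```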
