/lakh_vocal_segments_dataset

singing voice with annotations of vocal onsets, based on the matched MIDI from http://colinraffel.com/projects/lmd/


lakh vocal segments dataset

This is a dataset of multi-instrumental recordings of pop songs (in English) with annotated transcriptions of the singing voice, based on the MIDI matched from the lakh dataset. It was created to provide real-world material for singing voice transcription, with diverse genres and singers.

Possible tasks:

  • vocal onset detection
  • transcription of singing voice into notes
  • beat detection

Note that it is not yet suitable for offset detection, as we did not validate the lengths of the aligned MIDI notes.

Folder structure

  • list_MSD_ids: list of the songs in the dataset
  • scripts: Python scripts for loading data; more scripts are in the similar repository
  • data: audio files. excerpt.txt gives the beginning and ending timestamps of the 7digital excerpt within the complete recording, determined manually.
  • experiments: related to the paper: Georgi Dzhambazov, André Holzapfel, Ajay Srinivasamurthy, Xavier Serra, Metrical-Accent Aware Vocal Onset Detection in Polyphonic Audio, In Proceedings of ISMIR 2017
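
As a rough illustration of how excerpt.txt could be used, the sketch below reads the two timestamps and re-expresses onset times relative to the start of the excerpt. It assumes the file holds two numbers in seconds (begin, then end) separated by whitespace or a comma, and that the onset times refer to the complete recording. Neither assumption is confirmed by this README, and the function names are hypothetical.

```python
from pathlib import Path


def read_excerpt_bounds(path):
    """Return (begin, end) in seconds from an excerpt.txt-style file.

    Assumption: the file contains two numbers, begin then end,
    separated by whitespace or a comma. The real layout may differ.
    """
    text = Path(path).read_text().replace(",", " ")
    begin, end = (float(token) for token in text.split()[:2])
    if end <= begin:
        raise ValueError(f"end ({end}) must be after begin ({begin})")
    return begin, end


def to_excerpt_time(onsets, begin, end):
    """Keep onsets inside [begin, end) and make them relative to the excerpt start.

    Assumption: the input onsets are in seconds from the start of the
    complete recording.
    """
    return [round(t - begin, 6) for t in onsets if begin <= t < end]
```

With bounds of 12.5 and 42.5 seconds, an onset at 20.0 seconds in the complete recording would map to 7.5 seconds in the excerpt, and onsets outside the excerpt would be dropped.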

Criteria for inclusion in the dataset:

  • songs from the datasets used in MIREX Automatic_Lyrics-to-Audio_Alignment, also listed in full here. Note that this gives an initial list of songs for cross-reference, but it could be extended with any other songs.
  • has a linked MIDI in the lakh dataset
  • has a predominant singing voice present in the 30-second thumbnail
  • has a clear metrical pulsation and the meter is 4/4

Steps to derive annotations

  1. find the recording's MSD_TRACK_id from this list. Then match by this script
  • Then derive its beginning and ending timestamps and create data/MSD_TRACK_id/excerpt.txt manually.
  2. get the matched MIDI from lakh-matched MIDI fetch_midi (if there is more than one match, pick the MIDI with the best match)

  3. derive singing voice note annotations by this script. Since the MIDI standard does not define an instrument for singing voice, the singing voice track is given a different program # in a random channel in each MIDI. Thus one needs to manually identify the MIDI channel # that corresponds to the melody of the singing voice track. Optionally, annotating the segments with active vocals in advance is helpful.

  4. derive beat annotations by this script

  5. verify the annotations of note onsets and beats. Manually correct any imprecise vocal annotations. Open them as a note layer in Sonic Visualiser with the script 'sh open_in_sv.sh'

  • listen in slower motion

  • if there is a systematic delay/advance of the timestamps, measure the difference to the onsets with SV's measure tool and run shift time of annotation

  • put the audio for MSD_TRACK_id in data/MSD_TRACK_id:

cp /Volumes/datasets/MTG/audio/incoming/millionsong-audio/mp3/D/W/U/$track_ID data/
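
The channel identification and the time-shift correction described in the steps above can be sketched roughly as follows. This is not the repository's own script. It assumes the MIDI notes have already been read into plain tuples of (channel, program, pitch, onset in seconds), and all function names are hypothetical. The per-channel summary only supports the manual choice of the vocal channel; it does not pick one automatically.

```python
from collections import Counter

# Hypothetical note representation: (channel, program, pitch, onset_seconds).


def summarize_channels(notes):
    """Count notes per (channel, program) pair to help pick the vocal channel by hand."""
    return Counter((channel, program) for channel, program, _pitch, _onset in notes)


def vocal_onsets(notes, channel):
    """Return the sorted onset times of all notes in the manually chosen channel."""
    return sorted(onset for ch, _program, _pitch, onset in notes if ch == channel)


def shift_onsets(onsets, offset):
    """Add a constant offset in seconds (negative moves onsets earlier).

    Onsets that would fall before time 0 after the shift are dropped.
    """
    shifted = (t + offset for t in onsets)
    return [round(t, 6) for t in shifted if t >= 0.0]
```

For example, if the measure tool in Sonic Visualiser shows the annotated onsets arriving about 50 ms late throughout, calling shift_onsets(onsets, -0.05) would move them all earlier by that amount.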

Scratch notes and observations on songs so far...

observations for excluded tracks:

  • NELLY FURTADO MSD version has no voice
  • I kissed a girl is in 2/4, all the rest is 4/4
  • viva la vida not in MSD
  • CLOCKS has almost no voice in MSD
  • rehab has no vocal channel in MIDI

observations for included tracks:

  • smells like has a lot of same-pitch onsets
  • call me - the onsets are on offbeats mostly
  • bangles and sunrise have no percussive instruments

Citation

Georgi Dzhambazov, André Holzapfel, Ajay Srinivasamurthy, Xavier Serra, Metrical-Accent Aware Vocal Onset Detection in Polyphonic Audio, In Proceedings of ISMIR 2017

Contact

georgi (dot) dzhambazov (at) upf (dot) edu

License

The license for the annotations follows the license of the lakh dataset. The audio comes from the MSD and is shared here for the purpose of crowd-annotation. However, we will remove it once we release the dataset.