This repository contains a 10-fold cross-validation split of the Extented Ballroom Dataset. This split was created using a Python script and supplied metadata from the dataset website. It complies to the following rules:
- Applies an artist filter: making sure that, for a given fold, no songs from the same artist are on the training and the test sets at the sime time.
- Applies a production filter, making sure that, for a given fold, no songs from the same disc are on the training and the test sets at the same time.
This is not a stratified split. The number of samples for some classes is simply too small, thus a stratified split would result in too few folds. Plus, we wanted to play around with non-stratified datasets, thus the split.
These files contain the lists of the files that should be used to train the predictive models. The format is the following:
song_file\tgenre\n
These files contain the lists of files that should be used to test the predictive models. The format is the following:
song_file\n
These files contain the exact same list of files of the correspondent test file, except that they also contain the ground-truth:
song_file\tgenre\n
If you use this split, please cite this page!
Cheers!
Juliano
Lecturer @ Universidade Tecnologica Federal do Parana (Campo Mourao, PR, Brazil)
PhD Candidate @ UNICAMP (Campinas, SP, Brazil)
julianofoleiss at utfpr dot edu dot br