ISMIR2004 dataset for MCLNN

The ISMIR2004 music genre dataset. The clips are provided here. If you cannot access that source, you can use the dataset hosted here, but it requires custom handling to extract the clips that overlap with the original ISMIR2004 dataset.

Clip Duration     Format   Count   Categories
Full recordings   .mp3     1458    6

Dataset Summary:

  • A 729-training/729-testing split is defined for the dataset; we combined both splits for 10-fold cross-validation.
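As an illustration of what pooling the two splits into 10 folds can look like, here is a minimal sketch. It is not the repository's Index Generator: the function names `make_folds` and `train_test_split`, the fixed seed, and the plain (non-stratified) shuffle are all assumptions made for the example, and the actual indices shipped with this folder may also define subsets that are not modelled here.

```python
import random


def make_folds(n_clips=1458, n_folds=10, seed=0):
    """Shuffle clip indices and deal them into n_folds near-equal folds.

    Hypothetical helper: the seed and the absence of per-genre
    stratification are assumptions, not the repository's settings.
    """
    indices = list(range(n_clips))
    random.Random(seed).shuffle(indices)
    # Fold k takes every n_folds-th shuffled index starting at position k.
    return [indices[k::n_folds] for k in range(n_folds)]


def train_test_split(folds, test_fold):
    """Use one fold for testing and the remaining folds for training."""
    test = folds[test_fold]
    train = [i for k, fold in enumerate(folds) if k != test_fold for i in fold]
    return train, test


folds = make_folds()
train, test = train_test_split(folds, test_fold=0)
```

With 1458 clips, eight folds hold 146 clips and two hold 145, so each clip is used for testing exactly once across the ten runs.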

This folder contains:

  • Scripts required to prepare the ISMIR2004 dataset for the MCLNN processing.
  • Pretrained weights and indices for the 10-fold cross-validation in addition to the standardization parameters to replicate the results in:

Fady Medhat, David Chesmore and John Robinson, "Masked Conditional Neural Networks for Audio Classification," International Conference on Artificial Neural Networks and Machine Learning (ICANN)

Preprocessing

Preparing the ISMIR2004 dataset involves converting the .mp3 clips to .wav resampled at 22050 Hz.
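The conversion can be sketched as a thin Python wrapper around ffmpeg. This is only an illustration and not the repository's preparation scripts: the helper names `build_ffmpeg_command` and `convert_folder` are hypothetical, and the sketch assumes ffmpeg is on the PATH and that all clips sit directly inside one folder. The `-i`, `-ar` and `-y` options are standard ffmpeg flags for the input file, the output sampling rate, and overwriting existing output.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(src, dst, sample_rate=22050):
    """Return the ffmpeg argument list that resamples src into dst."""
    return [
        "ffmpeg",
        "-y",                     # overwrite dst if it already exists
        "-i", str(src),           # input clip (.mp3)
        "-ar", str(sample_rate),  # output sampling rate in Hz
        str(dst),                 # output format is inferred from the .wav suffix
    ]


def convert_folder(src_dir, dst_dir, sample_rate=22050):
    """Convert every .mp3 directly inside src_dir to a .wav in dst_dir."""
    dst_dir = Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    # Sorting keeps the conversion order reproducible between runs.
    for src in sorted(Path(src_dir).glob("*.mp3")):
        dst = dst_dir / (src.stem + ".wav")
        subprocess.run(build_ffmpeg_command(src, dst, sample_rate), check=True)
```

Calling `convert_folder("mp3_clips", "wav_clips")` (both folder names are placeholders) would write one .wav per clip; `check=True` makes the run stop at the first clip ffmpeg fails to decode.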

Preparation scripts prerequisites

The preparation scripts require the following packages to be installed beforehand:

  • ffmpeg version N-81489-ga37e6dd
  • numpy 1.11.2+mkl
  • librosa 0.4.0
  • h5py 2.6.0

Steps

  1. Download the dataset and execute the preparation scripts in the order of their labels.
  2. Make sure the files are ordered following the ISMIR2004_storage_ordering file.
  3. Configure the spectrogram transformation within the Dataset Transformer and generate the MCLNN-Ready hdf5 for the dataset.
  4. Generate the indices for the folds using the Index Generator script.
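Step 2 matters because the fold indices refer to clips by position, so a different file order would silently pair indices with the wrong clips. The check below is a rough sketch of how the order could be verified. It assumes the ISMIR2004_storage_ordering file is plain text with one file name per line, which should be confirmed against the actual file; the helper names `read_expected_order` and `first_mismatch` are hypothetical.

```python
from pathlib import Path


def read_expected_order(ordering_file):
    """Read the expected file names, one per non-empty line (assumed layout)."""
    lines = Path(ordering_file).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def first_mismatch(expected, actual):
    """Return (position, expected_name, actual_name) for the first difference.

    Returns None when both lists match in content and order.
    """
    for pos, (exp, act) in enumerate(zip(expected, actual)):
        if exp != act:
            return pos, exp, act
    if len(expected) != len(actual):
        # One list is a prefix of the other: report where the shorter one ends.
        pos = min(len(expected), len(actual))
        exp = expected[pos] if pos < len(expected) else None
        act = actual[pos] if pos < len(actual) else None
        return pos, exp, act
    return None
```

Here `actual` should be the list of file names in the order the Dataset Transformer will read them; a result of `None` means the order matches the ordering file.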