GTZAN dataset for MCLNN

The GTZAN music genre dataset.

Clip Duration   Format   Count   Categories
30 secs         .au      1000    10

Dataset Summary:

  • Clips are 30 seconds long, sampled at 22050 Hz.
  • The dataset has no predefined split for cross-validation.

This folder contains:

  • Scripts required to prepare the GTZAN dataset for the MCLNN processing.
  • Pretrained weights and indices for the 10-fold cross-validation in addition to the standardization parameters to replicate the results in:

Fady Medhat, David Chesmore and John Robinson, "Masked Conditional Neural Networks for Audio Classification," International Conference on Artificial Neural Networks and Machine Learning (ICANN)

Preprocessing

Preparing the GTZAN dataset involves converting the .au clips to .wav at 22050 Hz.
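
As a rough illustration of this step, the sketch below builds an ffmpeg command that converts one .au clip to .wav at 22050 Hz and applies it to a folder. The helper names (build_convert_command, convert_folder), the example paths and the assumed one-subfolder-per-genre layout are hypothetical and are not taken from the repository's own scripts; only the ffmpeg options -i, -ar and -y are standard.

```python
import subprocess
from pathlib import Path

SAMPLE_RATE = 22050  # target sampling rate stated above


def build_convert_command(src, dst, sample_rate=SAMPLE_RATE):
    """Return the ffmpeg argument list that converts src to dst at sample_rate."""
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it already exists
        "-i", str(src),           # input clip, e.g. a GTZAN .au file
        "-ar", str(sample_rate),  # output audio sampling rate in Hz
        str(dst),                 # container is inferred from the .wav extension
    ]


def convert_folder(src_dir, dst_dir):
    """Convert every .au file under src_dir, mirroring its subfolders (assumed layout)."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    for src in sorted(src_dir.rglob("*.au")):
        dst = (dst_dir / src.relative_to(src_dir)).with_suffix(".wav")
        dst.parent.mkdir(parents=True, exist_ok=True)
        subprocess.run(build_convert_command(src, dst), check=True)
```

Calling convert_folder("genres", "genres_wav") would then write one .wav per .au clip, provided ffmpeg is on the PATH. The scripts shipped in this folder remain the reference for the actual conversion.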

Preparation scripts prerequisites

The preparation scripts require the following packages to be installed beforehand:

  • ffmpeg version N-81489-ga37e6dd
  • numpy 1.11.2+mkl
  • librosa 0.4.0
  • h5py 2.6.0
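
One possible way to check the Python packages against the versions listed above is sketched here. It assumes an interpreter that provides importlib.metadata (Python 3.8 or later), which may not match the environment these versions were originally installed in; the function names are hypothetical, and ffmpeg is a separate executable that this check does not cover.

```python
from importlib import metadata

# Versions listed above; the "+mkl" build suffix of numpy is ignored in the comparison.
EXPECTED = {"numpy": "1.11.2", "librosa": "0.4.0", "h5py": "2.6.0"}


def find_mismatches(expected, installed):
    """Return {package: (expected, found)} for packages that differ or are missing."""
    mismatches = {}
    for name, wanted in expected.items():
        found = installed.get(name)
        # Builds such as "1.11.2+mkl" carry a local suffix; compare the base version only.
        if found is None or found.split("+")[0] != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


def installed_versions(names):
    """Look up installed versions, using None for packages that are not installed."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions
```

Running find_mismatches(EXPECTED, installed_versions(EXPECTED)) returns an empty dictionary when all three packages match.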

Steps

  1. Download the dataset and run the preparation scripts in the order of their labels.
  2. Make sure the files are ordered following the GTZAN_storage_ordering file.
  3. Configure the spectrogram transformation within the Dataset Transformer and generate the MCLNN-Ready hdf5 for the dataset.
  4. Generate the indices for the folds using the Index Generator script.
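
To make the last step more concrete, the sketch below shows one way fold indices for 10-fold cross-validation could be produced. It is not the Index Generator script from this folder: the stratified round-robin assignment, the fixed seed, the function name stratified_folds and the assumption that the 1000 clips are spread evenly over the 10 categories and stored category by category are all illustrative choices.

```python
import random


def stratified_folds(labels, n_folds=10, seed=0):
    """Split clip indices into n_folds lists with each class spread evenly across folds."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)

    folds = [[] for _ in range(n_folds)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal the shuffled clips of this class round-robin over the folds.
        for position, index in enumerate(indices):
            folds[position % n_folds].append(index)
    return [sorted(fold) for fold in folds]


# Assumed layout: 10 categories with 100 clips each, stored category by category.
labels = [category for category in range(10) for _ in range(100)]
folds = stratified_folds(labels)
```

Each fold can then serve once as the held-out set while the remaining folds are used for training. To replicate the published results, use the indices provided in this folder rather than newly generated ones.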