GTZAN dataset for MCLNN

The GTZAN music genre dataset.

Clip Duration	Format	Count	Categories
30 secs	.au	1000	10

Dataset Summary:

Clips are 30 seconds long, sampled at 22050 Hz.
The dataset does not define a split for cross-validation.

This folder contains:

Scripts required to prepare the GTZAN dataset for the MCLNN processing.
Pretrained weights and indices for the 10-fold cross-validation, together with the standardization parameters, to replicate the results in:

Fady Medhat, David Chesmore and John Robinson, "Masked Conditional Neural Networks for Audio Classification," International Conference on Artificial Neural Networks and Machine Learning (ICANN)

Preprocessing

Preprocessing consists of converting the .au clips to .wav, sampled at 22050 Hz.
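
As an illustration of this conversion step, the following is a minimal sketch that wraps the ffmpeg command line from Python. It is not the repository's preparation script: the function names `build_ffmpeg_command` and `convert_folder` and the folder layout are assumptions made for the example. Only the standard ffmpeg options `-i` (input), `-ar` (audio sampling rate) and `-y` (overwrite output) are used.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(src, dst, sample_rate=22050):
    """Build the ffmpeg call that converts one clip to .wav at sample_rate Hz."""
    return [
        "ffmpeg",
        "-y",                      # overwrite the output file if it exists
        "-i", str(src),            # input clip (.au)
        "-ar", str(sample_rate),   # output sampling rate in Hz
        str(dst),                  # container is inferred from the .wav extension
    ]


def convert_folder(src_dir, dst_dir, sample_rate=22050):
    """Convert every .au file under src_dir, mirroring its layout in dst_dir.

    Hypothetical helper: the folder names are chosen by the caller.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    for src in sorted(src_dir.rglob("*.au")):
        dst = (dst_dir / src.relative_to(src_dir)).with_suffix(".wav")
        dst.parent.mkdir(parents=True, exist_ok=True)
        subprocess.run(build_ffmpeg_command(src, dst, sample_rate), check=True)
```

Calling `convert_folder("genres", "genres_wav")` would process the whole dataset, assuming ffmpeg is on the PATH and the clips were extracted to a folder named `genres`.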

Preparation scripts prerequisites

The preparation scripts require the following packages to be installed beforehand:

ffmpeg version N-81489-ga37e6dd
numpy 1.11.2+mkl
librosa 0.4.0
h5py 2.6.0
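
Before running the scripts, the environment can be compared against the list above. The snippet below is only a convenience sketch and is not part of the repository. It assumes Python 3.8 or later (for `importlib.metadata`), which is newer than the interpreters these package versions were released for. The names `REQUIRED` and `installed_versions` are made up for the example.

```python
import shutil
from importlib import metadata

# Versions as listed in this README.
REQUIRED = {"numpy": "1.11.2+mkl", "librosa": "0.4.0", "h5py": "2.6.0"}


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


for name, version in installed_versions(REQUIRED).items():
    print(f"{name}: required {REQUIRED[name]}, found {version or 'not installed'}")

# ffmpeg is a command-line tool, so look for it on the PATH instead.
print("ffmpeg:", shutil.which("ffmpeg") or "not found on PATH")
```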

Steps

Download the dataset and execute the preparation scripts in the order of their labels.
Make sure the files are ordered following the GTZAN_storage_ordering file.
Configure the spectrogram transformation within the Dataset Transformer and generate the MCLNN-Ready hdf5 for the dataset.
Generate the indices for the folds using the Index Generator script.
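
Since GTZAN has no predefined split, the fold indices have to be generated. The sketch below shows one way to build 10 class-balanced folds. It is not the Index Generator script, and the pretrained weights in this folder should be used with the indices shipped alongside them. The function name `stratified_folds`, the seed, and the assumption that clips are stored genre by genre (100 clips for each of the 10 genres) are illustrative.

```python
import random


def stratified_folds(labels, n_folds=10, seed=0):
    """Split sample indices into n_folds folds with balanced class counts."""
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)

    folds = [[] for _ in range(n_folds)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal the shuffled indices of this class round-robin across the folds.
        for position, index in enumerate(indices):
            folds[position % n_folds].append(index)
    return [sorted(fold) for fold in folds]


# Assumed layout: 10 genres x 100 clips, stored genre by genre.
labels = [genre for genre in range(10) for _ in range(100)]
folds = stratified_folds(labels)

# Use fold k for testing and the remaining nine folds for training.
k = 0
test_idx = folds[k]
train_idx = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
```

With this layout each fold holds 100 clips, 10 per genre, so every train/test rotation keeps the genres balanced.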
