The UrbanSound8k environmental sound dataset.
Clip Duration | Format | Count | Categories |
---|---|---|---|
max 4 secs | .wav | 8732 | 10 |
Dataset Summary:
- Clips are at most 4 seconds long, with varying sampling rates.
- The dataset is released in predefined 10-fold splits for cross-validation.
This folder contains:
- Scripts required to prepare the UrbanSound8k dataset for the MCLNN processing.
- Pretrained weights and indices for the 10-fold cross-validation, in addition to the standardization parameters, to replicate the results in:
Fady Medhat, David Chesmore and John Robinson, "Recognition of Acoustic Events Using Masked Conditional Neural Networks," 2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA)
The following are the steps involved in preparing the UrbanSound8k dataset:
- Unify the sampling rate across all files.
- Clone and concatenate each sample until its length is at least 4 seconds.
- Redistribute the files of the 10 fold folders into one folder per category.
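The length-normalization step above (clone and concatenate until each clip reaches 4 seconds) can be sketched as follows. This is an illustrative sketch, not the repo's actual script; the function name `tile_to_min_length` and the assumed 22050 Hz target rate are hypothetical.

```python
import numpy as np

def tile_to_min_length(samples, min_len):
    """Repeat (clone and concatenate) a clip until it spans at least
    min_len samples, then trim to exactly min_len."""
    if len(samples) >= min_len:
        return samples
    reps = int(np.ceil(min_len / len(samples)))
    # The actual preparation script may keep whole repetitions instead
    # of trimming; trimming here keeps all clips the same length.
    return np.tile(samples, reps)[:min_len]

# Example: a 1-second clip at an assumed 22050 Hz sampling rate,
# extended to the 4-second minimum.
sr = 22050
clip = np.random.randn(sr)
out = tile_to_min_length(clip, 4 * sr)
```

In practice the clip would be loaded with a resampling loader (e.g. librosa) after the ffmpeg-based rate unification, so `samples` is already at the unified rate.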
The preparation scripts require the following packages to be installed beforehand:
- ffmpeg version N-81489-ga37e6dd
- numpy 1.11.2+mkl
- librosa 0.4.0
- h5py 2.6.0
- Download the dataset and execute the preparation scripts in the order of their numeric labels.
- Make sure the files are ordered following the UrbanSound8K_storage_ordering file.
- Configure the spectrogram transformation within the Dataset Transformer and generate the MCLNN-Ready hdf5 for the dataset using the Urbansound8k_MCLNN.csv file.
- Generate the indices for the folds using the Index Generator script.
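The per-fold index generation can be sketched as below. This is a hypothetical illustration of one cross-validation round, assuming each file (in storage order) carries a fold label from the predefined 10-fold split; it is not the repo's Index Generator script.

```python
import numpy as np

def fold_indices(fold_labels, test_fold):
    """Split file indices into train/test sets for one round of
    cross-validation, holding out the files of test_fold.

    fold_labels: sequence mapping each file, in storage order,
                 to its predefined fold number (1-10).
    """
    fold_labels = np.asarray(fold_labels)
    test_idx = np.where(fold_labels == test_fold)[0]
    train_idx = np.where(fold_labels != test_fold)[0]
    return train_idx, test_idx

# Toy example with a 3-fold labeling over 6 files:
labels = [1, 2, 3, 1, 2, 3]
train, test = fold_indices(labels, test_fold=3)
# train covers folds 1 and 2; test covers fold 3
```

Iterating `test_fold` over 1..10 yields the ten train/test partitions; the standardization parameters shipped with the pretrained weights are computed per training partition.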