The ESC-50 dataset for environmental sound classification is a labeled collection of 2000 environmental audio recordings, suitable for benchmarking environmental sound classification methods.
It contains 50 semantic classes with 40 examples each, loosely grouped into 5 major categories:
- Animals
- Natural soundscapes & water sounds
- Human, non-speech sounds
- Interior/domestic sounds
- Exterior/urban noises
This dataset can be downloaded as a .zip file: ESC-50 dataset
To perform audio classification, we first preprocess the data by extracting Mel Frequency Cepstral Coefficients (MFCCs) from the audio signal, and then pass those features through a deep neural network. MFCCs are short-term spectral features that concisely describe the overall shape of a signal's spectral envelope.

[Figure: a few MFCCs extracted from the ESC-50 dataset]

Convolutional neural networks (CNNs) are a type of deep learning model that performs very well on image data. To use them for audio classification, we extract features that resemble images (a 2-D grid of coefficients over time) and reshape them so they can be fed into a CNN. We use the librosa package for this feature extraction.
Recurrent neural networks (RNNs) are a type of deep learning model that processes sequences while carrying state from one step to the next. Audio is inherently sequential, so an RNN can exploit its temporal patterns for classification. In contrast to the CNN model, here we use a stateful LSTM, which allows us to simplify the overall network structure: all we need is an LSTM layer followed by a Dense layer.
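A network of this shape might look like the sketch below. It assumes Keras (via TensorFlow), which the text does not name explicitly, and every size in it (batch size, number of frames, number of coefficients, 64 LSTM units) is a placeholder rather than the project's real setting. The input is assumed to be MFCCs transposed to `(frames, coefficients)` so that time is the sequence axis.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

NUM_CLASSES = 50   # ESC-50 has 50 classes
BATCH_SIZE = 8     # assumed; a stateful LSTM needs a fixed batch size
TIMESTEPS = 216    # assumed number of MFCC frames per clip
N_MFCC = 40        # assumed number of coefficients per frame

# One LSTM layer followed by one Dense layer, as described above.
model = keras.Sequential([
    keras.Input(shape=(TIMESTEPS, N_MFCC), batch_size=BATCH_SIZE),
    layers.LSTM(64, stateful=True),  # 64 units is an arbitrary choice
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# Dummy batch standing in for real features, just to check the shapes.
dummy = np.zeros((BATCH_SIZE, TIMESTEPS, N_MFCC), dtype="float32")
probs = model.predict(dummy, batch_size=BATCH_SIZE, verbose=0)
print(probs.shape)  # (8, 50): one probability vector per clip
```

Because a stateful LSTM carries its hidden state from one batch to the next, every batch must have exactly `BATCH_SIZE` samples, and the state would normally need to be reset between unrelated sequences.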
Made with ☕ and ❤️