Environmental Sound Classification (ESC)

[Figure: spectrogram and waveform]

Project Overview

This project evaluates how well convolutional neural networks (CNNs) classify short audio clips of environmental sounds. An initial training run showed that the existing dataset was too small to reach good accuracy, so we applied data augmentation: we added white noise to a copy of every clip, bringing the dataset to 4000 examples. 10% of the data is held out for testing, and 10% of the remaining data is used for validation. After augmentation, we trained a model with 4 Conv2D layers, 5 ReLU activations, 4 MaxPooling2D layers, 1 Dropout layer, 2 Dense layers, and a final softmax activation. After 30 epochs, the model reached 95% training accuracy and 91% validation accuracy.
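The augmentation and split described above can be sketched with NumPy as follows. This is a minimal sketch under stated assumptions, not the project's code: the noise is Gaussian white noise scaled by a hypothetical `noise_factor` of 0.005, the dummy clips stand in for real audio, and `add_white_noise`, `augment`, and `split_indices` are illustrative names.

```python
import numpy as np


def add_white_noise(audio, noise_factor=0.005, rng=None):
    """Return a copy of `audio` with additive Gaussian white noise.

    noise_factor is a hypothetical scale; the project's value is not stated.
    """
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.standard_normal(audio.shape).astype(audio.dtype)
    return audio + noise_factor * noise


def augment(audio_files, labels, noise_factor=0.005, seed=0):
    """Append one noisy copy of every clip, doubling the dataset (2000 -> 4000)."""
    rng = np.random.default_rng(seed)
    noisy = [add_white_noise(a, noise_factor, rng) for a in audio_files]
    return list(audio_files) + noisy, list(labels) + list(labels)


def split_indices(n, test_frac=0.1, val_frac=0.1, seed=0):
    """Shuffle n indices, hold out test_frac for testing, then val_frac of the rest for validation."""
    idx = np.random.default_rng(seed).permutation(n)
    n_test = round(n * test_frac)
    test, rest = idx[:n_test], idx[n_test:]
    n_val = round(len(rest) * val_frac)
    return rest[n_val:], rest[:n_val], test  # train, validation, test


# Demo with 2000 short dummy clips (real ESC-50 clips are 5 s long).
clips = [np.zeros(1000, dtype=np.float32) for _ in range(2000)]
labels = [i % 50 for i in range(2000)]
all_clips, all_labels = augment(clips, labels)
train_idx, val_idx, test_idx = split_indices(len(all_clips))
```

With 4000 clips this gives 400 test, 360 validation, and 3240 training examples. Because the sketch follows the order described (augment first, then split), a clip and its noisy copy can land on opposite sides of the split, which can make test and validation accuracy optimistic; splitting first and augmenting only the training portion avoids that.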

Dataset

The ESC-50 dataset is a labeled collection of 2000 environmental audio recordings.

The dataset consists of 5-second-long recordings organized into 50 semantic classes (with 40 examples per class), loosely arranged into 5 major categories:

| Animals | Natural soundscapes & water sounds | Human, non-speech sounds | Interior/domestic sounds | Exterior/urban noises |
|---|---|---|---|---|
| Dog | Rain | Crying baby | Door knock | Helicopter |
| Rooster | Sea waves | Sneezing | Mouse click | Chainsaw |
| Pig | Crackling fire | Clapping | Keyboard typing | Siren |
| Cow | Crickets | Breathing | Door, wood creaks | Car horn |
| Frog | Chirping birds | Coughing | Can opening | Engine |
| Cat | Water drops | Footsteps | Washing machine | Train |
| Hen | Wind | Laughing | Vacuum cleaner | Church bells |
| Insects (flying) | Pouring water | Brushing teeth | Clock alarm | Airplane |
| Sheep | Toilet flush | Snoring | Clock tick | Fireworks |
| Crow | Thunderstorm | Drinking, sipping | Glass breaking | Hand saw |

Clips in this dataset have been manually extracted from public field recordings gathered by the Freesound.org project. The dataset has been prearranged into 5 folds for comparable cross-validation, making sure that fragments from the same original source file are contained in a single fold.
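Using the predefined folds instead of a random split can be sketched with the standard-library `csv` module. The sketch assumes the metadata CSV shipped with ESC-50 (`meta/esc50.csv`) has `filename`, `fold`, and `target` columns; the rows below are made up for illustration, and `fold_split` and `cross_validation_splits` are hypothetical helper names.

```python
import csv
import io


def fold_split(rows, test_fold):
    """Return (train, test) lists of (filename, target) pairs for one held-out fold."""
    train, test = [], []
    for row in rows:
        pair = (row["filename"], int(row["target"]))
        # Whole folds move together, so fragments of one source file never straddle the split.
        (test if int(row["fold"]) == test_fold else train).append(pair)
    return train, test


def cross_validation_splits(meta_file, n_folds=5):
    """Yield (fold, train, test) for each predefined fold, reading the CSV once."""
    rows = list(csv.DictReader(meta_file))
    for fold in range(1, n_folds + 1):
        train, test = fold_split(rows, fold)
        yield fold, train, test


# Made-up stand-in for meta/esc50.csv; the real file has 2000 rows.
sample = io.StringIO(
    "filename,fold,target\n"
    "1-0001-A-0.wav,1,0\n"
    "2-0002-A-10.wav,2,10\n"
    "5-0003-A-0.wav,5,0\n"
)
splits = list(cross_validation_splits(sample))
```

Assuming the archive is extracted so that `meta/esc50.csv` is reachable, passing `open("meta/esc50.csv")` instead of `sample` would yield five splits of 1600 training and 400 test clips each.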

A more thorough description of the dataset is available in the original paper with some supplementary materials on GitHub: ESC: Dataset for Environmental Sound Classification - paper replication data.

Downloads

The dataset can be downloaded as a single .zip file (~600 MB):

Download ESC-50 dataset
[Figure: Classes]

Setup

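The code requires the following libraries. re and os are part of the Python standard library; cv2 is installed as the opencv-python package and sklearn as scikit-learn.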

  • re
  • cv2
  • os
  • numpy
  • pandas
  • librosa
  • matplotlib
  • tqdm
  • scipy
  • IPython
  • sklearn
  • tensorflow
  • keras

Variable Descriptions

  • audio_files - stores every audio file together with its class label.
  • sampling_rate - the sampling rate of the audio files, i.e. the number of samples per second.
  • spectrogram - holds the spectrogram of an audio file.
  • augmented_audio_files - stores the white-noise copies of the original files; these are then combined with audio_files.
  • SPEC_H - spectrogram height.
  • SPEC_W - spectrogram width.
  • audio - a single audio file.
  • x - a temporary variable used while loading the audio files for x_train; once they are converted into spectrograms, the result is assigned back to the same x_train variable.
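How `audio`, `sampling_rate`, `spectrogram`, `SPEC_H`, and `SPEC_W` fit together can be sketched as below. This is a NumPy-only sketch under assumptions: the STFT parameters (`n_fft=512`, `hop_length=256`), the sizes `SPEC_H = 128` and `SPEC_W = 256`, and the 8 kHz test tone are hypothetical, and `compute_spectrogram` and `resize_nearest` are illustrative names. With the listed libraries, `librosa.stft` followed by `librosa.amplitude_to_db` would compute the spectrogram, and `cv2.resize(spec, (SPEC_W, SPEC_H))` could do the resizing (OpenCV takes the target size as width, height).

```python
import numpy as np

SPEC_H, SPEC_W = 128, 256  # hypothetical target size (height = frequency, width = time)


def compute_spectrogram(audio, n_fft=512, hop_length=256):
    """Log-magnitude STFT in dB with shape (n_fft // 2 + 1, n_frames)."""
    if len(audio) < n_fft:  # pad very short clips to at least one frame
        audio = np.pad(audio, (0, n_fft - len(audio)))
    n_frames = 1 + (len(audio) - n_fft) // hop_length
    window = np.hanning(n_fft)
    frames = np.stack([audio[i * hop_length:i * hop_length + n_fft] * window
                       for i in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1)).T  # rows = frequency bins
    return 20 * np.log10(magnitude + 1e-10)


def resize_nearest(spec, height=SPEC_H, width=SPEC_W):
    """Nearest-neighbour resize to (height, width) so every clip yields the same input shape."""
    rows = np.arange(height) * spec.shape[0] // height
    cols = np.arange(width) * spec.shape[1] // width
    return spec[np.ix_(rows, cols)]


# Demo: a 1-second 1 kHz tone at a hypothetical sampling_rate of 8 kHz.
sampling_rate = 8000
t = np.arange(sampling_rate) / sampling_rate
audio = np.sin(2 * np.pi * 1000 * t).astype(np.float32)
raw_spec = compute_spectrogram(audio)
spectrogram = resize_nearest(raw_spec)
```

Resizing to a fixed `SPEC_H` by `SPEC_W` is what lets clips of any length be stacked into one `x_train` array for a CNN with a fixed input shape.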

Testing on Unseen Data

[Figure: spectrogram, waveform, and audio playback of an unseen clip]
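For an unseen clip, the model's softmax output (for example from `model.predict` on a batch holding one spectrogram, shape `(1, 50)`) has to be mapped back to class names. A minimal sketch, assuming `class_names` lists the ESC-50 categories in label order; `top_k_labels`, the three-class list, and the probabilities are hypothetical.

```python
import numpy as np


def top_k_labels(probs, class_names, k=3):
    """Return the k most likely (class_name, probability) pairs, most likely first."""
    probs = np.asarray(probs).ravel()  # accepts model.predict's (1, n_classes) output
    order = np.argsort(probs)[::-1][:k]
    return [(class_names[i], float(probs[i])) for i in order]


# Hypothetical 3-class example; for ESC-50, class_names would hold all 50 labels.
class_names = ["dog", "rooster", "pig"]
probs = np.array([[0.1, 0.7, 0.2]])
prediction = top_k_labels(probs, class_names, k=2)
```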

[Figure: model summary]
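A Keras model matching the layer counts given in the project overview (4 Conv2D, 5 ReLU, 4 MaxPooling2D, 1 Dropout, 2 Dense, 1 softmax) might look like the sketch below. Only those counts come from this README; the filter counts, 3x3 kernels, the 128-unit hidden Dense layer, the 0.5 dropout rate, the Adam optimizer, integer labels with `sparse_categorical_crossentropy`, and the `SPEC_H`/`SPEC_W` values are assumptions.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

SPEC_H, SPEC_W = 128, 256  # hypothetical; set to the project's spectrogram size
NUM_CLASSES = 50           # ESC-50 classes


def build_model(input_shape=(SPEC_H, SPEC_W, 1), num_classes=NUM_CLASSES):
    model = keras.Sequential([keras.Input(shape=input_shape)])
    # Four Conv2D -> ReLU -> MaxPooling2D blocks (filter counts are assumptions).
    for filters in (16, 32, 64, 128):
        model.add(layers.Conv2D(filters, (3, 3), padding="same"))
        model.add(layers.Activation("relu"))
        model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Flatten())
    model.add(layers.Dense(128))            # first Dense layer
    model.add(layers.Activation("relu"))    # fifth ReLU
    model.add(layers.Dropout(0.5))          # the single Dropout layer (rate assumed)
    model.add(layers.Dense(num_classes))    # second Dense layer, one unit per class
    model.add(layers.Activation("softmax"))
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",  # assumes integer class labels
                  metrics=["accuracy"])
    return model


model = build_model()
model.summary()

# Sanity check: one dummy spectrogram in, 50 class probabilities out.
probs = model.predict(np.zeros((1, SPEC_H, SPEC_W, 1), dtype="float32"), verbose=0)
```

Training as described would then be along the lines of `model.fit(x_train, y_train, epochs=30, validation_data=(x_val, y_val))`, with `x_train` shaped `(N, SPEC_H, SPEC_W, 1)`; `y_train`, `x_val`, and `y_val` are hypothetical names.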
