Environmental Sound Classification (ESC)

Project Overview

This project evaluates the potential of convolutional neural networks for classifying short audio clips of environmental sounds. A first model trained on the original dataset did not reach good accuracy because the dataset is too small, so we applied data augmentation: we added white noise to a copy of the existing dataset, which gives 4000 examples. 10% of the data is used to test the model, and 10% of the remaining data is used to evaluate the model during training (validation). After augmentation we trained a model with 4 Conv2D layers, 5 ReLU activations, 4 MaxPooling2D layers, 1 softmax activation, 1 Dropout layer and 2 Dense layers. After 30 epochs the model reached 95% training accuracy and 91% validation accuracy.
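The split sizes and the layer stack described above can be sketched as follows. This is a minimal illustration, not the repository's actual code: the spectrogram size, filter counts, kernel size, Dense width, dropout rate, optimizer, loss and the Flatten layer are all assumptions, and the split reads "10% of remaining data" as a validation set taken after the test set is held out.

```python
import numpy as np

NUM_CLASSES = 50           # ESC-50 has 50 classes
SPEC_H, SPEC_W = 128, 128  # assumed spectrogram size; the README does not state it


def split_indices(n_examples, test_frac=0.1, val_frac=0.1, seed=0):
    """Shuffle indices, hold out test_frac for testing, then val_frac of the rest."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_examples)
    n_test = int(round(n_examples * test_frac))
    n_val = int(round((n_examples - n_test) * val_frac))
    test = idx[:n_test]
    val = idx[n_test:n_test + n_val]
    train = idx[n_test + n_val:]
    return train, val, test


def build_model(input_shape=(SPEC_H, SPEC_W, 1), num_classes=NUM_CLASSES):
    """4 Conv2D + 4 MaxPooling2D, then Dense/Dropout/Dense (5 ReLU, 1 softmax)."""
    # Imported here so the split helper above works without TensorFlow installed.
    from tensorflow import keras
    from tensorflow.keras import layers

    model = keras.Sequential([
        keras.Input(shape=input_shape),
        layers.Conv2D(32, 3, padding="same", activation="relu"),   # assumed sizes
        layers.MaxPooling2D(2),
        layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(2),
        layers.Conv2D(128, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(2),
        layers.Conv2D(128, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(2),
        layers.Flatten(),                      # assumed; needed before Dense layers
        layers.Dense(256, activation="relu"),  # fifth ReLU activation
        layers.Dropout(0.5),                   # assumed rate
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model


train_idx, val_idx, test_idx = split_indices(4000)
print(len(train_idx), len(val_idx), len(test_idx))  # 3240 360 400
```

Under this reading of the split, the 4000 examples divide into 3240 for training, 360 for validation and 400 for testing.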

Dataset

The ESC-50 dataset is a labeled collection of 2000 environmental audio recordings.

The dataset consists of 5-second-long recordings organized into 50 semantic classes (with 40 examples per class) loosely arranged into 5 major categories:

Animals	Natural soundscapes & water sounds	Human, non-speech sounds	Interior/domestic sounds	Exterior/urban noises
Dog	Rain	Crying baby	Door knock	Helicopter
Rooster	Sea waves	Sneezing	Mouse click	Chainsaw
Pig	Crackling fire	Clapping	Keyboard typing	Siren
Cow	Crickets	Breathing	Door, wood creaks	Car horn
Frog	Chirping birds	Coughing	Can opening	Engine
Cat	Water drops	Footsteps	Washing machine	Train
Hen	Wind	Laughing	Vacuum cleaner	Church bells
Insects (flying)	Pouring water	Brushing teeth	Clock alarm	Airplane
Sheep	Toilet flush	Snoring	Clock tick	Fireworks
Crow	Thunderstorm	Drinking, sipping	Glass breaking	Hand saw

Clips in this dataset have been manually extracted from public field recordings gathered by the Freesound.org project. The dataset has been prearranged into 5 folds for comparable cross-validation, making sure that fragments from the same original source file are contained in a single fold.
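To illustrate how prearranged folds are typically used, here is a small sketch of leave-one-fold-out splitting. It is not taken from this repository, which uses a random percentage split. The `records` list and its file names are hypothetical stand-ins for the dataset's metadata, whose exact format is not described here.

```python
from collections import defaultdict


def fold_splits(records, n_folds=5):
    """Yield (held_out_fold, train_files, test_files) once per fold.

    records is an iterable of (filename, fold) pairs with folds numbered 1..n_folds.
    """
    by_fold = defaultdict(list)
    for filename, fold in records:
        by_fold[fold].append(filename)
    for held_out in range(1, n_folds + 1):
        test_files = list(by_fold[held_out])
        train_files = [name
                       for fold in range(1, n_folds + 1) if fold != held_out
                       for name in by_fold[fold]]
        yield held_out, train_files, test_files


# Hypothetical records: 20 clips spread evenly over 5 folds.
records = [(f"clip_{i}.wav", i % 5 + 1) for i in range(20)]
for held_out, train_files, test_files in fold_splits(records):
    print(held_out, len(train_files), len(test_files))
```

Because all fragments of one source recording share a fold, holding out a whole fold keeps near-duplicate audio from appearing on both sides of the split.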

A more thorough description of the dataset is available in the original paper with some supplementary materials on GitHub: ESC: Dataset for Environmental Sound Classification - paper replication data.

Downloads

The dataset can be downloaded as a single .zip file (~600 MB):

Download ESC-50 dataset

Setup

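Install these libraries to run the code. The names below are the Python import names; on PyPI, cv2 is distributed as opencv-python and sklearn as scikit-learn.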

cv2

numpy

pandas

librosa

matplotlib

tqdm

scipy

IPython

sklearn

tensorflow

keras
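A quick way to see which of these libraries are still missing is to check whether each one can be imported. The sketch below is a hypothetical helper, not part of the repository; it only looks the modules up and does not install anything.

```python
import importlib.util

# Import names as listed above. Some pip package names differ:
# cv2 -> opencv-python, sklearn -> scikit-learn.
REQUIRED = ["cv2", "numpy", "pandas", "librosa", "matplotlib", "tqdm",
            "scipy", "IPython", "sklearn", "tensorflow", "keras"]


def missing_modules(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Missing:", missing_modules(REQUIRED) or "none")
```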

Variable Descriptions

audio_files - stores every audio file together with its class.
sampling_rate - the number of audio samples per second in the audio files.
spectrogram - holds the spectrogram of an audio file.
augmented_audio_files - stores white-noise copies of the original audio files, which are then combined with audio_files.
SPEC_H - spectrogram height.
SPEC_W - spectrogram width.
audio - a single audio file.
x - a temporary variable that holds an audio file taken from x_train; after it is converted into a spectrogram, the result is assigned back to x_train.
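The white-noise augmentation behind augmented_audio_files could look roughly like the sketch below. It reuses the variable names from the list above, but the noise level, the sampling rate, the synthetic sine-wave clip and the class label are assumptions made for illustration, not values taken from the repository.

```python
import numpy as np


def add_white_noise(audio, noise_factor=0.005, seed=None):
    """Return a copy of audio with Gaussian white noise added (assumed noise level)."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(audio.shape)
    return (audio + noise_factor * noise).astype(audio.dtype)


sampling_rate = 44100  # assumed; use the rate of the loaded files
# Stand-in for one loaded 5-second clip: a 440 Hz sine wave.
t = np.arange(5 * sampling_rate) / sampling_rate
audio = (0.5 * np.sin(2 * np.pi * 440 * t)).astype(np.float32)

audio_files = [(audio, "hypothetical_class")]
augmented_audio_files = [(add_white_noise(clip, seed=0), label)
                         for clip, label in audio_files]
combined = audio_files + augmented_audio_files  # twice as many examples
print(len(audio_files), len(combined))
```

Each noisy copy keeps the label of its original clip, so the class balance is unchanged while the number of examples doubles.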

bheemnitd/Environmental-Sound-Classification