This proposal evaluates the potential of convolutional neural networks for classifying short audio clips of environmental sounds. We trained the model and observed that the existing dataset is too small to reach good accuracy, so we applied data augmentation: we added white noise to a copy of the existing dataset, which gives 4000 training examples. 10% of the data is used to test the model, and 10% of the remaining data is held out for validation. After augmentation we trained a model with 4 Conv2D layers, 5 ReLU activations, 4 MaxPooling2D layers, 1 softmax activation, 1 Dropout layer and 2 Dense layers. We trained for 30 epochs and achieved 95% training accuracy and 91% validation accuracy.
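The augmentation and split described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the helper names `add_white_noise` and `split_sizes`, the `noise_factor` value, and the random stand-in clips are all assumptions made for the example.

```python
import numpy as np


def add_white_noise(audio, noise_factor=0.005, rng=None):
    """Return a copy of `audio` with Gaussian white noise added.

    `noise_factor` is an assumed value; the README does not state the
    noise level that was actually used.
    """
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.standard_normal(len(audio)).astype(np.float32)
    return (audio + noise_factor * noise).astype(np.float32)


def split_sizes(n_total, test_frac=0.10, val_frac=0.10):
    """Sizes of the train/validation/test parts: the test set is taken
    first, then the validation set is taken from what remains."""
    n_test = round(n_total * test_frac)
    n_val = round((n_total - n_test) * val_frac)
    return n_total - n_test - n_val, n_val, n_test


# Stand-in clips (random data, NOT real ESC-50 audio).
rng = np.random.default_rng(0)
originals = [rng.uniform(-1, 1, 16000).astype(np.float32) for _ in range(4)]

# One noisy copy per original doubles the dataset (2000 -> 4000 in the project).
augmented = [add_white_noise(a, rng=rng) for a in originals]
dataset = originals + augmented

n_train, n_val, n_test = split_sizes(4000)
print(n_train, n_val, n_test)
```

With 4000 examples this split leaves 400 clips for testing, 360 for validation and 3240 for training.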
The ESC-50 dataset is a labeled collection of 2000 environmental audio recordings.
The dataset consists of 5-second-long recordings organized into 50 semantic classes (with 40 examples per class) loosely arranged into 5 major categories:
Animals | Natural soundscapes & water sounds | Human, non-speech sounds | Interior/domestic sounds | Exterior/urban noises |
---|---|---|---|---|
Dog | Rain | Crying baby | Door knock | Helicopter |
Rooster | Sea waves | Sneezing | Mouse click | Chainsaw |
Pig | Crackling fire | Clapping | Keyboard typing | Siren |
Cow | Crickets | Breathing | Door, wood creaks | Car horn |
Frog | Chirping birds | Coughing | Can opening | Engine |
Cat | Water drops | Footsteps | Washing machine | Train |
Hen | Wind | Laughing | Vacuum cleaner | Church bells |
Insects (flying) | Pouring water | Brushing teeth | Clock alarm | Airplane |
Sheep | Toilet flush | Snoring | Clock tick | Fireworks |
Crow | Thunderstorm | Drinking, sipping | Glass breaking | Hand saw |
Clips in this dataset have been manually extracted from public field recordings gathered by the Freesound.org project. The dataset has been prearranged into 5 folds for comparable cross-validation, making sure that fragments from the same original source file are contained in a single fold.
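As a rough illustration of how the predefined folds could be used, the sketch below holds out one fold and checks that no source recording appears on both sides of the split. It assumes the column layout of the dataset's `meta/esc50.csv` file (`filename`, `fold`, `target`, `category`, `esc10`, `src_file`, `take`); the rows in `SAMPLE_META` are made-up stand-ins, and `split_by_fold` is a hypothetical helper, not part of this repository.

```python
import csv
import io

# Made-up rows in the assumed layout of meta/esc50.csv (not real entries).
SAMPLE_META = """filename,fold,target,category,esc10,src_file,take
1-1001-A-0.wav,1,0,dog,True,1001,A
1-1001-B-0.wav,1,0,dog,True,1001,B
2-2002-A-1.wav,2,1,rooster,True,2002,A
3-3003-A-10.wav,3,10,rain,True,3003,A
"""


def split_by_fold(rows, test_fold):
    """Hold out every clip of `test_fold`; train on all other folds."""
    train = [r for r in rows if int(r["fold"]) != test_fold]
    test = [r for r in rows if int(r["fold"]) == test_fold]
    return train, test


rows = list(csv.DictReader(io.StringIO(SAMPLE_META)))
train, test = split_by_fold(rows, test_fold=1)

# Fragments cut from the same source file share a fold, so the two sets
# should never have a source recording in common.
shared_sources = {r["src_file"] for r in train} & {r["src_file"] for r in test}
print(len(train), len(test), shared_sources)
```

To run this against the real metadata, replace `io.StringIO(SAMPLE_META)` with an open handle to the CSV file in the extracted archive.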
A more thorough description of the dataset is available in the original paper with some supplementary materials on GitHub: ESC: Dataset for Environmental Sound Classification - paper replication data.
The dataset can be downloaded as a single .zip file (~600 MB):
Download ESC-50 dataset
Classes
Install these libraries to run the code.
audio_files - stores all audio files and their corresponding classes.
sampling_rate - the sampling rate of the audio files, i.e. the number of samples per second.
spectrogram - holds the spectrogram of an audio file.
augmented_audio_files - stores white-noise copies of the original audio files, which are then combined with audio_files.
SPEC_H - spectrogram height.
SPEC_W - spectrogram width.
audio - a single audio file.
x - a temporary variable that holds each audio file taken from x_train; after it is converted into a spectrogram, the result is assigned back to x_train.
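To show how these variables relate to each other, here is a minimal NumPy-only sketch that turns one `audio` array into a `spectrogram` and reads off `SPEC_H` and `SPEC_W`. The project may well compute its spectrograms differently (for example with an audio library): the function `log_spectrogram`, its `n_fft` and `hop` parameters, and the synthetic sine-wave clip are all assumptions made for illustration.

```python
import numpy as np


def log_spectrogram(audio, n_fft=512, hop=256):
    """Log-magnitude short-time Fourier transform.

    Returns an array of shape (n_fft // 2 + 1, n_frames): frequency bins
    along the height, time frames along the width.
    """
    audio = np.asarray(audio, dtype=np.float32)
    if len(audio) < n_fft:
        # Pad very short clips so that at least one frame exists.
        audio = np.pad(audio, (0, n_fft - len(audio)))
    window = np.hanning(n_fft)
    n_frames = 1 + (len(audio) - n_fft) // hop
    frames = np.stack(
        [audio[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    magnitude = np.abs(np.fft.rfft(frames, axis=1))  # (n_frames, n_fft // 2 + 1)
    return np.log1p(magnitude).T


# Synthetic stand-in for one clip: a 1-second 440 Hz tone at 8 kHz.
sampling_rate = 8000
t = np.arange(sampling_rate) / sampling_rate
audio = np.sin(2 * np.pi * 440 * t).astype(np.float32)

spectrogram = log_spectrogram(audio)
SPEC_H, SPEC_W = spectrogram.shape

# Conv2D layers expect a trailing channel axis: (N, SPEC_H, SPEC_W, 1).
x_train = np.stack([spectrogram, spectrogram])[..., np.newaxis]
print(SPEC_H, SPEC_W, x_train.shape)
```

Because every ESC-50 clip has the same length, all spectrograms produced this way would share the same `SPEC_H` and `SPEC_W`, which is what allows them to be stacked into a single array.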