All data were obtained from the UrbanSound8K dataset.
The audio files were first converted into spectrograms, which were then rendered as images.
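The notes do not say how the spectrograms were computed or rendered. The NumPy sketch below assumes a linear-frequency short-time Fourier transform with a Hann window, using n_fft=1024 and hop_length=512. It then clips the result to an 80 dB dynamic range and rescales it to an 8-bit image array. All of these settings are hypothetical, and the original pipeline may have used a mel spectrogram or a plotting library instead.

```python
import numpy as np


def stft_magnitude_db(signal, n_fft=1024, hop_length=512):
    """Log-magnitude spectrogram in dB from a Hann-windowed STFT.

    Assumption: linear-frequency STFT with hypothetical n_fft/hop_length values.
    """
    if len(signal) < n_fft:
        # Zero-pad short clips so that at least one full frame fits.
        signal = np.pad(signal, (0, n_fft - len(signal)))
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop_length
    # Overlapping, windowed frames stacked as rows: shape (n_frames, n_fft).
    frames = np.stack([
        signal[i * hop_length : i * hop_length + n_fft] * window
        for i in range(n_frames)
    ])
    # rfft keeps the n_fft // 2 + 1 non-negative frequency bins; transpose to (freq, time).
    magnitude = np.abs(np.fft.rfft(frames, axis=1)).T
    # Floor the magnitude to avoid log10(0).
    return 20.0 * np.log10(np.maximum(magnitude, 1e-10))


def spectrogram_to_image(spec_db, top_db=80.0):
    """Clip to a top_db dynamic range and rescale to an 8-bit (0-255) image array."""
    spec_db = np.maximum(spec_db, spec_db.max() - top_db)
    lo, hi = spec_db.min(), spec_db.max()
    scaled = (spec_db - lo) / (hi - lo) if hi > lo else np.zeros_like(spec_db)
    return (scaled * 255).astype(np.uint8)
```

With these settings, a one-second clip sampled at 22,050 Hz yields an image with 513 frequency rows and 42 time columns.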
These images were converted to grayscale and resized to 128x128 pixels.
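If the spectrogram images were saved in RGB, for example rendered with a colour map, the grayscale and resize steps could be sketched in NumPy as follows. The sketch assumes ITU-R BT.601 luma weights (0.299, 0.587, 0.114), the weights Pillow's `convert("L")` uses. It uses nearest-neighbour resampling for brevity, because the original interpolation method is not stated. Scaling to [0, 1] and adding a trailing channel axis are also assumptions, included so the output matches a (128, 128, 1) convolutional input.

```python
import numpy as np


def to_grayscale(rgb):
    """Convert an (H, W, 3) RGB array to (H, W) with ITU-R BT.601 luma weights."""
    weights = np.array([0.299, 0.587, 0.114])
    return rgb[..., :3].astype(np.float64) @ weights


def resize_nearest(image, height=128, width=128):
    """Nearest-neighbour resize of a 2-D array to (height, width)."""
    in_h, in_w = image.shape
    # Map each output pixel centre back to a source row/column index.
    rows = np.minimum((np.arange(height) + 0.5) * in_h / height, in_h - 1).astype(int)
    cols = np.minimum((np.arange(width) + 0.5) * in_w / width, in_w - 1).astype(int)
    return image[rows[:, None], cols[None, :]]


def preprocess(rgb):
    """Grayscale, resize to 128x128, then (assumption) scale to [0, 1] with a channel axis."""
    gray = resize_nearest(to_grayscale(rgb))
    return (gray / 255.0).astype(np.float32)[..., np.newaxis]  # shape (128, 128, 1)
```

In practice a library resize such as Pillow's `Image.resize` or `tf.image.resize` with bilinear interpolation would usually be preferred. Nearest-neighbour keeps this sketch dependency-free.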
The model architecture is based on the simple CNN from TensorFlow's convolutional neural network (CNN) tutorial for image classification.
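The notes do not list the final layers. The TensorFlow tutorial's CNN stacks three 3x3 convolutions (32, 64 and 64 filters) with 2x2 max pooling after the first two, then Flatten, Dense(64) and a final Dense output layer. The pure-Python sketch below traces how that stack would transform a 128x128x1 grayscale input. It assumes 'valid' padding, stride 1, and a 10-unit output layer for UrbanSound8K's 10 classes. These adaptations are assumptions, not the project's confirmed configuration.

```python
import math

# Hypothetical layer specification mirroring the TensorFlow CNN tutorial's stack.
# Assumptions: 128x128x1 input, 'valid' padding, stride 1, 10 output classes.
LAYERS = [
    ("conv", 32, 3),   # (kind, filters, kernel size)
    ("pool", 2),       # (kind, pool size)
    ("conv", 64, 3),
    ("pool", 2),
    ("conv", 64, 3),
    ("flatten",),
    ("dense", 64),     # (kind, units)
    ("dense", 10),     # one unit per UrbanSound8K class
]


def trace_shapes(input_shape=(128, 128, 1), layers=LAYERS):
    """Return a list of (kind, output_shape, param_count), one entry per layer."""
    shape = input_shape
    trace = []
    for kind, *args in layers:
        if kind == "conv":
            filters, k = args
            h, w, c = shape
            params = (k * k * c + 1) * filters        # kernel weights + one bias per filter
            shape = (h - k + 1, w - k + 1, filters)   # 'valid' padding, stride 1
        elif kind == "pool":
            (size,) = args
            h, w, c = shape
            params = 0
            shape = (h // size, w // size, c)         # leftover rows/columns are dropped
        elif kind == "flatten":
            params = 0
            shape = (math.prod(shape),)
        elif kind == "dense":
            (units,) = args
            params = (shape[0] + 1) * units           # weight matrix + bias vector
            shape = (units,)
        else:
            raise ValueError(f"unknown layer kind: {kind}")
        trace.append((kind, shape, params))
    return trace


if __name__ == "__main__":
    trace = trace_shapes()
    for kind, shape, params in trace:
        print(f"{kind:8s} {str(shape):16s} {params:>10,d}")
    print("total params:", f"{sum(p for _, _, p in trace):,d}")
```

Under these assumptions, the last convolution produces a 28x28x64 feature map, so Flatten outputs 50,176 values. The Dense(64) layer then holds 3,211,328 of the 3,267,722 trainable parameters, about 98%. This is the main cost of feeding 128x128 images into a stack designed for the tutorial's much smaller inputs. In Keras, the same stack would be built from `tf.keras.layers.Conv2D`, `MaxPooling2D`, `Flatten` and `Dense` inside a `tf.keras.Sequential` model.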