/tik-team

Global AI Hub Project - UrbanSounds8K

Primary language: Jupyter Notebook. License: MIT.


This project describes how to perform urban sound classification with deep learning. It first presents a general overview of the project, then examines the dataset, the tools used, and the results. All the code used for this project is included here.

Overview of the sound classification project

Automatic sound classification with deep learning is a growing field with numerous real-world applications. While much research has been done on audio such as speech or music, work on ambient sound is relatively rare.

Likewise, recent advances in image classification, where Convolutional Neural Networks classify images with high accuracy, raise the question of whether these methods can also be applied in other fields, such as sound classification. The techniques covered in this material have many real-world applications, including:

  1. Content-based multimedia indexing and retrieval
  2. Assisting deaf people with daily activities
  3. Smart home applications such as 360-degree safety and security features
  4. Industrial uses such as predictive maintenance

What is meant by audio data?

You are always in contact with sound, directly or indirectly. Your brain constantly processes sound and uses it to give you information about your environment. A simple example is the conversations you have with people every day: each person processes what the other says in order to continue the discussion. Even when you are thinking in a quiet environment, you tend to pick up very subtle sounds such as rustling leaves or rain. This is how deep your connection with sound is.

Can the sounds floating around you be put to some useful purpose? Yes, of course! Devices have been developed that capture these sounds and store them in a computer-readable format. Examples of these formats are:

  1. WAV (Waveform Audio File) format
  2. MP3 (MPEG-1 Audio Layer 3) format
  3. WMA (Windows Media Audio) format
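To make "computer-readable format" concrete, here is a minimal sketch of reading a WAV file with Python's standard `wave` module. It assumes uncompressed 16-bit PCM audio; the helper names `write_sine_wav` and `read_wav` are illustrative and not part of this project, and the synthetic tone only stands in for a real recording. Real recordings may use other sample widths or channel counts, which this sketch does not handle.

```python
import math
import os
import struct
import tempfile
import wave


def write_sine_wav(path, freq_hz=440.0, duration_s=0.5, sample_rate=22050):
    """Write a mono 16-bit PCM sine tone to `path` (stand-in for a recording)."""
    n_frames = int(duration_s * sample_rate)
    samples = (
        int(32767 * 0.5 * math.sin(2 * math.pi * freq_hz * i / sample_rate))
        for i in range(n_frames)
    )
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)           # mono
        wf.setsampwidth(2)           # 2 bytes per sample = 16-bit
        wf.setframerate(sample_rate)
        wf.writeframes(b"".join(struct.pack("<h", s) for s in samples))


def read_wav(path):
    """Return (sample_rate, n_channels, samples) for a 16-bit PCM WAV file."""
    with wave.open(path, "rb") as wf:
        sample_rate = wf.getframerate()
        n_channels = wf.getnchannels()
        sample_width = wf.getsampwidth()
        raw = wf.readframes(wf.getnframes())
    if sample_width != 2:
        raise ValueError("this sketch only handles 16-bit PCM")
    # Interpret the raw bytes as little-endian signed 16-bit integers.
    samples = struct.unpack("<%dh" % (len(raw) // 2), raw)
    return sample_rate, n_channels, samples


# Round trip: write a short tone, then read it back as numbers.
with tempfile.TemporaryDirectory() as tmp_dir:
    tone_path = os.path.join(tmp_dir, "tone.wav")
    write_sine_wav(tone_path)
    rate, channels, samples = read_wav(tone_path)
    print(rate, channels, len(samples), len(samples) / rate)
```

The point of the sketch is that, once decoded, a sound is just a sequence of amplitude values sampled at a fixed rate, which is the form a model can work with.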

Audio processing applications

📌 Audio data can be useful for analysis, but what are the potential applications of audio processing? Here are some of them:

  1. Indexing music collections according to their audio characteristics
  2. Music recommendation for radio channels
  3. Similarity search for audio files
  4. Speech processing and synthesis, such as artificial voice generation for conversational agents

Data collection

📌 For this problem, the UrbanSound8K dataset is used. It contains 8732 labeled audio excerpts (each at most 4 seconds long) of urban sounds from 10 classes, which are:

  1. Air conditioner
  2. Car horn
  3. Children playing
  4. Dog bark
  5. Drilling
  6. Engine idling
  7. Gunshot
  8. Jackhammer
  9. Siren
  10. Street music
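In code, these classes are usually handled as integer IDs. The sketch below assumes the class IDs (0 to 9) and the file naming pattern `[fsID]-[classID]-[occurrenceID]-[sliceID].wav` described in the dataset's own documentation; the names `CLASS_NAMES` and `parse_slice_name` are illustrative helpers, not part of this project.

```python
# Class ID -> label, following the dataset documentation (assumed here).
CLASS_NAMES = {
    0: "air_conditioner",
    1: "car_horn",
    2: "children_playing",
    3: "dog_bark",
    4: "drilling",
    5: "engine_idling",
    6: "gun_shot",
    7: "jackhammer",
    8: "siren",
    9: "street_music",
}


def parse_slice_name(file_name):
    """Split '[fsID]-[classID]-[occurrenceID]-[sliceID].wav' into its fields."""
    stem = file_name.rsplit(".", 1)[0]
    fs_id, class_id, occurrence_id, slice_id = (int(p) for p in stem.split("-"))
    return {
        "fs_id": fs_id,
        "class_id": class_id,
        "class_name": CLASS_NAMES[class_id],
        "occurrence_id": occurrence_id,
        "slice_id": slice_id,
    }


# Example file name in the assumed pattern.
info = parse_slice_name("100032-3-0-0.wav")
print(info["class_id"], info["class_name"])
```

Keeping the mapping in one dictionary makes it easy to turn model predictions (integers) back into readable labels.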

Download the dataset from the following address:

https://urbansounddataset.weebly.com/urbansound8k.html

and use the UrbanSound8K.csv file in the data folder.
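As a rough illustration of how the metadata file can be used, the sketch below parses it with Python's standard `csv` module. It assumes the column layout given in the dataset documentation (`slice_file_name`, `fsID`, `start`, `end`, `salience`, `fold`, `classID`, `class`) and an audio layout of one `fold<N>` directory per fold. The sample rows are illustrative stand-ins in that layout, and `load_metadata`, `audio_path`, and the `audio_root` default are hypothetical names chosen for this example.

```python
import csv
import io
from collections import Counter
from pathlib import PurePosixPath

# Illustrative rows in the assumed column layout (not the real file contents).
SAMPLE_CSV = """\
slice_file_name,fsID,start,end,salience,fold,classID,class
100032-3-0-0.wav,100032,0.0,0.317551,1,5,3,dog_bark
100263-2-0-117.wav,100263,58.5,62.5,1,5,2,children_playing
100648-1-0-0.wav,100648,4.823402,5.471927,2,10,1,car_horn
"""


def load_metadata(file_obj):
    """Parse the metadata CSV into a list of dicts with typed fields."""
    rows = []
    for row in csv.DictReader(file_obj):
        rows.append({
            "file": row["slice_file_name"],
            "fold": int(row["fold"]),
            "class_id": int(row["classID"]),
            "label": row["class"],
            # Excerpt length in seconds, from the start/end offsets.
            "duration": float(row["end"]) - float(row["start"]),
        })
    return rows


def audio_path(row, audio_root="audio"):
    """Build the path of an excerpt, assuming one 'fold<N>' directory per fold."""
    return str(PurePosixPath(audio_root) / f"fold{row['fold']}" / row["file"])


rows = load_metadata(io.StringIO(SAMPLE_CSV))
print(Counter(r["label"] for r in rows))  # excerpts per class
print(audio_path(rows[0]))
```

For the real file, the same function should work on an open file handle, for example `open("data/UrbanSound8K.csv", newline="")`, assuming the CSV sits in the data folder as described above. The `fold` column matters because the dataset ships pre-split into 10 folds intended for cross-validation.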