sharathadavanne/seld-net

Custom Audio Set

moreydesignstudios opened this issue · 5 comments

Hello @sharathadavanne
Great Work!
I have run all of the code on my machine and was able to build the model on my side.
Now I would like to know how I can build the model with my own custom dataset.
I have a small-scale dataset (100 audio clips of varying durations) with a single label (noise) and would like to build a model with your code.
Could you help me with this?
Thank you

Hi @moreydesignstudios thanks for having a look at my work.

I am yet to release the code for dataset generation. In the meantime, you can use a publicly available tool such as pyroomacoustics; check this particular example script to get an idea of how to create multichannel audio with your custom sound event placed at a particular position in a room. To use the seld-net codebase, all you need to do is the following:

Step 1: Generate metadata (csv files) similar to our dataset. That is, each csv file represents a single recording and lists the sound events in it, their onset-offset times, and their spatial locations.
Step 2: Write a wrapper script for the pyroomacoustics package that reads the above csv file and synthesizes the multichannel audio (a sketch of such a wrapper follows these steps).
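
For illustration, here is a minimal sketch of such a step-2 wrapper. It assumes a shoebox room, a 4-microphone square array, and the six-column metadata layout that comes up later in this thread (file name, onset, offset, elevation, azimuth, distance). The room size, array geometry, and file names are all assumptions for the example, not the setup used for our dataset:

```python
import csv

import numpy as np
import pyroomacoustics as pra
import soundfile as sf

FS = 44100
ROOM_DIM = [10.0, 8.0, 4.0]               # assumed room size (metres)
ARRAY_CENTER = np.array([5.0, 4.0, 2.0])  # assumed array position

# Assumed 4-mic square array around the centre (metres).
MIC_OFFSETS = np.array([
    [ 0.04,  0.04, 0.0],
    [ 0.04, -0.04, 0.0],
    [-0.04, -0.04, 0.0],
    [-0.04,  0.04, 0.0],
])

def sph_to_cart(azi_deg, ele_deg, dist):
    """Azimuth/elevation (degrees) and distance -> x, y, z offset."""
    azi, ele = np.radians(azi_deg), np.radians(ele_deg)
    return dist * np.array([np.cos(ele) * np.cos(azi),
                            np.cos(ele) * np.sin(azi),
                            np.sin(ele)])

def synthesize(csv_path, out_wav):
    room = pra.ShoeBox(ROOM_DIM, fs=FS, max_order=10)
    room.add_microphone_array(
        pra.MicrophoneArray((ARRAY_CENTER + MIC_OFFSETS).T, FS))

    with open(csv_path) as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        for fname, start, end, ele, azi, dist in reader:
            sig, sr = sf.read(fname)  # event clips assumed mono, at FS
            pos = ARRAY_CENTER + sph_to_cart(float(azi), float(ele),
                                             float(dist))
            # delay= places the event at its onset time in the mix;
            # the offset time is implied by the clip's duration.
            room.add_source(pos.tolist(), signal=sig, delay=float(start))

    room.simulate()
    sf.write(out_wav, room.mic_array.signals.T, FS)

synthesize('train_recording_001.csv', 'train_recording_001.wav')
```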

Once you have the audio and metadata, you can simply run the seld-net codebase on it. Does this answer your question? If not, let me know.

Thank you for your note.
That's helpful. I gathered that I need to prepare a similar dataset with metadata, but I would like to know in more detail how to prepare such a dataset with pyroomacoustics.
The major problem is finding the azimuth and elevation for an audio file.
After checking your example script, I can see it is about sound propagation.
How did you get the elevation and azimuth from the audio file (wav)? Also, were the sound files in your current dataset normalized by pyroomacoustics? If so, what was your normalization method?
Looking forward to your answers.
Thank you

Hi @moreydesignstudios, the metadata file in step 1 of the previous comment is created randomly: you randomly assign each sound event clip an azimuth and elevation angle. The pyroomacoustics library will handle positioning your sound event at the respective azimuth and elevation. All you need to do is write a wrapper script around pyroomacoustics that reads the metadata, calls the right functions in the library, and saves the generated audio file.
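
To make that concrete, here is a minimal sketch of the random metadata generation, again an illustration rather than our actual script. The clip names, angle grids, and distance range are assumptions (chosen so the sources stay inside the example room from the wrapper sketch above); the column layout matches the example discussed below:

```python
import csv
import random

import soundfile as sf

clips = ['keysDrop024.wav', 'noise001.wav']   # your custom event clips
rows, t = [], 0.0
for clip in clips:
    dur = sf.info(clip).duration
    start = t + random.uniform(0.5, 2.0)      # random gap between events
    rows.append([clip, round(start, 3), round(start + dur, 3),
                 random.choice(range(-60, 61, 10)),    # elevation (deg)
                 random.choice(range(-180, 171, 10)),  # azimuth (deg)
                 round(random.uniform(1.0, 2.0), 1)])  # distance (m)
    t = start + dur

with open('train_recording_001.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['sound_event_recording', 'start_time', 'end_time',
                     'ele', 'azi', 'distance'])
    writer.writerows(rows)
```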

Hi @sharathadavanne and @moreydesignstudios, I have created the Python wrapper script using pyroomacoustics. One thing I am uncertain about is how you arranged the data:

sound_event_recording, start_time, end_time, ele, azi
keysDrop024.wav, 0.91180475327, 1.22980928842, 40,-130, 1.0 

What is the name of the column for the 1.0?

Hi @akhilvasvani, the last column is the distance of the source from the microphone. This information is only used for synthesizing the dataset and is not used by seld-net.
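
For reference, a row with all six columns parses like this (a hypothetical snippet; the header in your example simply omits the final column):

```python
row = "keysDrop024.wav,0.91180475327,1.22980928842,40,-130,1.0"
fname, start, end, ele, azi, dist = row.split(',')
# dist is only needed when synthesizing the audio; seld-net ignores it.
```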