The objective of this project is to evaluate models on an ABX task on noise.
We first evaluate the models on the DCASE dataset.
We use the STARSS22 dataset to generate the .item files.
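For reference, here is a minimal sketch of an .item file, assuming the usual CPC/Libri-light ABX layout (`#file onset offset #phone prev-phone next-phone speaker`) with the sound class playing the role of the phone; the file IDs and the use of the recording ID as the speaker column are illustrative, and the columns produced by our generation scripts may differ:

```
#file onset offset #phone prev-phone next-phone speaker
fold4_room2_mix001 12.3 12.4 Clapping Telephone Laughter fold4_room2_mix001
fold4_room2_mix001 15.0 15.1 Laughter Clapping Telephone fold4_room2_mix001
```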
The Sony-TAu Realistic Spatial Soundscapes 2022 (STARSS22) dataset contains multichannel recordings of sound scenes in various rooms and environments, together with temporal and spatial annotations of prominent events belonging to a set of target classes.
The 13 sound classes are:
- Female speech, woman speaking
- Male speech, man speaking
- Clapping
- Telephone
- Laughter
- Domestic sounds
- Walk, footsteps
- Door, open or close
- Music
- Musical instrument
- Water tap, faucet
- Bell
- Knock
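For reference, a sketch of the mapping between class indices and names, assuming zero-based indices in the order listed above (consistent with "Music" being class 8, as noted below):

```python
# STARSS22 class indices, assuming zero-based indexing in the listed order.
STARSS22_CLASSES = {
    0: "Female speech, woman speaking",
    1: "Male speech, man speaking",
    2: "Clapping",
    3: "Telephone",
    4: "Laughter",
    5: "Domestic sounds",
    6: "Walk, footsteps",
    7: "Door, open or close",
    8: "Music",
    9: "Musical instrument",
    10: "Water tap, faucet",
    11: "Bell",
    12: "Knock",
}
```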
For each recording, the labels are provided in a CSV file with one row per annotated frame:
[frame number (int)], [active class index (int)], [source number index (int)], [azimuth (int)], [elevation (int)]
A frame corresponds to a temporal resolution of 100 ms.
We removed the "Music" class (index 8) and kept only the frames with a single active class when computing the ABX scores.
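As an illustration, a minimal sketch of this filtering, assuming the CSV layout above; the function name is hypothetical and the actual generation scripts may differ:

```python
import csv
from collections import defaultdict

FRAME_S = 0.1      # each frame covers 100 ms
MUSIC_CLASS = 8    # "Music", removed from the ABX pool

def load_single_class_frames(csv_path):
    """Return {frame: class index} for frames with exactly one active class."""
    classes_per_frame = defaultdict(set)
    with open(csv_path, newline="") as f:
        for frame, cls, _source, _azimuth, _elevation in csv.reader(f):
            classes_per_frame[int(frame)].add(int(cls))
    return {
        frame: classes.pop()
        for frame, classes in classes_per_frame.items()
        if len(classes) == 1 and MUSIC_CLASS not in classes
    }

# A kept frame spans [frame * FRAME_S, (frame + 1) * FRAME_S) seconds.
```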
The results on DCASE alone were promising, so we decided to keep only a few classes and to extend the dataset with noise segments from AudioSet.
The classes kept from DCASE are:
- Walk, footsteps
- Clapping
- Water tap
- Male speech
- Female speech
- Domestic sounds: separated into two classes, Vacuum cleaner and Air conditioning
To separate domestic sounds, we used two additional labels on DCASE: 13 (Vacuum cleaner) and 14 (Air conditioning).
We added the following classes from AudioSet:
- Air conditioning
- Baby Cry
- Knock
- Purr
- Rain
- Vacuum cleaner
- Walk, footsteps
- Water tap
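Putting the two sources together gives the final class inventory; this is a sketch assuming that same-named classes from DCASE and AudioSet are pooled into a single class:

```python
# Classes kept from DCASE (with Domestic sounds split in two).
DCASE_CLASSES = {
    "Walk, footsteps", "Clapping", "Water tap", "Male speech",
    "Female speech", "Vacuum cleaner", "Air conditioning",
}
# Classes added from AudioSet.
AUDIOSET_CLASSES = {
    "Air conditioning", "Baby cry", "Knock", "Purr", "Rain",
    "Vacuum cleaner", "Walk, footsteps", "Water tap",
}
# Assumption: same-named classes are pooled across the two sources.
FINAL_CLASSES = sorted(DCASE_CLASSES | AUDIOSET_CLASSES)
```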
The final item files are in item_files/item_files_merged.
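As a sketch of how the per-source item files could be merged, assuming one .item file per source and the standard single-header layout (the file names below are hypothetical):

```python
from pathlib import Path

# Hypothetical per-source item files to merge.
SOURCES = [Path("item_files/dcase.item"), Path("item_files/audioset.item")]
MERGED = Path("item_files/item_files_merged/all.item")

def merge_item_files(sources, merged):
    """Concatenate .item files, keeping a single header line."""
    merged.parent.mkdir(parents=True, exist_ok=True)
    with merged.open("w") as out:
        for i, src in enumerate(sources):
            lines = src.read_text().splitlines()
            if i > 0:
                lines = lines[1:]  # drop the duplicate header
            out.write("\n".join(lines) + "\n")

merge_item_files(SOURCES, MERGED)
```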
To compute the ABX score, use CPC2.
You can use the launchers in the launchers/ directory.
The item files are in ./item_files/final_merged.
The audio files are on Jean Zay: /gpfswork/rech/xdz/commun/abx_noise/audiofiles
You can use scripts/plot_abx.py to plot the mean and standard deviation of the ABX error rate as a function of training duration.
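As an illustration of the expected output, a minimal matplotlib sketch, assuming one ABX error rate per (training duration, seed) pair; the numbers below are placeholders, and the actual input format of scripts/plot_abx.py may differ:

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder results for illustration only: ABX error rates (%)
# indexed by training duration (hours), one value per random seed.
results = {
    100: [12.4, 13.1, 12.8],
    200: [10.2, 10.9, 10.5],
    400: [9.1, 9.6, 9.3],
}

durations = sorted(results)
means = np.array([np.mean(results[d]) for d in durations])
stds = np.array([np.std(results[d]) for d in durations])

# Mean with a std error band at each training duration.
plt.errorbar(durations, means, yerr=stds, marker="o", capsize=3)
plt.xlabel("Training duration (h)")
plt.ylabel("ABX error rate (%)")
plt.title("Mean ± std ABX error rate vs. training duration")
plt.savefig("abx_vs_duration.png")
```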