Urban Sound Monitoring (USM) Dataset - A Dataset for Polyphonic Sound Event Tagging in Urban Sound Monitoring Scenarios
- Jakob Abeßer: Classifying Sounds in Polyphonic Urban Sound Scenes, Proceedings of the 152nd AES Convention (2022)
A 10-minute demo video with mel spectrogram visualizations of both the mixtures and the corresponding stems is available at https://www.youtube.com/watch?v=pVKB2xeBOJA.
This dataset includes 24,000 five-second polyphonic stereo soundscapes composed of sounds taken from the FSD50k dataset:
- Eduardo Fonseca, Xavier Favory, Jordi Pons, Frederic Font, Xavier Serra: FSD50K: an Open Dataset of Human-Labeled Sound Events, arXiv:2010.00475, 2020
The FSD50k samples were selected such that commercial usage is allowed (see the licence file).
The process of mixing the polyphonic soundscapes is detailed in the AES paper (see the reference above).
💾 The USM dataset can be downloaded from https://zenodo.org/record/6413788.
A subset of the original FSD50k sound classes is mapped to 26 new sound classes that are potentially relevant in urban sound monitoring scenarios, such as
- 🚗 🚌 🚁 traffic monitoring
- 🚧 construction site monitoring
- 💣 🚨 😱 recognition of rare security-relevant events
- 🐦 🐶 bioacoustic monitoring
- ☔ ☁️ ⚡ weather monitoring
The USM dataset includes the following 26 sound classes:
- airplane, alarm, birds, bus, car, cheering, church bell, dogs, drilling, glass break, gunshot, hammer, helicopter, jackhammer, lawn mower, motorcycle, music, rain, sawing, scream, siren, speech, thunderstorm, train, truck, wind
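For programmatic use, the class list above can be turned into a simple lookup table to decode the binary target vectors. A minimal Python sketch, assuming the 26-dimensional target vectors follow the alphabetical class order listed above (the authoritative ordering is defined in `metadata/class_labels.csv`):

```python
# The 26 USM sound classes, in the alphabetical order listed above.
# NOTE: assumption - the authoritative ordering is given in metadata/class_labels.csv.
USM_CLASSES = [
    "airplane", "alarm", "birds", "bus", "car", "cheering", "church bell",
    "dogs", "drilling", "glass break", "gunshot", "hammer", "helicopter",
    "jackhammer", "lawn mower", "motorcycle", "music", "rain", "sawing",
    "scream", "siren", "speech", "thunderstorm", "train", "truck", "wind",
]

CLASS_TO_INDEX = {name: idx for idx, name in enumerate(USM_CLASSES)}

def decode_targets(target_vector):
    """Return the class names of all active entries in a binary target vector."""
    return [USM_CLASSES[idx] for idx, active in enumerate(target_vector) if active]
```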
The USM dataset contains 22,000 soundscapes in its development set (composed of sounds from the FSD50k development set) and 2,000 soundscapes in its evaluation set (composed of sounds from the FSD50k evaluation set).
The development set is further divided into a training set (20,000 soundscapes) and a validation set (2,000 soundscapes).
| USM subset | FSD50k subset (source) | Soundscapes |
|---|---|---|
| Training | Development | 20,000 |
| Validation | Development | 2,000 |
| Evaluation | Evaluation | 2,000 |
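As a quick sanity check after downloading, the split sizes in the table can be verified by counting the mixture files per subfolder. A minimal sketch; the extraction path `USM` is hypothetical, and the `*_mix.wav` naming convention is described below:

```python
from pathlib import Path

usm_root = Path("USM")  # hypothetical path to the extracted dataset

expected = {"train": 20_000, "val": 2_000, "eval": 2_000}
for split, n_expected in expected.items():
    # Each soundscape has exactly one "<id>_mix.wav" mixture file.
    n_found = len(list((usm_root / split).glob("*_mix.wav")))
    print(f"{split}: found {n_found} soundscapes, expected {n_expected}")
```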
- the dataset folder includes three subfolders: `train` (training set), `val` (validation set), and `eval` (evaluation set)
- For each soundscape, you can find the following files (see the loading sketch below):
  - one (stereo) audio file with the mixture, e.g. `3_mix.wav`
  - multiple (mono) audio files with the isolated stems, e.g. `3_mix_stem_0.wav`, `3_mix_stem_1.wav`, etc.
  - one binary numpy file with the multi-label targets (26 classes), e.g. `3_mix_targets.npy`
  - multiple binary numpy files with the single-label targets for the stems (26 classes), e.g. `3_stem_0_target.npy`
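The following sketch loads one soundscape with its stems and targets. It assumes the third-party `soundfile` package for reading WAV files (any WAV reader works); the file names follow the convention above:

```python
from pathlib import Path

import numpy as np
import soundfile as sf  # assumption: soundfile is used here, any WAV reader works

def load_soundscape(split_dir, soundscape_id):
    """Load the stereo mixture, its mono stems, and all target vectors."""
    split_dir = Path(split_dir)

    mixture, sr = sf.read(split_dir / f"{soundscape_id}_mix.wav")          # (samples, 2)
    mix_targets = np.load(split_dir / f"{soundscape_id}_mix_targets.npy")  # (26,) multi-label

    # Stems are numbered 0, 1, ...; sort numerically by the stem index.
    stem_files = sorted(split_dir.glob(f"{soundscape_id}_mix_stem_*.wav"),
                        key=lambda p: int(p.stem.split("_")[-1]))
    stems = [sf.read(f)[0] for f in stem_files]                            # mono stems

    target_files = sorted(split_dir.glob(f"{soundscape_id}_stem_*_target.npy"),
                          key=lambda p: int(p.stem.split("_")[-2]))
    stem_targets = [np.load(f) for f in target_files]                      # (26,) single-label each

    return mixture, sr, mix_targets, stems, stem_targets

# Example: load soundscape 3 from the training set.
# mixture, sr, mix_targets, stems, stem_targets = load_soundscape("USM/train", 3)
```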
- in the `metadata` folder in this repository you can find
  - `class_labels.csv` - all sound classes
  - `usm_{train/val/eval}.csv` - CSV files with details about the sample composition of all polyphonic soundscapes in the training (train), validation (val), and evaluation (eval) sets; the following columns are used (a parsing example follows the list):
    - ID
    - class_usm (USM sound class)
    - class_fsd (original sound label of the sample in the FSD50k dataset)
    - file (original filename in the FSD50k dataset)
    - licence (licence the original sample was published under)
    - dur_sec (original sample duration in seconds)
    - part (FSD50k subset (dev / eval) the sample came from)
    - init_silence_sec (initial silence added to the processed sample)
    - on_sec (sample onset within the original sample)
    - off_sec (sample offset within the original sample)
    - is_foreground (bool indicating whether the sound is a foreground or background sound, see paper for details)
    - mix_coeff_db (mixing coefficient in dB)
    - stereo_coeff (stereo placement coefficient)
    - usm_id (ID of the USM soundscape the sample is mixed into)
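As an illustration, the metadata CSVs can be inspected with pandas to see which FSD50k samples were mixed into a given soundscape. A minimal sketch, assuming pandas and the column names listed above:

```python
import pandas as pd

meta = pd.read_csv("metadata/usm_train.csv")

# All FSD50k samples that were mixed into soundscape 3,
# with mixing level and stereo placement information.
soundscape = meta[meta["usm_id"] == 3]
print(soundscape[["class_usm", "class_fsd", "file",
                  "is_foreground", "mix_coeff_db", "stereo_coeff"]])
```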
This work has been supported by the German Research Foundation (AB 675/2-2).