This repository contains code for several sound event classification systems, evaluated on two datasets: (a) DCASE 2019 Task 5 and (b) a subset of AudioSet.
The task is to label the presence or absence of certain sound events in a given audio clip using weakly labeled training data. Only the presence or absence of each sound event is labeled; the onset and offset times of the events are not considered. Because the final system would be deployed on telephony audio, low-quality 8 kHz data is simulated to test the approaches under telephony conditions.
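As a rough illustration of how 8 kHz audio can be simulated from higher-rate recordings, the sketch below low-pass filters a signal and keeps every n-th sample. It is not taken from this repository: the function names `lowpass_fir` and `downsample_to_8k` are hypothetical, the source rate is assumed to be an integer multiple of 8 kHz (for example 16 kHz or 48 kHz), and a real telephony simulation may also involve codecs or band-limiting that are not modelled here.

```python
import math


def lowpass_fir(num_taps, cutoff):
    """Hamming-windowed sinc low-pass filter.

    cutoff is a fraction of the sampling rate (0 < cutoff < 0.5).
    """
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        if x == 0:
            sinc = 2 * cutoff
        else:
            sinc = math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(sinc * window)
    total = sum(taps)
    return [t / total for t in taps]  # unit gain at 0 Hz


def downsample_to_8k(samples, src_rate, num_taps=101):
    """Low-pass filter, then keep every (src_rate // 8000)-th sample."""
    dst_rate = 8000
    if src_rate % dst_rate != 0:
        raise ValueError("this sketch assumes src_rate is a multiple of 8000")
    factor = src_rate // dst_rate
    if factor == 1:
        return list(samples)
    # Cutoff slightly below the new Nyquist frequency (4 kHz) to limit aliasing.
    taps = lowpass_fir(num_taps, 0.45 / factor)
    half = num_taps // 2
    out = []
    for i in range(0, len(samples), factor):
        acc = 0.0
        for k, tap in enumerate(taps):
            j = i + k - half  # filter centred on sample i, zero-padded at edges
            if 0 <= j < len(samples):
                acc += tap * samples[j]
        out.append(acc)
    return out
```

For sampling rates that are not a multiple of 8 kHz (such as 44.1 kHz), a polyphase resampler from an audio or signal-processing library would be the usual choice instead.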
The following datasets have been used for this project. Instructions to reproduce the results for each dataset are given in the README files in their respective folders.
This is the Urban Sound Tagging dataset from DCASE 2019 Task 5. It contains one training split and one validation split.
This is a subset of AudioSet containing 10 classes. It is split into 5 folds for 5-fold cross-validation.
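The 5-fold evaluation loop can be sketched as follows. This is only an illustration of the protocol: the helper names `make_folds` and `cross_validation_splits` and the clip identifiers are hypothetical, and the folds here are drawn at random, whereas the actual fold assignment for this dataset is whatever the dataset folder defines.

```python
import random


def make_folds(clip_ids, num_folds=5, seed=0):
    """Shuffle clip ids reproducibly and deal them into num_folds folds."""
    ids = sorted(clip_ids)
    random.Random(seed).shuffle(ids)
    return [ids[i::num_folds] for i in range(num_folds)]


def cross_validation_splits(folds):
    """Yield (train_ids, validation_ids), holding out one fold at a time."""
    for held_out in range(len(folds)):
        train = [
            clip
            for index, fold in enumerate(folds)
            if index != held_out
            for clip in fold
        ]
        yield train, folds[held_out]


if __name__ == "__main__":
    folds = make_folds([f"clip_{i}" for i in range(23)])
    for train_ids, val_ids in cross_validation_splits(folds):
        print(len(train_ids), len(val_ids))
```

Each clip appears in exactly one validation fold, so averaging the per-fold metrics uses every clip once for evaluation.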
- DCASE 2019 Task 5 best-performing system
- Feature combination
- Temporal Spectral Attention
- Knowledge distillation for better performance on telephony data
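
For the knowledge distillation item above, one common formulation for multi-label tagging is sketched below. It is an assumption, not necessarily the loss used in this repository: a teacher model (assumed here to see the original high-quality audio) provides soft per-class probabilities, and the student (assumed to see the simulated 8 kHz audio) is trained on a weighted sum of the cross-entropy against the weak clip-level labels and the cross-entropy against the teacher's outputs. The names `binary_cross_entropy`, `distillation_loss`, and `alpha` are hypothetical.

```python
import math


def binary_cross_entropy(probs, targets, eps=1e-7):
    """Mean binary cross-entropy over classes; targets may be soft (in [0, 1])."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(probs)


def distillation_loss(student_probs, teacher_probs, labels, alpha=0.5):
    """alpha weights the hard-label term; (1 - alpha) weights the teacher term."""
    hard = binary_cross_entropy(student_probs, labels)
    soft = binary_cross_entropy(student_probs, teacher_probs)
    return alpha * hard + (1 - alpha) * soft


if __name__ == "__main__":
    labels = [1, 0, 1]             # weak clip-level labels
    teacher = [0.9, 0.1, 0.8]      # teacher predictions (assumed high-quality input)
    student = [0.6, 0.3, 0.5]      # student predictions (assumed 8 kHz input)
    print(distillation_loss(student, teacher, labels))
```

Setting `alpha=1.0` reduces this to ordinary training on the weak labels, which makes it easy to compare against a baseline without distillation.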