CDur
Repository for the paper Towards duration robust weakly supervised sound event detection
Due to the difficulty of obtaining the training data for DCASE2017/18, the scripts currently only support training and evaluation on the URBAN-SED corpus. Links to the DCASE2017 and DCASE2018 training datasets are provided.
However, pretrained versions of all models from the paper are included in this repository.
Requirements
scipy==1.5.1
six==1.15.0
fire==0.3.1
loguru==0.5.1
pytorch_ignite==0.3.0
pandas==1.0.5
SoundFile==0.10.3.post1
torch==1.5.0
tqdm==4.47.0
librosa==0.7.2
tabulate==0.8.7
h5py==2.10.0
numpy==1.19.0
adabound==0.0.5
ignite==1.1.0
pypeln==0.4.4
PyYAML==5.3.1
scikit_learn==0.23.1
sed_eval==0.2.1
scikit-multilearn==0.2.0
Usage
The scripts provided in this repo can be used to train and evaluate SED models. In general, all training is done in a weakly supervised fashion (WSSED), while evaluation requires strong labels.
The labels use the common DCASE format and are stored as tab-separated value (TSV) files. The training labels are required to be in the following format:
filename event_labels
a.wav event1,event2,event3
b.wav event4
The evaluation labels use the following format:
filename onset offset event_label
c.wav 0.5 4 Speech
c.wav 0.7 8 Cat
c.wav 0.4 4 Dog
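Both label formats above can be read directly with pandas. The snippet below is only a minimal, illustrative sketch; it assumes the file paths produced by the preparation scripts described in the next section.

```python
import pandas as pd

# Weak (training) labels: one row per clip, events comma-separated.
weak = pd.read_csv("data/flists/urban_sed_train_weak.tsv", sep="\t")
weak["event_labels"] = weak["event_labels"].str.split(",")

# Strong (evaluation) labels: one row per event, onset/offset in seconds.
strong = pd.read_csv("data/flists/urban_sed_test_strong.tsv", sep="\t")

print(weak.head())
print(strong.head())
```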
Urban-SED
To download the URBAN-SED corpus and train CDur on it, run the following:
cd data
# Downloading and/or preparing the urbansed dataset
bash prepare_urbansed.sh
mkdir -p features
# Training features
python3 extract_feature.py flists/urban_sed_train_weak.tsv -o features/urban_sed_train.h5
# Evaluation features
python3 extract_feature.py flists/urban_sed_test_weak.tsv -o features/urban_sed_test.h5
cd ../
pip3 install -r requirements.txt
python3 run.py train_evaluate runconfigs/cdur_urban_sed.yaml --test_data data/features/urban_sed_test.h5 --test_label data/flists/urban_sed_test_strong.tsv
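If you want to sanity-check the extracted features before training, a generic h5py listing works without assuming anything about the internal layout of the files (the dataset names and shapes depend on how extract_feature.py organizes the HDF5):

```python
import h5py

# List every dataset stored in the extracted feature file along with its shape.
# No particular layout is assumed; this simply walks the HDF5 hierarchy.
with h5py.File("data/features/urban_sed_train.h5", "r") as store:
    store.visititems(
        lambda name, obj: print(name, obj.shape)
        if isinstance(obj, h5py.Dataset)
        else None
    )
```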
Reproduce paper results
If you just want to evaluate the pretrained models from the paper (due to data restrictions, only URBAN-SED is covered here), first prepare the data as described in the URBAN-SED section above, then run:
python3 run.py evaluate pretrained/URBAN_SED/ --data data/features/urban_sed_test.h5 --label data/flists/urban_sed_test_strong.tsv
This should return something like:
Quick Report:
| | f_measure | precision | recall |
|---------------|-------------|-------------|----------|
| event_based | 0.217338 | 0.205556 | 0.233823 |
| segment_based | 0.647505 | 0.697913 | 0.612787 |
| Time Tagging | 0.775104 | 0.763552 | 0.792407 |
| Clip Tagging | 0.771375 | 0.80629 | 0.744837 |
DCASE2017
Since the evaluation labels of the DCASE2017 dataset are easily accessible, just run the following commands to reproduce the paper results:
cd data
bash prepare_dcase2017_eval.sh
python3 extract_feature.py flists/dcase2017_eval_weak.tsv -o features/dcase2017_eval.h5
cd ../
python3 run.py evaluate pretrained/DCASE2017/ --data data/features/dcase2017_eval.h5 --label data/flists/dcase2017_eval_strong.tsv
This should return something like:
Quick Report:
| | f_measure | precision | recall |
|---------------|-------------|-------------|----------|
| event_based | 0.16225 | 0.190996 | 0.14601 |
| segment_based | 0.491504 | 0.559638 | 0.471156 |
| Time Tagging | 0.547846 | 0.667211 | 0.483353 |
| Clip Tagging | 0.536513 | 0.692001 | 0.459966 |
Note that the results are macro-averaged. The micro-averaged ones can also be found in the logs.
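For intuition on the difference: macro averaging computes a score per class and then takes the mean over classes, while micro averaging pools all individual decisions before scoring. The toy scikit-learn sketch below (unrelated to the actual evaluation data) illustrates the distinction:

```python
from sklearn.metrics import f1_score

# Toy multi-label example, purely for illustrating macro vs. micro averaging.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0]]

# Macro: F1 per class, then averaged; rare classes weigh as much as common ones.
print("macro F1:", f1_score(y_true, y_pred, average="macro"))
# Micro: all decisions pooled before computing a single F1.
print("micro F1:", f1_score(y_true, y_pred, average="micro"))
```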