Sound Event Classification

This repository contains code for several sound event classification systems implemented on two datasets: (a) DCASE 2019 Task 5 and (b) a subset of Audioset.

Organization

Each dataset has its own folder, which contains the following subfolders:

  1. augmentation: package which contains code for various data augmentations
  2. data: contains raw audio files and generated features
  3. split: contains metadata required to parse the dataset
  4. models: contains saved model weights

In addition to these subfolders, each dataset folder contains a script, statistics.py, which computes the channel-wise mean and standard deviation of the various features.

Datasets

A. DCASE

This is the Urban Sound Tagging dataset from DCASE 2019 Task 5. It contains one training split and one validation split.

B. Audioset

This is a subset of Audioset containing 10 classes. It is split into 5 folds for 5-fold cross-validation.

Reproducing the results

To reproduce the results, first clone this repository. Then, follow the steps below.

1. Generating the features

Generate the required type of feature using the following command:
python compute_<feature_type>.py <input_path> <output_path>
Replace <feature_type> with one of logmelspec, cqt, or gammatone. The output path must be ./<dataset>/data/<feature_type>, where <dataset> is either dcase or audioset.
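
For reference, here is a minimal sketch of what a feature script along the lines of compute_logmelspec.py might do, assuming librosa is installed; the repository's actual sample rate, FFT size, and hop length may differ, and the cqt and gammatone scripts would follow the same pattern with librosa.cqt or a gammatone filterbank:

# Minimal log-mel extraction sketch (parameters are assumptions,
# not necessarily those used by the repository).
import os
import sys
import glob

import librosa
import numpy as np

input_path, output_path = sys.argv[1], sys.argv[2]
os.makedirs(output_path, exist_ok=True)

for wav in glob.glob(os.path.join(input_path, "*.wav")):
    y, sr = librosa.load(wav, sr=44100)                # load mono audio
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=2048,
                                         hop_length=1024, n_mels=128)
    logmel = librosa.power_to_db(mel)                  # log-scaled energies
    name = os.path.splitext(os.path.basename(wav))[0]
    np.save(os.path.join(output_path, name + ".npy"), logmel)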

2. Computing the channel-wise mean and standard deviation

Compute the channel-wise mean and standard deviation of the features using statistics.py.

DCASE

In addition to the workspace path, only two arguments are required: the feature type and the number of frequency bins.
python statistics.py -w $WORKSPACE -f logmelspec -n 128

Audioset

An additional parameter specifies the training, validation, and testing folds. For training on folds 0, 1, and 2, validating on fold 3, and testing on fold 4, run
python statistics.py -w $WORKSPACE -f logmelspec -n 128 -p 0 1 2 3 4
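
These statistics are presumably used to normalize the features before training. Conceptually, they are the per-frequency-bin mean and standard deviation taken over all frames; a minimal sketch, where the file paths and array shapes are assumptions:

# Channel-wise statistics sketch (hypothetical paths and layout).
import glob
import numpy as np

feats = [np.load(f) for f in glob.glob("./dcase/data/logmelspec/*.npy")]
stacked = np.concatenate(feats, axis=-1)   # (n_bins, total_frames)
mean = stacked.mean(axis=-1)               # one value per frequency bin
std = stacked.std(axis=-1)
np.save("./dcase/data/logmelspec_mean.npy", mean)
np.save("./dcase/data/logmelspec_std.npy", std)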

3. Parsing the data

To parse the DCASE data, run parse_dcase.py. No such step is required for Audioset.
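
As a rough illustration of the kind of work parse_dcase.py does, the sketch below turns the Urban Sound Tagging annotation CSV into one multi-hot label row per clip; the file and column names are illustrative and may not match the actual data:

# Hypothetical parsing sketch; not the repository's actual logic.
import pandas as pd

ann = pd.read_csv("./dcase/split/annotations.csv")
tag_cols = [c for c in ann.columns if c.endswith("_presence")]
# Aggregate multiple annotators per clip: a tag counts as present
# if any annotator marked it present.
labels = (ann.groupby("audio_filename")[tag_cols].max() > 0).astype(int)
labels.to_csv("./dcase/split/labels.csv")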

4. Training

DCASE

The train.py script for DCASE takes three arguments: the feature type, the number of time frames, and the random seed. To train on logmelspec features, run
python train.py -w $WORKSPACE -f logmelspec -n 636 --seed 42

Audioset

In addition to the three arguments above, the train.py script for Audioset takes an argument specifying the training, validation, and testing folds. For training on folds 0, 1, and 2, validating on fold 3, and testing on fold 4, run
python train.py -w $WORKSPACE -f logmelspec -n 636 -p 0 1 2 3 4
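
The -p flag lists the five folds in order; below is a small sketch of the presumed interpretation (the flag's long name and the actual argument handling inside train.py are assumptions):

# Presumed fold handling for -p 0 1 2 3 4 (illustrative only).
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("-p", "--folds", type=int, nargs=5)
args = parser.parse_args(["-p", "0", "1", "2", "3", "4"])
train_folds = args.folds[:3]   # [0, 1, 2] -> training
val_fold = args.folds[3]       # 3 -> validation
test_fold = args.folds[4]      # 4 -> testing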

5. Validating

For validation, run evaluate.py with the same arguments as above but without the random seed argument.
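
For example, to validate the DCASE model trained above (assuming evaluate.py mirrors train.py's interface):
python evaluate.py -w $WORKSPACE -f logmelspec -n 636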

6. Feature Fusion

To perform feature fusion, first generate the logmelspec, cqt, and gammatone features (step 1) and train their respective models (step 4). Next, to generate the weights for each feature, run
python generate_weights.py -w $WORKSPACE -p 0 1 2 3 4

Finally, run
python feature_fusion.py -w $WORKSPACE -p 0 1 2 3 4
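
Conceptually, the fusion step is a weighted combination of the three models' predictions; a minimal sketch, with file names invented for illustration:

# Late-fusion sketch (file names and shapes are assumptions).
import numpy as np

features = ["logmelspec", "cqt", "gammatone"]
weights = np.load("fusion_weights.npy")    # one weight per feature
preds = np.stack([np.load("preds_%s.npy" % f) for f in features])  # (3, N, C)
fused = np.tensordot(weights, preds, axes=1)   # weighted sum -> (N, C)
final = fused.argmax(axis=-1)                  # predicted class per clip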

7. Audioset subset results

(Figure: results on the Audioset subset.)