This repository contains code for various sound event classification systems implemented on two datasets - (a) DCASE 2019 Task 5 and (b) a subset of Audioset.
Each dataset has its own folder, which contains the following subfolders:
- `augmentation`: package containing code for various data augmentations
- `data`: contains raw audio files and generated features
- `split`: contains metadata required to parse the dataset
- `models`: contains saved weights
Other than these subfolders, each dataset folder contains a script `statistics.py` to evaluate the channel-wise mean and standard deviation of the various features.
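Concretely, for 2-D feature arrays this amounts to a per-bin mean and standard deviation aggregated over time. A rough sketch of what that computation could look like (the directory layout, array shapes, and the reading of "channel" as frequency bin are assumptions, not the script's actual code):

```python
import glob
import numpy as np

# Hypothetical layout: one (n_bins, n_frames) feature array saved per clip.
files = glob.glob("./dcase/data/logmelspec/*.npy")
feats = np.concatenate([np.load(f) for f in files], axis=1)  # (n_bins, total_frames)

mean = feats.mean(axis=1)  # per-channel mean, shape (n_bins,)
std = feats.std(axis=1)    # per-channel standard deviation, shape (n_bins,)
```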
The DCASE dataset is the Urban Sound Tagging dataset from DCASE 2019 Task 5; it contains one training split and one validation split. The Audioset dataset is a subset of Audioset containing 10 classes, split into 5 folds for 5-fold cross-validation.
To reproduce the results, first clone this repository. Then, follow the steps below.
Generate the required type of feature using the following command:

```
python compute_<feature_type>.py <input_path> <output_path>
```

Replace `<feature_type>` with one of `logmelspec`, `cqt` or `gammatone`. The output path has to be `./<dataset>/data/<feature_type>`, where `<dataset>` is either `dcase` or `audioset`.
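For reference, here is a minimal sketch of how the log-mel features could be computed (this assumes librosa; the sample rate, FFT size and hop length are illustrative guesses, not necessarily what `compute_logmelspec.py` uses):

```python
import librosa
import numpy as np

def logmel(path, sr=44100, n_fft=2048, hop_length=1024, n_mels=128):
    """Load an audio clip and return a log-scaled mel spectrogram of shape (n_mels, n_frames)."""
    y, _ = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=n_fft,
                                         hop_length=hop_length, n_mels=n_mels)
    return librosa.power_to_db(mel).astype(np.float32)
```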
Evaluate the channel-wise mean and standard deviation of the features using `statistics.py`. For DCASE, only two arguments are required besides the workspace: the feature type and the number of frequency bins:

```
python statistics.py -w $WORKSPACE -f logmelspec -n 128
```
For Audioset, an additional parameter is required to specify the training, validation and testing folds. For training on folds 0, 1 and 2, validating on 3 and testing on 4, run:

```
python statistics.py -w $WORKSPACE -f logmelspec -n 128 -p 0 1 2 3 4
```
To parse the DCASE data, run `parse_dcase.py`. No such step is required for Audioset.
The `train.py` script for DCASE takes 3 arguments: the feature type, the number of time frames, and a random seed. To train on log-mel features, run:

```
python train.py -w $WORKSPACE -f logmelspec -n 636 --seed 42
```
In addition to the three arguments above, the `train.py` script for Audioset takes an extra argument specifying the training, validation and testing folds. For training on folds 0, 1 and 2, validating on 3 and testing on 4, run:

```
python train.py -w $WORKSPACE -f logmelspec -n 636 -p 0 1 2 3 4
```
For validation, run `evaluate.py` with the same arguments as above but without the random seed argument.
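For example (assuming the flags mirror `train.py` exactly, minus `--seed`, as described above):

```
# DCASE
python evaluate.py -w $WORKSPACE -f logmelspec -n 636

# Audioset, with the fold specification
python evaluate.py -w $WORKSPACE -f logmelspec -n 636 -p 0 1 2 3 4
```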
To perform feature fusion, first generate `logmelspec`, `cqt` and `gammatone` features as described above and train their respective models. Next, to generate the weights for each feature, run:

```
python generate_weights.py -w $WORKSPACE -p 0 1 2 3 4
```

Finally, run:

```
python feature_fusion.py -w $WORKSPACE -p 0 1 2 3 4
```
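For intuition, weighted fusion of this kind typically combines the per-feature model predictions as a weighted average. A minimal sketch under assumptions (the file names, weight values and array shapes below are hypothetical, not the repository's actual interface):

```python
import numpy as np

# Hypothetical per-feature class probabilities on the evaluation split,
# each of shape (num_clips, num_classes).
preds = {
    "logmelspec": np.load("preds_logmelspec.npy"),
    "cqt": np.load("preds_cqt.npy"),
    "gammatone": np.load("preds_gammatone.npy"),
}

# Hypothetical per-feature weights, e.g. of the kind generate_weights.py produces.
weights = {"logmelspec": 0.5, "cqt": 0.3, "gammatone": 0.2}

# Weighted late fusion: a convex combination of the per-feature predictions.
fused = sum(w * preds[name] for name, w in weights.items())
fused /= sum(weights.values())
```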