Task 4 Large-scale weakly supervised sound event detection for smart cars

Last update May 1: Added baseline code (based on Task 3's system), its performance, and the Subtask A metric code. Update Apr 2: Added strong labels. Update Apr 1: Added evaluation folder.

Coordinators

Benjamin Elizalde, Emmanuel Vincent, Bhiksha Raj

Data Preparation, Annotations

Ankit Shah (ankit.tronix@gmail.com), Benjamin Elizalde (bmartin1@andrew.cmu.edu)

Annotations, Baseline and Subtask A Metric

Rohan Badlani (rohan.badlani@gmail.com), Benjamin Elizalde (bmartin1@andrew.cmu.edu)

Index

  1. Script to download the development data for Task 4

  2. Script to evaluate Task 4 - Subtask A (Audio tagging)

  3. Strong label annotations for the testing set


  1. Script to download the development data for Task 4

Prerequisite installations

  1. youtube-dl - [sudo] pip install --upgrade youtube_dl
  2. pafy - [sudo] pip install pafy
  3. tqdm (progress bar) - [sudo] pip install tqdm
  4. multiprocessing - [sudo] pip install multiprocessing (only needed on Python 2.5 and earlier; multiprocessing is part of the standard library in later versions)
  5. sox tool - sudo apt-get install sox
  6. ffmpeg - sudo apt-get install ffmpeg

Cloning this repository

Since this repository references DCASE2017-baseline-system as a submodule, use the following command to clone it completely:

git clone --recursive <repository URL>
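If the repository was already cloned without this flag, the DCASE2017-baseline-system submodule can still be fetched afterwards with the standard git command:

git submodule update --init --recursive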

Features

  1. Downloads the audio from the videos, for the testing set first and then for the training set. Multiprocessing ensures that three files are downloaded simultaneously, cutting the download time to roughly 40 percent of single-threaded performance.
  2. Formats the audio with consistent parameters - currently set to 1 channel, 16-bit precision, 44.1 kHz sampling rate.
  3. Extracts the 10-sec segments from the formatted audio according to the start and end times (see the sketch after this list).
  4. The script output includes the audio from steps 1, 2 and 3 - that is, the original audio, the formatted audio and the segments - unless the script is modified to remove the original and/or formatted audio.
  5. To give every run/launch of the download script a unique identifier, the script stores a timestamp and appends it to each of the output file and folder names.
  6. Please contact Ankit/Benjamin if one or more videos are not properly downloaded or available, or with any other issue. Participants can create their own scripts to download the audio; please ensure that you obtain all the 10-sec clips in the lists.
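The sketch below illustrates steps 2 and 3 (formatting and segmenting) using ffmpeg and sox through Python's subprocess module. It is a minimal illustration only, not the actual download_audio.py logic; the helper name and the file names are hypothetical.

```python
# Minimal sketch of steps 2 and 3 above; not the actual download_audio.py code.
# The helper name and the file names are hypothetical.
import subprocess

def format_and_segment(original_path, formatted_path, segment_path, start_sec, end_sec):
    # Step 2: re-encode with consistent parameters (1 channel, 16-bit, 44.1 kHz).
    subprocess.check_call([
        "ffmpeg", "-y", "-i", original_path,
        "-ac", "1", "-ar", "44100", "-sample_fmt", "s16",
        formatted_path,
    ])
    # Step 3: cut the annotated 10-sec window (sox trim takes start and duration).
    subprocess.check_call([
        "sox", formatted_path, segment_path,
        "trim", str(start_sec), str(end_sec - start_sec),
    ])

format_and_segment("Y_example.wav", "Y_example_formatted.wav",
                   "Y_example_10sec.wav", 30.0, 40.0)
```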

Lists

Download audio: testing_set.csv, training_set.csv
Groundtruth weak labels: groundtruth_weak_label_testing_set.csv, groundtruth_weak_label_training_set.csv
Groundtruth strong labels: groundtruth_strong_label_testing_set.csv, groundtruth_strong_label_training_set.csv

Usage

$ python download_audio.py <CSV filename - relative path is also fine>
Sample usage: python download_audio.py training_set.csv

User Modifiable Parameters and Options

  1. Audio formatting can be modified in the "format_audio" method defined in the script download_youtube_audio_from_csv_and_delete_original.py
  2. Removal of the original audio and/or the formatted audio can be done by uncommenting and modifying <os.system(cmdstring2)> in the "download_audio_method" function defined in download_audio.py

Output

  1. The first folder contains the original best-quality audio from YouTube: <csv_name><testing/training>_audio_downloaded
  2. The second folder contains the corresponding formatted audio: <csv_name><testing/training>_audio_formatted_downloaded
  3. The third folder contains the extracted 10-sec segments: <csv_name><testing/training>_audio_formatted_downloaded_and_ssegmented_downloads

Note: the string "Y" is prepended to each downloaded audio filename, because tools like sox and ffmpeg have problems with filenames that start with "--" or "-".

Per-class audio file counts

  1. testing_set_num_files_per_class.csv - specifies, for each class, the number of audio segments present in the testing set
  2. training_set_num_files_per_class.csv - specifies, for each class, the number of audio segments present in the training set

  2. Script to evaluate Task 4 - Subtask A (Audio tagging)

Usage

$ python TaskAEvaluate.py groundtruth/groundtruth_weak_label_testing_set.csv prediction/perfect_prediction.csv output/perfect_prediction_output.csv
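For orientation, clip-level audio tagging can be scored with micro-averaged precision, recall and F1 over the predicted label sets. The sketch below is only an illustration under an assumed file layout (one tab-separated filename/label pair per row); it is not the TaskAEvaluate.py implementation, and the assumed layout may differ from the actual CSV format.

```python
# Illustrative clip-level tagging evaluation (micro-averaged precision/recall/F1).
# The assumed layout (filename<TAB>label per row) may differ from the real files.
import csv
from collections import defaultdict

def load_labels(path):
    clips = defaultdict(set)
    with open(path) as f:
        for row in csv.reader(f, delimiter="\t"):
            if row:
                clips[row[0]].add(row[-1])
    return clips

def micro_scores(groundtruth, prediction):
    tp = fp = fn = 0
    for clip, true_labels in groundtruth.items():
        pred_labels = prediction.get(clip, set())
        tp += len(true_labels & pred_labels)
        fp += len(pred_labels - true_labels)
        fn += len(true_labels - pred_labels)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Example: micro_scores(load_labels("groundtruth.csv"), load_labels("prediction.csv"))
```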


  3. Strong label annotations for the testing set

  1. Only one person was involved in the annotation of each 10-sec clip.
  2. The sound event annotations were based on the audio and not the video.
  3. The strong labels correspond to the file: groundtruth_strong_label_testing_set.csv
  4. The format of strong labels is the same as the DCASE format (Task 3 and Task 4: Audio tagging).
  5. Less than 2% of the 10-sec clips were marked as containing a sound according to AudioSet, but did not seem to actually contain the sound event. For these clips, the start and end times were both assigned 0.
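A minimal sketch of reading the strong-label file follows, assuming the tab-separated DCASE layout of filename, onset, offset and event label; the exact column layout is an assumption. Rows whose start and end times are both 0 (point 5) are skipped.

```python
# Sketch: parse groundtruth_strong_label_testing_set.csv, assuming tab-separated
# rows of (filename, onset, offset, event_label). The exact layout is assumed.
import csv

def load_strong_labels(path):
    events = []
    with open(path) as f:
        for row in csv.reader(f, delimiter="\t"):
            if len(row) < 4:
                continue
            filename, onset, offset, label = row[0], float(row[1]), float(row[2]), row[3]
            if onset == 0.0 and offset == 0.0:
                continue  # labeled present in AudioSet but not audible (point 5)
            events.append((filename, onset, offset, label))
    return events
```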