
Repository of data and scripts from the UGC-UKIERI project on "Automatic Detection of Verbal Threat in Hindi and English Aggressive Speech"

Primary language: Praat. License: Apache License 2.0 (Apache-2.0).

Aggression in Hindi and English Speech

This repository contains data, models and some utility scripts produced as part of the UGC-UKIERI project titled "Automatic Detection of Verbal Threat in Hindi and English Aggressive Speech". The project was led jointly by Dr. Ritesh Kumar (K.M. Institute of Hindi and Linguistics, Dr. Bhimrao Ambedkar University, Agra) and Prof. Daniel Kadar (University of Huddersfield, UK), and was carried out in collaboration with the University of Surrey, UK; Jawaharlal Nehru University, New Delhi; Microsoft Research India, Bangalore; UnReaL-TecE LLP; and Panlingua Language Processing LLP, New Delhi.

Data

The directory 'data' in the repository contains:

  • TextGrid files (Praat annotation files), and
  • the training files (in SVMLight format) used to build the models. The speech-signal features were extracted with the OpenSMILE v2.2 library (v3.0 has since been released), and the models were trained with the SVM Multiclass library. These files can be used directly to train and experiment with further models without extracting the features again.
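Because the training files are in SVMLight format, they can be loaded without re-running OpenSMILE. The sketch below is a minimal, dependency-free reader, assuming each line has the usual `<label> <index>:<value> ...` layout with an optional trailing `# comment`; the function names and the file path are hypothetical, not part of this repository. If scikit-learn is available, `sklearn.datasets.load_svmlight_file` reads the same format into a sparse matrix.

```python
from typing import Dict, List, Tuple


def parse_svmlight_line(line: str) -> Tuple[int, Dict[int, float]]:
    """Parse one SVMLight line: '<label> <index>:<value> ... # comment'."""
    # Drop an optional trailing comment introduced by '#'
    body = line.split("#", 1)[0].strip()
    tokens = body.split()
    # SVM Multiclass expects integer class labels (1..k); tolerate '2.0' as well
    label = int(float(tokens[0]))
    features: Dict[int, float] = {}
    for tok in tokens[1:]:
        if tok.startswith("qid:"):
            # Query ids are meant for ranking tasks, not classification; skip them
            continue
        index, value = tok.split(":", 1)
        features[int(index)] = float(value)
    return label, features


def load_svmlight(path: str) -> List[Tuple[int, Dict[int, float]]]:
    """Read every non-empty, non-comment line of a (hypothetical) training file."""
    examples = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            stripped = line.strip()
            if stripped and not stripped.startswith("#"):
                examples.append(parse_svmlight_line(stripped))
    return examples
```

Features are kept as a sparse `{index: value}` dictionary because SVMLight files omit zero-valued features.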

The raw audio files can be accessed at the following links:

The original video files are accessible via the links included in the METADATA file. The METADATA file also records the mapping of each audio file to its TextGrid file, along with the size, format and duration of each audio file.
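Once an audio file has been matched to its TextGrid file, the annotated intervals can be read directly from the TextGrid. The sketch below assumes the TextGrids are saved in Praat's default long text format and contain interval tiers; it does not handle the short format, point tiers or multi-line labels. The `Interval` type, the function name and the tier/label values in the test are hypothetical.

```python
import re
from typing import List, NamedTuple


class Interval(NamedTuple):
    tier: str
    start: float  # seconds
    end: float    # seconds
    label: str


def parse_textgrid_intervals(text: str) -> List[Interval]:
    """Collect (tier, start, end, label) from a long-format Praat TextGrid."""
    intervals: List[Interval] = []
    tier = ""
    in_interval = False
    start = end = 0.0
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("item ["):
            in_interval = False  # a new tier (or the tier list) begins
        elif line.startswith("name ="):
            tier = line.split("=", 1)[1].strip().strip('"')
        elif re.match(r"intervals \[\d+\]:", line):
            in_interval = True  # the xmin/xmax/text lines that follow belong to one interval
        elif in_interval and line.startswith("xmin ="):
            start = float(line.split("=", 1)[1])
        elif in_interval and line.startswith("xmax ="):
            end = float(line.split("=", 1)[1])
        elif in_interval and line.startswith("text ="):
            quoted = line.split("=", 1)[1].strip()
            # Strip the surrounding quotes; Praat writes a literal quote as ""
            label = quoted[1:-1].replace('""', '"')
            intervals.append(Interval(tier, start, end, label))
            in_interval = False
    return intervals
```

Empty intervals (unannotated stretches) are returned with an empty label, so `[iv for iv in intervals if iv.label]` keeps only the annotated segments.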

Models

The directory 'model' contains the best-performing models for Hindi and English. These files were generated by the SVM Multiclass library and are therefore meant to be used with that library.
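As an illustration of using a model, the sketch below assumes that "SVM Multiclass" refers to Thorsten Joachims' SVM^multiclass, whose classifier is invoked as `svm_multiclass_classify <test examples> <model> <output>` and writes one line per example: the predicted class followed by one discriminant score per class. The file names are hypothetical; the test examples would be feature files in the same SVMLight format as those in 'data'.

```python
import subprocess
from typing import List, Tuple


def classify_command(test_file: str, model_file: str, output_file: str) -> List[str]:
    # Argument order follows the SVM^multiclass usage: examples, model, predictions
    return ["svm_multiclass_classify", test_file, model_file, output_file]


def run_classifier(test_file: str, model_file: str, output_file: str) -> None:
    # Requires the svm_multiclass_classify binary on PATH (an assumption)
    subprocess.run(classify_command(test_file, model_file, output_file), check=True)


def parse_predictions(text: str) -> List[Tuple[int, List[float]]]:
    """Each output line: predicted class, then one score per class."""
    results = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        results.append((int(parts[0]), [float(p) for p in parts[1:]]))
    return results
```

For example, `run_classifier("hindi_test.dat", "model/hindi.model", "predictions.txt")` followed by `parse_predictions(open("predictions.txt").read())` would give (label, scores) pairs, with all three paths being placeholders.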

Scripts

The directory 'scripts' contains some helper scripts, mostly shell scripts (used and tested on Ubuntu), for pre-processing the video files and generating the features used to train the models. These scripts are:

  • 1_video_to_audio - Extracts the sound track from .mp4 video files and saves it in WAV format.
  • 2_make_compatible_audio - Converts an audio file into a format compatible with the OpenSMILE library.
  • 3_save_labeled_intervals_to_wav_sound_files - A Praat script that automatically slices an annotated sound file into multiple files, saving its labeled intervals as separate WAV files.
  • 4a_extract_features_hindi - Extracts features from multiple Hindi audio files using the OpenSMILE library, as specified in the library's config file.
  • 4b_extract_features_english - Extracts features from multiple English audio files using the OpenSMILE library, as specified in the library's config file.
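The script contents are not reproduced here, so the following is only a rough sketch of how steps 1, 2 and 4 could be chained for a single video. It assumes steps 1 and 2 use ffmpeg (an assumption; the README does not name the tool) and that step 4 calls openSMILE's `SMILExtract` command-line tool. `-C` is SMILExtract's config option, while `-I` and `-O` are defined by the config file itself, as in many of openSMILE's bundled configs. The 16 kHz mono target, the file-naming scheme and all function names are hypothetical. In the actual workflow, step 3 (the Praat script) would slice the converted audio into labeled segments before feature extraction; that step is omitted here.

```python
import subprocess
from pathlib import Path
from typing import List


def video_to_audio_cmd(video: Path, wav: Path) -> List[str]:
    # Step 1 (assumed): drop the video stream (-vn), keep the audio as 16-bit PCM WAV
    return ["ffmpeg", "-y", "-i", str(video), "-vn", "-acodec", "pcm_s16le", str(wav)]


def make_compatible_cmd(wav_in: Path, wav_out: Path, rate: int = 16000) -> List[str]:
    # Step 2 (assumed): downmix to mono (-ac 1) and resample (-ar); the rate is a guess
    return ["ffmpeg", "-y", "-i", str(wav_in), "-ac", "1", "-ar", str(rate),
            "-acodec", "pcm_s16le", str(wav_out)]


def extract_features_cmd(config: Path, wav: Path, out: Path) -> List[str]:
    # Step 4: SMILExtract with a feature config; -I/-O must be defined in that config
    return ["SMILExtract", "-C", str(config), "-I", str(wav), "-O", str(out)]


def pipeline_for(video: Path, work_dir: Path, config: Path) -> List[List[str]]:
    """Build (but do not run) the commands for one video, in order."""
    stem = video.stem
    raw_wav = work_dir / f"{stem}.wav"
    mono_wav = work_dir / f"{stem}_16k.wav"
    features = work_dir / f"{stem}.arff"
    return [
        video_to_audio_cmd(video, raw_wav),
        make_compatible_cmd(raw_wav, mono_wav),
        extract_features_cmd(config, mono_wav, features),
    ]


def run_all(commands: List[List[str]]) -> None:
    # Stop at the first failing step
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Building the commands as lists (rather than running them immediately) makes it easy to inspect or log each step before calling `run_all`.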

App

We plan to release the code of the test app soon. In the meantime, the app is accessible online via the following link: Aggression Recognition Tool (ART).

Feedback and Contact

For feedback, suggestions or collaboration, please contact Dr. Ritesh Kumar (ritesh78_llh at jnu dot ac dot in).