ProTK: A Prosody Toolkit

ProTK is a prosody toolkit for building machine learning models that detect and classify filled pauses in recorded speech. It is currently developed at the University of Minnesota-Twin Cities College of Pharmacy.

Authors

Current:

Jacob Okamoto (UMN Computer Science)
Serguei Pakhomov (UMN Pharmacy)

Advising:

Elizabeth Shriberg (Microsoft)
Andreas Stolcke (Microsoft)

Past:

Thomas Christie (UMN Cognitive Science)

Overview

ProTK is a toolkit developed to help create machine learning models of recorded speech. It has three primary components: a data ingest module, a feature extraction module, and an ARFF (Attribute-Relation File Format, the input format used by Weka) generation module. These three modules use an SQLite database to store and retrieve information in a structured intermediate format.

The workflow for ProTK's core functionality is simple: ingest analysis units from HTK recs or Praat TextGrids, extract features for each unit ingested, and output an ARFF file of the features extracted.
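The ingest, extract, and output steps can be illustrated with a minimal Python sketch. This is not ProTK's actual code: the `build_arff` function, the `units` table, its column names, and the single `duration` feature are all hypothetical, chosen only to show how an SQLite store can sit between ingest and ARFF generation.

```python
import sqlite3


def build_arff(units, relation="protk_sketch"):
    """Ingest (start, end, label) units, extract one feature, return ARFF text.

    Hypothetical illustration only; the schema and names are assumptions.
    """
    db = sqlite3.connect(":memory:")

    # Ingest: store each analysis unit in the intermediate database.
    db.execute(
        "CREATE TABLE units ("
        "id INTEGER PRIMARY KEY, t_start REAL, t_end REAL, label TEXT)"
    )
    db.executemany(
        "INSERT INTO units (t_start, t_end, label) VALUES (?, ?, ?)", units
    )

    # Feature extraction: compute a duration feature for every unit.
    db.execute("ALTER TABLE units ADD COLUMN duration REAL")
    db.execute("UPDATE units SET duration = t_end - t_start")

    # ARFF generation: one header, then one data row per unit.
    rows = db.execute("SELECT duration, label FROM units ORDER BY id").fetchall()
    db.close()

    labels = sorted({label for _, label in rows})
    lines = [
        f"@RELATION {relation}",
        "",
        "@ATTRIBUTE duration NUMERIC",
        "@ATTRIBUTE label {" + ",".join(labels) + "}",
        "",
        "@DATA",
    ]
    lines += [f"{duration:.3f},{label}" for duration, label in rows]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_arff([(0.0, 0.5, "NO"), (0.5, 0.75, "YES")]))
```

In a real run the units would come from HTK recs or Praat TextGrids rather than a hard-coded list, and the feature set would be much larger than a single duration column.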

New Features

The primary new features of the rewritten ProTK are:

Arbitrary Units of Analysis: ProTK supports the generation of arbitrary units of analysis, specifically frames of specified length (frame size) and overlap (window size).
Multi-Tier Targeting: ProTK can generate classification values (i.e., YES/NO truth values) by checking whether a unit of analysis in the output tier occurs within a specific kind of unit of analysis in another tier. For example, it can check whether a vowel occurs inside a filled pause.
Passthrough Features: additional metadata from TextGrid files can be passed through from ProTK’s ingest engine to the ARFF output as additional ARFF attributes.
Contextual Information: ProTK can output arbitrary-width context for each unit of analysis during ARFF generation. This places information about the n preceding and n following units alongside the current unit in the ARFF output.
Multiprocessing: ProTK supports multi-core processors when running Praat analysis. It runs as many Praat processes in parallel as the system reports processing cores.
High-Performance C Operations: the ProTK distribution includes a high-performance C ARFF generator for fast analysis of large datasets using a very small subset of features specific to filled-pause detection. This ARFF generator allowed us to process a large (200+ files) dataset in one hour instead of 20 or more hours.
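The framing, multi-tier targeting, and context features can be sketched in a few lines of Python. This is an illustration under stated assumptions, not ProTK's implementation: the function names are hypothetical, times are integer milliseconds, each frame is assumed to advance by its length minus the overlap, and "occurs within" is assumed to mean that a unit lies entirely inside the target interval.

```python
def make_frames(duration_ms, frame_ms, overlap_ms):
    """Split [0, duration_ms] into frames of frame_ms that overlap by overlap_ms."""
    # Assumption: consecutive frames are offset by (length - overlap).
    step = frame_ms - overlap_ms
    if step <= 0:
        raise ValueError("overlap must be smaller than the frame length")
    return [(s, s + frame_ms) for s in range(0, duration_ms - frame_ms + 1, step)]


def label_units(units, target_intervals):
    """Label a unit YES if it lies entirely inside any target interval, else NO."""
    labels = []
    for start, end in units:
        inside = any(t0 <= start and end <= t1 for t0, t1 in target_intervals)
        labels.append("YES" if inside else "NO")
    return labels


def add_context(values, n, pad="?"):
    """Return, per value, the n preceding values, the value, and the n following.

    Positions beyond either end are filled with pad ("?" marks a missing
    value in ARFF).
    """
    padded = [pad] * n + list(values) + [pad] * n
    return [padded[i:i + 2 * n + 1] for i in range(len(values))]


if __name__ == "__main__":
    frames = make_frames(100, 40, 20)           # 40 ms frames, 20 ms overlap
    labels = label_units(frames, [(20, 80)])    # one filled pause from 20 to 80 ms
    for frame, window in zip(frames, add_context(labels, 1)):
        print(frame, window)
```

Here each frame's label is emitted together with the labels of its neighbours, which is the shape of row that context output adds to the ARFF file; a real run would carry feature values as well as labels.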

Interspeech 2012

This software was presented at Interspeech 2012 in Portland, Oregon. Demonstration code is available at <https://github.com/oko/protk-demo>. Note that the demo code does not include the audio used for testing; it was tested against RIT/UPenn's TRAINS corpus.
