This project contains code to distinguish communicative grunts from filled pauses.
Conversation grunts, short non-lexical utterances can be further subdivided into
- communicative grunts (
cg
), i.e. short non-lexical utterances with communicative intent, and - filled pauses (
fp
), which are rather by-products of speech planning.
This project contains code to distinguish these two classes cg
and fp
by applying a wav2vec2 transformer model exported to ONNX.
See this reference for more details on model architecture and evaluation.
The model has been trained on mono signals with a sampling rate of 16 kHz. The interface in this project takes care that the input signal is converted accordingly.
- the code does not provide speech segmentation, but only classification of pre-segmented speech intervals
- the currently underlying model was trained on Hungarian data only
- install requirements
$ virtualenv --python="/usr/bin/python3" nls
$ source nls/bin/activate
(nls) $ pip install -r requirements.txt
- download model from Zenodo
- record URL
- model DOI:
10.5281/zenodo.10833104
- and unzip in
model/
folder
$ cd model/
$ wget https://zenodo.org/records/10833104/files/nonlex_speech_model.zip?download=1 -O nonlex_speech_model.zip
$ unzip nonlex_speech_model.zip
$ rm nonlex_speech_model.zip
- see example script
import numpy as np
from nonlex_speech_clf import NlsClf
# create dummy signal with duration 1s
sampling_rate = 16000
signal = np.random.rand(sampling_rate)
nls = NlsClf()
y = nls.process_signal(signal, sampling_rate)
print(y)
- see example script
If you use this model for your studies please cite:
Reichel, U.D., Kohári, A., Mády, K. (2023). Acoustics and prediction of non-lexical speech in the Budapest Games Corpus. In: Proc. Speech Research Conference, pdf
@InProceedings{rkm2023,
author = {Reichel, U.D. and Koh\'ari, A. and M\'ady, K.},
title = {Acoustics and prediction of non-lexical speech in the {B}udapest {G}ames {C}orpus},
booktitle = {Proc. Speech Research Conference},
year = {2023},
address = {Budapest, Hungary}
}
- Funder: National Research, Development and Innovation Office
- Project ID: NKFIH K 135038
- Project title: Prosodic structure and sentence types by using large speech databases supported by deep learning techniques
- Project URL