Python implementation of the following speech intelligibility prediction methods: weighted Spectro-Temporal Modulation Index (wSTMI) Spectro-Temporal Glimpsing Index (STGI)
The functions wstmi
and stgi
take three inputs:
d = pywstmi(clean_speech, degraded_speech, sampling_frequency)
d = pystgi(clean_speech, degraded_speech, sampling_frequency)
clean_speech
: A numpy array containing a single-channel clean (reference) speech signal.degraded_speech
: A numpy array containing a single-channel degraded/processed speech signal.sampling_frequency
: The sampling frequency of the input signals inHz
.
Note that the clean and degraded speech signals must be time-aligned and of the same length.
If you use pywstmi or pystgi, please cite the references [1] and [2], respectively:
[1] A. Edraki, W.-Y. Chan, J. Jensen, & D. Fogerty, “Speech Intelligibility Prediction Using Spectro-Temporal Modulation Analysis,” IEEE/ACM Trans. Audio, Speech, & Language Processing, vol. 29, pp. 210-225, 2021.
[2] A. Edraki, W.-Y. Chan, J. Jensen, & D. Fogerty, “A Spectro-Temporal Glimpsing Index (STGI) for Speech Intelligibility Prediction," Proc. Interspeech, 5 pages, Aug 2021.