rhasspy/larynx

word by word timestamp or "boundary" event

nicolehe opened this issue · 1 comments

Hi!

It would be great to be able to do something like print each word as it's being spoken, either with a word-by-word timestamp feature or an "onboundary" type of event like in the Web Speech API: https://developer.mozilla.org/en-US/docs/Web/API/SpeechSynthesisUtterance/onboundary

Thanks!

This has been added in Larynx 1.0 via the <mark> SSML tag! It currently only works between sentences, however.

There are two ways to make use of it:

  1. Use --mark-file on the command line to have the name of each mark printed as it's encountered:
larynx -v en --ssml --mark-file /dev/stderr '<mark name="start" />This is a test.<mark name="end" />'

This will print "start" to standard error, say the sentence, then print "end".

  2. Programmatically, from the results of the larynx.text_to_speech API. The TextToSpeechResult object (yielded for each sentence) contains a marks_before and a marks_after list with the names of the marks that were encountered.
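
For the programmatic route, a rough sketch of how the marks could be turned into an ordered stream of events is shown here. It deliberately does not call larynx itself: FakeResult is a hypothetical stand-in carrying only marks_before and marks_after as described above, plus a text field that is assumed for illustration. The helper iter_events is also made up for this example. In real use you would pass in whatever larynx.text_to_speech yields instead of the fake list.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List, Tuple


@dataclass
class FakeResult:
    """Hypothetical stand-in for a per-sentence TextToSpeechResult."""

    text: str  # assumed field, used here only for display
    marks_before: List[str] = field(default_factory=list)
    marks_after: List[str] = field(default_factory=list)


def iter_events(results: Iterable) -> Iterator[Tuple[str, str]]:
    """Yield ("mark", name) and ("sentence", text) events in spoken order."""
    for result in results:
        # Marks placed before the sentence fire first.
        for name in result.marks_before or []:
            yield ("mark", name)
        # This is the point where the sentence audio would be played.
        yield ("sentence", result.text)
        # Marks placed after the sentence fire once it has been spoken.
        for name in result.marks_after or []:
            yield ("mark", name)


# Mirrors the command-line example: <mark name="start" />This is a test.<mark name="end" />
results = [FakeResult("This is a test.", ["start"], ["end"])]
events = list(iter_events(results))

for kind, value in events:
    print(kind, value)
```

Since marks currently only work between sentences, this gives sentence-level rather than word-level boundaries.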