AssemblyAI STT plugin short utterances not being detected
Short replies to the Voice Assistant like "Yes" or "No" are often not transcribed. I don't see any input param that could help manage this. Short utterances are not an issue with the same setup when we use Deepgram STT instead (with `no_delay=True` and `energy_filter=deepgram.AudioEnergyFilter(min_silence=0.2)`, if relevant).
To reproduce, create a `VoicePipelineAgent` that uses AssemblyAI STT and give it yes/no answers.
I got a response from AssemblyAI saying the issue is likely on their end, so I will close this issue.
From AssemblyAI:
"Is this issue occurring when "yes" or "no" is stated multiple times in a row [yes]? Our current streaming STT model has a limitation where repeated words are not transcribed. For example, if "Yes" is said multiple times with no other words in between, only the first "Yes" is captured. This issue will be resolved with the new streaming model our research team is developing for release in Q1. In the meantime, some customers address this by closing and reopening the WebSocket stream between yes/no questions or inserting filler words like "and," "hyphen," or "period" to avoid the issue."
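
For anyone hitting the same thing, the first workaround AssemblyAI mentions (closing and reopening the stream between yes/no questions) could be sketched roughly like this. It is a toy simulation only, not the AssemblyAI or LiveKit API: `FakeRepeatDroppingStream`, `transcribe_answers`, and the `send`/`close` methods are hypothetical names, and the fake stream just imitates the repeated-word behaviour described in the quote above.

```python
from typing import Callable, List, Optional


class FakeRepeatDroppingStream:
    """Hypothetical stand-in for a streaming STT session.

    It imitates the limitation described above: a word identical to the
    previous word on the same stream comes back as an empty transcript.
    """

    def __init__(self) -> None:
        self._last: Optional[str] = None
        self.closed = False

    def send(self, utterance: str) -> str:
        word = utterance.strip().lower()
        if word == self._last:
            return ""  # repeated word is dropped
        self._last = word
        return utterance

    def close(self) -> None:
        self.closed = True


def transcribe_answers(
    answers: List[str],
    open_stream: Callable[[], FakeRepeatDroppingStream],
    reopen_between_answers: bool,
) -> List[str]:
    """Send each answer to a stream, optionally reopening it between answers."""
    transcripts: List[str] = []
    stream = open_stream()
    try:
        for i, answer in enumerate(answers):
            if reopen_between_answers and i > 0:
                # Workaround: a fresh stream has no memory of the previous word.
                stream.close()
                stream = open_stream()
            transcripts.append(stream.send(answer))
    finally:
        stream.close()
    return transcripts


without_reopen = transcribe_answers(["Yes", "Yes", "No"], FakeRepeatDroppingStream, False)
with_reopen = transcribe_answers(["Yes", "Yes", "No"], FakeRepeatDroppingStream, True)
```

In this simulation the second "Yes" is lost on a single long-lived stream and kept when the stream is reopened per question. In a real agent, reopening the WebSocket would likely add some connection latency on each turn, so it is probably only worth doing around questions where a repeated one-word answer is expected.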