livekit/agents

AssemblyAI STT plugin short utterances not being detected

Closed this issue · 2 comments

cch41 commented

Short replies to the Voice Assistant, such as "Yes" or "No", are often not transcribed. I don't see any input parameter that could help manage this. Short utterances are not an issue in our otherwise identical setup using Deepgram STT (with no_delay=True and energy_filter=deepgram.AudioEnergyFilter(min_silence=0.2), if relevant).

To reproduce, create a VoicePipelineAgent that uses AssemblyAI STT and give yes/no answers.

cc @oconnoob @keepingitneil

cch41 commented

Got a response from AssemblyAI saying the issue is likely on their end, so I will close this issue.

From AssemblyAI:
"Is this issue occurring when "yes" or "no" is stated multiple times in a row? [yes] Our current streaming STT model has a limitation where repeated words are not transcribed. For example, if "Yes" is said multiple times with no other words in between, only the first "Yes" is captured. This issue will be resolved with the new streaming model our research team is developing for release in Q1. In the meantime, some customers address this by closing and reopening the WebSocket stream between yes/no questions or inserting filler words like "and," "hyphen," or "period" to avoid the issue."

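For anyone who wants to try the first workaround from the reply above (closing and reopening the stream between yes/no questions), here is a minimal sketch of the idea. It is not the livekit or AssemblyAI API: `SttStream`, `ReopeningStream`, and the `open_stream` factory are hypothetical names, and you would need to adapt them to whatever stream object your STT client actually exposes.

```python
from typing import Callable, Protocol


class SttStream(Protocol):
    """Hypothetical minimal interface for one streaming STT session."""

    def push_audio(self, chunk: bytes) -> None: ...

    def close(self) -> None: ...


class ReopeningStream:
    """Holds one STT stream at a time and swaps in a fresh one on demand."""

    def __init__(self, open_stream: Callable[[], SttStream]) -> None:
        # open_stream is an assumed factory that opens a new session
        # (for a real client, this is where the WebSocket would be opened).
        self._open_stream = open_stream
        self._stream = open_stream()

    def push_audio(self, chunk: bytes) -> None:
        # Forward audio to whichever session is currently open.
        self._stream.push_audio(chunk)

    def reset(self) -> None:
        # Close the current session and open a new one. Intended to be
        # called between yes/no questions, per the workaround quoted above.
        self._stream.close()
        self._stream = self._open_stream()

    def close(self) -> None:
        self._stream.close()
```

You would call `reset()` after each yes/no answer has been handled, so that the next "Yes" goes to a new session. Reopening a connection adds latency, so this is a stopgap rather than a fix.
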
Thanks for the update, @cch41. This will help others who run into the same issue.