Bidi streaming proposal: end of utterance detection
seyuf opened this issue · 2 comments
Hi,
Many thanks for this awesome work!
I have a use case that comes from my own use of the project, and I thought it was worth raising here, as I believe it can be implemented directly on the main branch.
I've already implemented some kind of PoC or v1 here.
The idea is to add silence / end-of-utterance detection to the server.
Today, what I observe is that in bidirectional streaming the server keeps transcribing the stream of messages sent by the client indefinitely, appending the results at each iteration.
So if one wants to reset the result, one is forced to kill the connection from the client.
What I made in the above link is kind of similar: from the client side, I just send an end_of_utterance value in the audio config message, which tells the server "I'm done, send me the last result and close the connection". I also set an is_final value in the last result message, signalling to the client that this is the last result from the server and that the connection has been closed.
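As a rough illustration of that client-driven flow, here is a minimal Python sketch. It is not the code from the PoC: the message shapes (a dict for an audio config message, a plain string for the text decoded from one audio chunk), the `sample_rate` key, and the `handle_stream` name are all assumptions made for the example.

```python
from typing import Iterable, Iterator, List, Tuple, Union

# Hypothetical message shapes: a dict stands in for an audio config message,
# and a str stands in for the text decoded from one audio chunk.
Message = Union[dict, str]


def handle_stream(messages: Iterable[Message]) -> Iterator[Tuple[str, bool]]:
    """Yield (transcript, is_final) pairs; stop once the client signals the end."""
    parts: List[str] = []
    for msg in messages:
        if isinstance(msg, dict):
            if msg.get("end_of_utterance"):
                # Last result: flag it as final, then stop reading,
                # which stands in for closing the connection.
                yield " ".join(parts), True
                return
            continue  # ordinary config message, nothing to transcribe
        parts.append(msg)
        yield " ".join(parts), False


if __name__ == "__main__":
    stream = [{"sample_rate": 16000}, "hello", "world", {"end_of_utterance": True}]
    for transcript, is_final in handle_stream(stream):
        print(is_final, transcript)
```

Anything the client sends after the end_of_utterance message is never read, which mirrors the connection being closed.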
Although this works, it is not very satisfying: to me, the right thing would be to keep the connection alive and just reset the results when an utterance has ended. I also believe that the server could do the end-of-utterance detection itself, using silence detection.
The idea would be to consider that we are at the end of an utterance if we receive silent audio for some amount of time or number of iterations (the code seems to be already in place here).
So:
- The client specifies in the audio config message whether it would like the server to detect the end of utterances (if not, we keep the current behaviour).
- The client sends a stream of messages.
- After multiple consecutive empty audio decodings, the server decides that we are at the end of an utterance.
- The server sends back the result with is_final set to true in the response message.
- The server resets its data but keeps the connection alive (or maybe kills it? This could be optional), waiting for new input from the client.
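The steps above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions, not the server's actual code: `UtteranceTracker`, `Result`, the `detect_end_of_utterance` flag, and the `max_empty_decodes` threshold are hypothetical names, and "silence" is approximated here by a decode that returns empty text.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Result:
    text: str
    is_final: bool


@dataclass
class UtteranceTracker:
    """Hypothetical per-connection state; not the project's actual API."""
    detect_end_of_utterance: bool = False  # would come from the client's audio config
    max_empty_decodes: int = 3             # assumed threshold, would need tuning
    parts: List[str] = field(default_factory=list)
    empty_count: int = 0

    def on_decode(self, text: str) -> Optional[Result]:
        """Feed the text decoded from one audio chunk; return a result to send, if any."""
        if text.strip():
            # Speech was decoded: append it and restart the silence counter.
            self.parts.append(text.strip())
            self.empty_count = 0
            return Result(" ".join(self.parts), is_final=False)
        if not self.detect_end_of_utterance or not self.parts:
            # Detection disabled, or silence before any speech: nothing to report
            # (assumed here to match the current behaviour).
            return None
        self.empty_count += 1
        if self.empty_count < self.max_empty_decodes:
            return None
        # Enough consecutive empty decodes: emit the final result and reset,
        # leaving the connection itself untouched.
        final = Result(" ".join(self.parts), is_final=True)
        self.parts.clear()
        self.empty_count = 0
        return final
```

The tracker only resets its own state, so the surrounding stream handler can decide whether to keep the connection open or close it after a final result.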
I hope this is understandable enough. If so, I would like some feedback, if possible.
Regards
Hey @seyuf, can we reopen this? The feature is something we haven't considered yet, but we would like to have some discussion before closing.
Not guaranteeing a discussion now, but let's keep this open :)
Hi, sure np.