deepgram/deepgram-dotnet-sdk

Buffer bytes silently reused in internal queue in SendData results in repeated text in live transcriptions

Closed this issue · 7 comments

What is the current behavior?

If I use something like NAudio to get audio data, NAudio internally reuses a single buffer. When I call SendData, the SDK stores a reference to that buffer instead of a copy. So whenever there is latency or a disconnection, the EnqueueForSending method enqueues the same byte array instance again, and the same audio is sent multiple times.

This produces transcripts like "Hey, I think think think think think think that...". Without looking at the source, I wouldn't have known that the buffer is being queued.

Steps to reproduce

You can probably create artificial latency or disconnect the websockets to cause the problem. For example:

var buffer = new byte[3200];
// fill some values...
deepgramLive.SendData(buffer);
// fill some different values...
deepgramLive.SendData(buffer);
// now let the client dequeue: both queue entries are the same instance
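
The aliasing can be reproduced without the SDK or a websocket. The following is a minimal sketch that uses a plain `Queue<byte[]>` as a stand-in for the internal send queue; `AliasingDemo` and `DrainAfterReuse` are hypothetical names for illustration and are not part of the Deepgram SDK.

```csharp
using System;
using System.Collections.Generic;

public static class AliasingDemo
{
    // Enqueues one reused buffer twice, then drains the queue and returns
    // the first byte of each dequeued entry.
    // The Queue<byte[]> only stands in for the SDK's queue (an assumption).
    public static List<byte> DrainAfterReuse(bool copyOnEnqueue)
    {
        var queue = new Queue<byte[]>();
        var buffer = new byte[4];

        buffer[0] = 1; // first "audio frame"
        queue.Enqueue(copyOnEnqueue ? (byte[])buffer.Clone() : buffer);

        buffer[0] = 2; // the capture library overwrites the same buffer
        queue.Enqueue(copyOnEnqueue ? (byte[])buffer.Clone() : buffer);

        // Drain later, as a sender would after latency or a reconnect.
        var firstBytes = new List<byte>();
        while (queue.Count > 0)
        {
            firstBytes.Add(queue.Dequeue()[0]);
        }
        return firstBytes;
    }
}
```

With `copyOnEnqueue: false` both drained entries carry the second frame, so the first frame is lost and the second is sent twice. With `copyOnEnqueue: true` the two frames come out in order.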

Expected behavior

The right fix is not obvious. Two options:

  1. Document that the buffer cannot be reused, so people aren't surprised by random repetition.
  2. Because callers cannot control the internal buffers, the SDK could copy the bytes into its own buffer and keep a list of them (a buffer pool). That is impossible to do from "outside" the lib.

Please tell us about your environment

  • Operating System/Version: Windows 11
  • Language: C#

Other information

The only solution to this problem right now is to systematically copy the bytes to a new array on every call, which puts unnecessary pressure on the GC. I could also allocate a large pool of buffers, but then instead of continuously creating small buffers I'd be reserving a large amount of memory, with no guarantee that the pool wouldn't overflow anyway.
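
For reference, the copy-per-call workaround described above can be sketched as follows. `SendHelpers`, `SendCopy`, and the `send` delegate are hypothetical names; the delegate stands in for whatever send method the client exposes, so the SDK's actual signature is not assumed here.

```csharp
using System;

public static class SendHelpers
{
    // Copies only the bytes that were actually recorded into a fresh array,
    // so a queue that keeps the reference never sees later overwrites.
    // `send` is a stand-in for the real send call (an assumption).
    public static void SendCopy(Action<byte[]> send, byte[] captureBuffer, int bytesRecorded)
    {
        var copy = new byte[bytesRecorded];
        Buffer.BlockCopy(captureBuffer, 0, copy, 0, bytesRecorded);
        send(copy);
    }
}
```

With NAudio this could be wired up from a `DataAvailable` handler, passing `e.Buffer` and `e.BytesRecorded` from `WaveInEventArgs` and a lambda such as `b => deepgramLive.SendData(b)`. Each call allocates one array, which is the GC cost mentioned above.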

Great callout, thanks!

@acidbubbles the team discussed this issue a bit this week, and we were curious: in the next major version of the SDK, what if we eliminated the byte queue altogether?

I can't say with certainty what the right approach is, but here are my thoughts.

People who implement the API may want to choose between slowing down their audio streaming, relying on your implementation of queuing, or providing their own.

Right now, because the queuing is "broken" (at least for most cases of live audio libraries that I've seen that use a buffer), removing it altogether is a sound option.

However, if you remove the queuing, writing the bytes would become blocking, right? To keep library consumers from naively implementing it in a way that slows down their app, you could provide local buffering as a wrapper / utility class instead. Something like new BufferedClientWriter(actualClient). If you do that (I'd use it!), just be careful to use immutable representations of the byte arrays.
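
One possible shape for such a wrapper is sketched below, assuming .NET Core 3.0 or later for `System.Threading.Channels`. `BufferedClientWriter` does not exist in the SDK; this is only an illustration of the idea. The constructor takes a `sendAsync` delegate instead of the client itself so that no SDK method signature has to be assumed.

```csharp
using System;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical wrapper: copies every chunk on write and sends from a
// background loop, so the caller's capture thread never blocks on the socket.
public sealed class BufferedClientWriter : IAsyncDisposable
{
    private readonly Channel<byte[]> _channel;
    private readonly Task _pump;

    public BufferedClientWriter(Func<byte[], Task> sendAsync, int capacity = 64)
    {
        // Bounded so memory cannot grow without limit during an outage.
        _channel = Channel.CreateBounded<byte[]>(new BoundedChannelOptions(capacity)
        {
            SingleReader = true,
            FullMode = BoundedChannelFullMode.Wait
        });

        _pump = Task.Run(async () =>
        {
            await foreach (var chunk in _channel.Reader.ReadAllAsync())
            {
                await sendAsync(chunk);
            }
        });
    }

    // Returns false when the buffer is full, so the caller decides whether
    // to drop the chunk or slow down. ToArray() takes a private copy, which
    // makes it safe for the caller to reuse its own buffer right away.
    public bool Write(ReadOnlySpan<byte> data) => _channel.Writer.TryWrite(data.ToArray());

    // Stops accepting writes and waits for already queued chunks to be sent.
    public async ValueTask DisposeAsync()
    {
        _channel.Writer.Complete();
        await _pump;
    }
}
```

Returning `false` from `Write` when the channel is full leaves the back-pressure decision with the caller, which matches the options listed above: slow down, drop, or bring your own queue.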

Hope my humble opinion is helpful :)

Will check to see if this still applies for v4.

This should no longer be an issue in v4.

What will be the strategy? Is there no buffering/queuing at all, or did you implement a buffer copy?

You will have access to an internal queue and also direct access to the send function using a buffer. These should show up in the next beta or the first RC.