deepgram/deepgram-dotnet-sdk

Live streaming example


Proposed changes

Add an example of live streaming: something as simple as a console app that uses NAudio to send bytes to Deepgram and displays the results in the console.

Context

I am unable to make Deepgram work with live streaming. I have tried multiple things (sending raw audio, converting the raw audio to valid WAV in chunks, etc.). I do have a keep-alive running, but I get a "Deepgram connection close" message, empty transcription results (null channel), or N000 errors. Having an example that works would be great.

Possible Implementation

// Create and authenticate the live deepgram client
var deepgramLive = ...;
// Keep alive, otherwise the connection closes before you can even start talking
_ = Task.Run(() => KeepAlive(cancellationToken), cancellationToken);
// Log transcripts; also log connection open, close and error
deepgramLive.TranscriptReceived += (_, e) => Console.WriteLine(JsonSerializer.Serialize(e.Transcript));

// NAudio wave-in is fairly simple to use: 16 kHz, mono PCM
var waveIn = new WaveInEvent();
waveIn.WaveFormat = new WaveFormat(16000, 1);
waveIn.DataAvailable += (s, e) =>
{
    // Note: it would be nice if SendData accepted ArraySegment<byte> instead of forcing a copy here
    var bytes = new byte[e.BytesRecorded];
    Array.Copy(e.Buffer, bytes, e.BytesRecorded);
    deepgramLive.SendData(bytes);
};
waveIn.StartRecording();

And I have this for the keep alive.

    private async Task KeepAlive(CancellationToken cancellationToken)
    {
        while (_deepgramLive != null && _deepgramLive.State() == WebSocketState.Open)
        {
            try
            {
                // Task.Delay throws when the token is cancelled, so catch it rather than checking the flag afterwards
                await Task.Delay(9000, cancellationToken);
            }
            catch (OperationCanceledException)
            {
                return;
            }
            _deepgramLive.KeepAlive();
        }
    }

Other information

I can provide more information, but I'm sure you have already tested this library with something other than a static audio file; let me know if you need anything else. Also, I verified that the audio goes through fine and can be played back (I also used Vosk without problems with those same bytes).

@acidbubbles, thanks for the detailed post. Have you checked out the SDK-specific examples in our docs? We do have a .NET live streaming example there. If that one doesn't work for you or is missing any aspects you're looking for, please let us know, and we'll take that feedback into account as we continue to improve the coverage and depth of our examples.

Thanks for the quick answer, @jkroll-deepgram! I did follow that, but the step deepgramLive.SendData(AUDIO_STREAM_DATA); is light on details :) I spent a few hours trying to figure it out by myself, and the N000 error message didn't help much either.

As shown in the partial example in the original post, I tried forwarding the NAudio WaveIn data as-is (like I do for Vosk and Azure Speech Service, where it works fine), to no avail. I also tried adding WAV headers to the bytes (so each chunk is a valid WAV file), and that didn't seem to work either.

If it helps, I could take what I have and package it as a small console app, but you should have all the code you need in the original post.

(Also, I know I said it before, but it would be nice if SendData were able to take an ArraySegment or a Span referring to the buffer instead of reallocating the bytes every time.)

Thanks a lot, I'm really looking forward to trying Deepgram!

There is a very basic example of live transcription via the client available at https://github.com/ThindalTV/Deepgram.Live.Console. It does NOT have full functionality in any way, but it can help you get off the ground.

Thanks a lot @ThindalTV, your sample works, which is great. I'll check what the difference is (I guess something in WaveInEvent vs. Audio) and report back, hopefully to help the next folks who have the same problems :) Appreciated!

All right, so this was simple to fix once I had a working example :) Thanks again @ThindalTV, this was useful and appreciated.

@jkroll-deepgram so the issue was simply that I did not set Channels (1) and SampleRate (16000) on LiveTranscriptionOptions. This wasn't obvious from the documentation, so unless I misunderstand, these seem to be mandatory settings to pass. I'd suggest adding them to the examples at https://developers.deepgram.com/docs/dotnet-sdk-streaming-transcription.
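For future readers, here is roughly what the fix looks like in code. Channels and SampleRate are the properties I set; the StartConnectionAsync call and the surrounding wiring are sketched from memory and may differ between SDK versions:

    // Sketch only: Channels and SampleRate are the settings that fixed it for me;
    // the StartConnectionAsync call is from memory and may vary by SDK version.
    var options = new LiveTranscriptionOptions
    {
        Channels = 1,        // must match the mono channel count in new WaveFormat(16000, 1)
        SampleRate = 16000   // must match the NAudio sample rate
    };
    await deepgramLive.StartConnectionAsync(options);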

I didn't have to change anything else. However (this can be a separate GitHub issue if you want), it would be very nice to be able to pass either an ArraySegment or the original byte array plus a byte count to SendData. Most of the time in NAudio, the recording buffer and the actual recorded bytes are the same length, but when they are not, the bytes have to be copied before sending.
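Something like this is what I have in mind; these overloads do not exist in the SDK today and are purely illustrative:

    // Hypothetical overloads (not in the SDK), shown only to illustrate the request:
    void SendData(ArraySegment<byte> data);
    void SendData(byte[] buffer, int count);

    // Either one would let the NAudio handler skip the copy:
    waveIn.DataAvailable += (s, e) =>
        deepgramLive.SendData(new ArraySegment<byte>(e.Buffer, 0, e.BytesRecorded));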

Thanks!

@acidbubbles

I didn't have to change anything else. However (this can be a separate GitHub issue if you want), it would be very nice to be able to pass either an ArraySegment or the original byte array plus a byte count to SendData

A separate issue would be great. It sounds like we can close this one; just let me know otherwise.

I think the signature makes less sense now that I know you're not sending the bytes immediately. I'd say this may need to be revisited when you look at #136 (otherwise it makes little sense to have a buffer-plus-length signature if you ultimately just keep a reference to the buffer instance itself).
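To illustrate the hazard, assuming SendData really does queue the buffer by reference as #136 suggests: NAudio reuses its capture buffers, so queued-but-unsent audio could be overwritten.

    // Illustrative only, assuming SendData keeps a reference instead of copying:
    waveIn.DataAvailable += (s, e) =>
    {
        deepgramLive.SendData(e.Buffer);
        // NAudio recycles e.Buffer for later callbacks, so audio that is
        // queued but not yet written to the socket could be overwritten.
    };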