daily-co/daily-python

Unable to use Whisper model for Deepgram transcription

Closed this issue · 2 comments

I was able to switch to the highest-accuracy Deepgram model nova-2, currently in beta, like this:

self.client.start_transcription({
    "language": "en",
    "model": "2-ea",
    "tier": "nova",
    "detect_language": True,
    "profanity_filter": False,
    "redact": False
})

However, when I try to use "model": "whisper-base", the transcription never happens (there are no on_transcription_message events).

I suspect this may be related to the fact that whisper-base does not include a "tier" parameter. Their documentation says "Deepgram's Whisper Cloud does not expect a tier parameter. Using tier will not work." and I wonder if the SDK is sending tier anyway behind the scenes?

I tried to use "tier": None and also omitting "tier" but neither of these worked.

Additionally, I noticed there is an extra API parameter that has not yet been added to TranscriptionSettings: smart_format.

Hi @kylemcdonald,

> However, when I try to use "model": "whisper-base", the transcription never happens (there are no on_transcription_message events).

> I suspect this may be related to the fact that whisper-base does not include a "tier" parameter. Their documentation says "Deepgram's Whisper Cloud does not expect a tier parameter. Using tier will not work." and I wonder if the SDK is sending tier anyway behind the scenes?

> I tried to use "tier": None and also omitting "tier" but neither of these worked.

The problem is that the Whisper model only supports pre-recorded audio, not live streams, so you won't be able to use it for live transcription. See https://developers.deepgram.com/docs/deepgram-whisper-cloud
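Since a Whisper model fails silently here (no on_transcription_message events arrive), one option is to reject it on the client side before calling start_transcription. The following is only a minimal sketch: build_live_transcription_settings is a hypothetical helper, not part of daily-python, and the "whisper" prefix check is an assumption based on the model name used in this issue.

```python
from typing import Any, Dict, Optional


def build_live_transcription_settings(
    model: str,
    tier: Optional[str] = None,
    language: str = "en",
) -> Dict[str, Any]:
    """Build a settings dict for live transcription.

    Hypothetical helper (not part of daily-python). It rejects Whisper
    models up front, because they only work for pre-recorded audio and
    would otherwise produce no transcription events at all.
    """
    # Assumption: Whisper model names start with "whisper" (e.g. "whisper-base").
    if model.lower().startswith("whisper"):
        raise ValueError(
            f"Model {model!r} only supports pre-recorded audio, not live streams"
        )

    settings: Dict[str, Any] = {"language": language, "model": model}
    # Only include "tier" when one was actually given.
    if tier is not None:
        settings["tier"] = tier
    return settings


# Example: the settings from the original report, minus the optional flags.
settings = build_live_transcription_settings("2-ea", tier="nova")
# self.client.start_transcription(settings)
```

With this guard, passing "whisper-base" raises a ValueError immediately instead of leaving the caller waiting for events that never come.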

> Additionally, I noticed there is an extra API parameter that has not yet been added to TranscriptionSettings: smart_format.

Yes, we don't currently support that. However, we are redesigning how transcription works, but I can't share anything specific at the moment.

Unrelated to this issue, but daily-python 0.3.0 has some transcription improvements related to error handling.

As I mentioned, the Whisper model only works for pre-recorded audio, not for live streams. Let us know if there's anything else we can help with. Closing this issue for now.