/realtime_transcription

ASR using OpenAI capability API `v1/audio/transcriptions` like Groq, SiliconFlow

Primary LanguagePython

Realtime transcription

An ASR engine by using free transcription API style like OpenAI's v1/audio/transcriptions.

I have 2 backends tested siliconflow and groq.

You can found their API references from below links:

Quick Run

** set key ** cp .env.example .env edit .env set SILICONFLOW_KEY or GROQ_KEY

python mic_example.py This will realtime transcribe your audio input from microphone to text.

Class Usage

class VADSegmentRealTime:
    def __init__(self, sample_rate=8000,voice_confidence=0.80,system_seg_inerval=0.3, user_seg_interval = 0.8, mode="precise", on_text_change=None, on_seg_end=None):
...
  • sample_rate: sample rate of your audio input

  • voice_confidence: confidence threshold for voice activity detection (VAD)

  • system_seg_inerval: minimum interval between segments detected by the system

  • user_seg_interval: minimum interval between segments that will be returned to the user

  • mode: "precise" or "saving", precise mode is more accurate but slower and consumer more tokens, saving mode is faster but less accurate. "precise" is recommended.

  • on_text_change: callback function that is triggered when the text of the current segment changes

  • on_seg_end: callback function that is triggered when a user defined segment ends

Credits

  • silero-vad This project using silero-vad for voice detect and segment.