Cog Whisper Diarization

Audio transcription and speaker diarization pipeline.

Models used

Used at Audiogest
Or try at Replicate
Or deploy it yourself at Replicate (make sure to add your own Hugging Face API key and accept the terms of use of the pyannote models used)

Input parameters

file_string: str: Either provide a Base64-encoded audio file.
file_url: str: Or provide a direct URL to an audio file.
file: Path: Or provide an audio file.
group_segments: bool: Group consecutive segments from the same speaker that are less than 2 seconds apart. Default is True.
num_speakers: int: Number of speakers. Leave empty to autodetect. Must be between 1 and 50.
language: str: Language of the spoken words as a language code like 'en'. Leave empty to autodetect the language.
prompt: str: Vocabulary: provide names, acronyms, and loanwords in a list. Use punctuation for best accuracy.
offset_seconds: int: Offset in seconds, used for chunked inputs. Default is 0.
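
As a rough illustration of how the input parameters above fit together, here is a minimal sketch that assembles an input payload as a plain dictionary. The helper name `build_input` is hypothetical and not part of this project. Requiring exactly one audio source is an assumption based on the "either/or" wording above, and the sketch covers only `file_string` and `file_url`, not `file`. How the payload is then submitted (for example through a Replicate client) is left out.

```python
import base64
from typing import Any, Dict, Optional


def build_input(
    audio_bytes: Optional[bytes] = None,
    file_url: Optional[str] = None,
    group_segments: bool = True,
    num_speakers: Optional[int] = None,
    language: Optional[str] = None,
    prompt: Optional[str] = None,
    offset_seconds: int = 0,
) -> Dict[str, Any]:
    """Assemble an input payload using the parameter names listed above.

    Hypothetical helper: it only builds a dict and does not call the model.
    """
    # Assumption: exactly one audio source should be supplied.
    if (audio_bytes is None) == (file_url is None):
        raise ValueError("provide exactly one of audio_bytes or file_url")
    # The documented range for num_speakers is 1 to 50.
    if num_speakers is not None and not 1 <= num_speakers <= 50:
        raise ValueError("num_speakers must be between 1 and 50")

    payload: Dict[str, Any] = {
        "group_segments": group_segments,
        "offset_seconds": offset_seconds,
    }
    if audio_bytes is not None:
        # file_string expects the audio content as Base64 text.
        payload["file_string"] = base64.b64encode(audio_bytes).decode("ascii")
    else:
        payload["file_url"] = file_url

    # Optional fields are omitted when unset so they can be autodetected.
    for key, value in (
        ("num_speakers", num_speakers),
        ("language", language),
        ("prompt", prompt),
    ):
        if value is not None:
            payload[key] = value
    return payload
```

For example, `build_input(file_url="https://example.com/call.wav", language="en")` produces a payload with `file_url`, `language`, and the two defaults, leaving `num_speakers` unset so the speaker count is autodetected. The URL here is a placeholder.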

Output

segments: List[Dict]: List of segments, each with a speaker, a start time, and an end time.
num_speakers: int: Number of speakers (detected, unless specified in input).
language: str: Language of the spoken words as a language code like 'en' (detected, unless specified in input).
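
To show how the output fields above might be consumed, the following sketch totals the speaking time per speaker. The key names `speaker`, `start`, and `end`, the speaker labels, and the sample values are assumptions for illustration. The actual segment dictionaries may carry additional fields or use a different timestamp format, so the times are passed through `float()` defensively.

```python
from collections import defaultdict
from typing import Any, Dict, List


def speaking_time(segments: List[Dict[str, Any]]) -> Dict[str, float]:
    """Sum the duration (end minus start, in seconds) of each speaker's segments."""
    totals: Dict[str, float] = defaultdict(float)
    for seg in segments:
        # Assumed keys: "speaker", "start", "end".
        totals[seg["speaker"]] += float(seg["end"]) - float(seg["start"])
    return dict(totals)


# Made-up output in the shape described above, for illustration only.
example_output = {
    "segments": [
        {"speaker": "SPEAKER_00", "start": 0.0, "end": 2.5},
        {"speaker": "SPEAKER_01", "start": 2.5, "end": 4.0},
        {"speaker": "SPEAKER_00", "start": 4.0, "end": 5.0},
    ],
    "num_speakers": 2,
    "language": "en",
}

totals = speaking_time(example_output["segments"])
print(totals)  # {'SPEAKER_00': 3.5, 'SPEAKER_01': 1.5}
```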