No longer diarizes

Question

Opened this issue 10 months ago · 4 comments

It seems that it now only performs transcription and no longer performs diarization. See below; this is based on the shared example file (the repo is still using yinruiqing's HF token, as pointed out by Jordi in another thread). So scary~

Answer 1 · 2024-02-18T03:41:16.000Z

This token is deactivated. You can use your own token.
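One way to use your own token without committing it to the repo is to read it from an environment variable. This is only a minimal sketch: the helper name `get_hf_token` is hypothetical, and the variable name `HF_TOKEN` is an assumption, not something this project defines.

```python
import os


def get_hf_token(var_name: str = "HF_TOKEN") -> str:
    """Return the Hugging Face token stored in the given environment variable.

    `get_hf_token` and the default variable name are illustrative assumptions.
    """
    token = os.environ.get(var_name, "").strip()
    if not token:
        raise RuntimeError(f"Set {var_name} to your own Hugging Face access token.")
    return token
```

The returned value could then be passed wherever the token is currently hardcoded, for example as the `use_auth_token` argument of `Pipeline.from_pretrained`.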

Answer 2 · 2024-02-19T01:28:04.000Z

I have changed the HF token to my own in the /cli/transcribe.py file...

Then I ran the example command:
python -m pyannote_whisper.cli.transcribe data/afjiv.wav --model tiny --diarization True

It still doesn't work. Am I missing something?

Answer 3 · 2024-02-19T01:40:10.000Z

import whisper
from pyannote.audio import Pipeline
from pyannote_whisper.utils import diarize_text

# Load the diarization pipeline (token replaced with my own)
pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization",
                                    use_auth_token="hf_xxxxx")
# Transcribe and diarize the same file, then merge the two results
model = whisper.load_model("tiny.en")
asr_result = model.transcribe("data/afjiv.wav")
diarization_result = pipeline("data/afjiv.wav")
final_result = diarize_text(asr_result, diarization_result)

# Print start time, end time, speaker label and text for each segment
for seg, spk, sent in final_result:
    line = f'{seg.start:.2f} {seg.end:.2f} {spk} {sent}'
    print(line)

The code in the readme also doesn't work.

Answer 4 · 2024-03-30T04:52:03.000Z

@nexuslux Have you accepted the access conditions on the Hugging Face repositories? You need to agree to the terms of each repository that pyannote uses: pyannote/segmentation and pyannote/speaker-diarization.
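A quick way to test this is to try downloading one small file from each repository with your token and see which ones fail. The sketch below is only an illustration: `check_gated_access` and `_download_config` are hypothetical helpers, it assumes each repository ships a top-level `config.yaml`, and it assumes a `huggingface_hub` version whose `hf_hub_download` accepts a `token` argument.

```python
from typing import Callable, Dict, Iterable, Optional

# Repositories named in the answer above; adjust to the pipeline version in use.
GATED_REPOS = ("pyannote/segmentation", "pyannote/speaker-diarization")


def _download_config(repo_id: str, token: str) -> None:
    """Try to fetch one small file; raises if the token cannot access the repo."""
    # Imported lazily so this module loads even without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    # Assumption: each repo has a top-level config.yaml.
    hf_hub_download(repo_id=repo_id, filename="config.yaml", token=token)


def check_gated_access(
    token: str,
    repo_ids: Iterable[str] = GATED_REPOS,
    fetch: Callable[[str, str], None] = _download_config,
) -> Dict[str, Optional[str]]:
    """Map each repo id to None on success, or to a short error description."""
    report: Dict[str, Optional[str]] = {}
    for repo_id in repo_ids:
        try:
            fetch(repo_id, token)
            report[repo_id] = None
        except Exception as exc:  # e.g. a gated-repo or authentication error
            report[repo_id] = f"{type(exc).__name__}: {exc}"
    return report
```

Calling `check_gated_access("hf_xxxxx")` with a real token returns a dictionary. Any repository that maps to an error message rather than `None` is one whose terms probably still need to be accepted on its model page, or for which the token is invalid.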