/WhisperNote

Subtitle generation w/ Speaker Diarization using Whisper and pyannote.audio

Primary LanguagePythonApache License 2.0Apache-2.0

WhisperNote

A simple Python script to Transcribe audio and perform Speaker Diarization using OpenAI's Whisper and pyannote.audio.

Based on Majdoddin's work discussed on GitHub and available as a Google Colab Notebook.

Running the script

This Project was tested only on Linux, using CPU only and GPU configurations. While it is expected to work on other platforms, it is not guaranteed.

Benchmarks

Input File: 10 minutes of audio in .mp3, of an interview between 2 people.

Transcription

CPU: 2.36 minutes GPU: 2.05 minutes

Citations

pyannote/speaker-diarization pyannote/segmentation

@inproceedings{Bredin2021,
  Title = {{End-to-end speaker segmentation for overlap-aware resegmentation}},
  Author = {{Bredin}, Herv{\'e} and {Laurent}, Antoine},
  Booktitle = {Proc. Interspeech 2021},
  Address = {Brno, Czech Republic},
  Month = {August},
  Year = {2021},
}