This code demonstrates how to transcribe audio segments using the OpenAI Whisper ASR model. It covers the following steps:
Audio Download:
- The code starts by downloading an audio file from Google Drive using its file ID.
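A minimal sketch of the download step, assuming the file is shared publicly and small enough that Google Drive serves it without a confirmation step; the file ID and output filename below are placeholders (very large files may need an extra confirmation token or a helper such as `gdown`):

```python
import requests

def download_from_drive(file_id: str, dest_path: str) -> None:
    """Download a publicly shared Google Drive file by its file ID."""
    url = "https://drive.google.com/uc"
    # export=download asks Drive to serve the raw file bytes.
    response = requests.get(url, params={"id": file_id, "export": "download"}, stream=True)
    response.raise_for_status()
    with open(dest_path, "wb") as f:
        for chunk in response.iter_content(chunk_size=8192):
            f.write(chunk)

# Hypothetical file ID and output path:
download_from_drive("YOUR_FILE_ID", "call_recording.mp3")
```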
Audio Playback:
- The downloaded audio file is played using IPython to verify its content.
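In a Jupyter notebook, an inline player is enough to verify the download; the path below matches the placeholder filename used in the download sketch:

```python
from IPython.display import Audio, display

# Render an inline audio player to sanity-check the downloaded file.
display(Audio("call_recording.mp3"))
```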
Audio Splitting:
- The audio is split into two segments: caller and callee, each containing one half of the conversation.
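One common layout for call recordings is a stereo file with one speaker per channel; the sketch below assumes that layout and uses pydub to split the channels into separate caller and callee files (if the recording is laid out differently, e.g. split by time, the slicing logic would change accordingly):

```python
from pydub import AudioSegment

# pydub needs ffmpeg available on the system to decode MP3.
conversation = AudioSegment.from_file("call_recording.mp3")

# Assuming a stereo recording with one speaker per channel:
# split_to_mono() returns [left_channel, right_channel].
caller_audio, callee_audio = conversation.split_to_mono()

caller_audio.export("caller.wav", format="wav")
callee_audio.export("callee.wav", format="wav")
```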
Whisper ASR Model:
- The Whisper ASR model is loaded and used to transcribe each audio segment.
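Loading and running the model takes two calls; the model size and file paths below are placeholders (larger checkpoints such as "small", "medium", or "large" trade speed for accuracy):

```python
import whisper

# Load a Whisper checkpoint; "base" is a reasonable starting point.
model = whisper.load_model("base")

# transcribe() decodes the audio with ffmpeg and returns the full text
# plus timestamped segments under result["segments"].
caller_result = model.transcribe("caller.wav")
callee_result = model.transcribe("callee.wav")

print(caller_result["text"])
```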
VTT File Generation:
- The transcriptions are saved in WebVTT (VTT) format, suitable for captioning and subtitles.
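Whisper's timestamped segments map directly onto WebVTT cues; a small hand-rolled writer, sketched below using the results from the transcription step, avoids depending on version-specific helpers in the whisper package:

```python
def format_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{int(hours):02d}:{int(minutes):02d}:{secs:06.3f}"

def write_vtt(segments, vtt_path: str) -> None:
    """Write Whisper's timestamped segments as a WebVTT file."""
    with open(vtt_path, "w", encoding="utf-8") as f:
        f.write("WEBVTT\n\n")
        for seg in segments:
            f.write(f"{format_timestamp(seg['start'])} --> {format_timestamp(seg['end'])}\n")
            f.write(seg["text"].strip() + "\n\n")

write_vtt(caller_result["segments"], "caller.vtt")
write_vtt(callee_result["segments"], "callee.vtt")
```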
Conversation Data Storage:
- The transcriptions and the paths to the audio segments are stored in a JSON file for further analysis or use.
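The exact JSON schema is up to whatever consumes the file downstream; a minimal sketch with hypothetical keys, reusing the paths and transcription results from the steps above:

```python
import json

conversation_data = {
    "caller": {"audio_path": "caller.wav", "vtt_path": "caller.vtt",
               "transcript": caller_result["text"]},
    "callee": {"audio_path": "callee.wav", "vtt_path": "callee.vtt",
               "transcript": callee_result["text"]},
}

with open("conversation.json", "w", encoding="utf-8") as f:
    json.dump(conversation_data, f, ensure_ascii=False, indent=2)
```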
Before running this code, make sure you have the necessary packages installed:
- pydub
- requests
- openai
- openai-whisper (installs the `whisper` module)
You can install these packages using pip:
```bash
pip install pydub requests openai openai-whisper
```

Both pydub and Whisper also rely on ffmpeg being available on the system PATH for decoding audio.