This project provides a Python-based pipeline for extracting audio from video files, performing speaker diarization, and transcribing audio using OpenAI's Whisper. It identifies different speakers in the audio and aligns transcriptions with these speakers.
- Audio extraction from video files
- Audio trimming to a specified duration
- Speaker diarization to identify different speakers in the audio
- Audio transcription using OpenAI's Whisper
- Matching transcriptions with speaker diarization data
- Python 3.x
- Libraries:
pydub
,yt_dlp
,pyannote.audio
,whisper
- ffmpeg (for audio extraction from video)
Usage Place your audio or video files in a directory accessible by the script. Run the main.py script with the path to your file: bash Copy code python main.py path_to_your_file.mp4 Arguments:
input_file: Path to the input audio or video file. is_video (optional): Set to True if the input file is a video (default: True). duration_minutes (optional): Duration in minutes to trim the audio (default: 20). Modules audio_processing.py: Handles audio extraction and trimming. speaker_diarization.py: Performs speaker diarization using pyannote.audio. transcription.py: Transcribes audio using Whisper. main.py: Integrates all modules and runs the pipeline. Contributing Contributions to improve the project are welcome. Please follow the standard fork and pull request workflow.
License [Your chosen license]
Acknowledgments OpenAI for providing the Whisper model. Contributors and maintainers of pyannote.audio, pydub, and yt_dlp.