Speaker Identification and Transcription Pipeline

This project provides a Python-based pipeline for extracting audio from video files, performing speaker diarization, and transcribing audio using OpenAI's Whisper. It identifies different speakers in the audio and aligns transcriptions with these speakers.

Features

Audio extraction from video files
Audio trimming to a specified duration
Speaker diarization to identify different speakers in the audio
Audio transcription using OpenAI's Whisper
Matching transcriptions with speaker diarization data

Prerequisites

Python 3.x
Libraries: pydub, yt_dlp, pyannote.audio, whisper
ffmpeg (for audio extraction from video)

Installation

Usage Place your audio or video files in a directory accessible by the script. Run the main.py script with the path to your file: bash Copy code python main.py path_to_your_file.mp4 Arguments:

input_file: Path to the input audio or video file. is_video (optional): Set to True if the input file is a video (default: True). duration_minutes (optional): Duration in minutes to trim the audio (default: 20). Modules audio_processing.py: Handles audio extraction and trimming. speaker_diarization.py: Performs speaker diarization using pyannote.audio. transcription.py: Transcribes audio using Whisper. main.py: Integrates all modules and runs the pipeline. Contributing Contributions to improve the project are welcome. Please follow the standard fork and pull request workflow.

License [Your chosen license]

Acknowledgments OpenAI for providing the Whisper model. Contributors and maintainers of pyannote.audio, pydub, and yt_dlp.

RLinnae/Transcription_Pipeline

Speaker Identification and Transcription Pipeline

Features

Prerequisites

Installation