This project implements speaker diarization for Portuguese-language audio files. It uses WhisperX for transcription and the speaker-diarization-3.1 pipeline from pyannote.audio to identify and separate speakers. A Flask web interface lets users upload audio files, run transcription, and view the diarization results. The project also detects the gender of each speaker automatically (Male or Female).
- Audio Transcription: Utilizes WhisperX for high-quality transcription of Portuguese audio.
- Speaker Diarization: Uses the speaker-diarization-3.1 pipeline from pyannote.audio to distinguish between multiple speakers.
- Flask Web Interface: A user-friendly interface to upload audio files and view transcription and diarization results.
- Automatic Gender Detection: Identifies and labels each speaker as Male or Female.
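
The transcription and diarization features have to be combined at some point: each transcribed segment needs a speaker label. Below is a minimal sketch of one common way to do that, by picking the speaker whose diarization turns overlap a segment the most. It is not taken from this repository; the function names and the dictionary layout (`start`, `end`, `text`, `speaker`) are assumptions made for illustration.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Return the length in seconds shared by two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments, turns, unknown="UNKNOWN"):
    """Label each transcript segment with the speaker that overlaps it most.

    segments: list of dicts with "start", "end", "text" (hypothetical layout)
    turns:    list of dicts with "start", "end", "speaker" (hypothetical layout)
    """
    labeled = []
    for seg in segments:
        totals = {}  # speaker -> total overlapping seconds with this segment
        for turn in turns:
            shared = overlap(seg["start"], seg["end"], turn["start"], turn["end"])
            if shared > 0:
                totals[turn["speaker"]] = totals.get(turn["speaker"], 0.0) + shared
        speaker = max(totals, key=totals.get) if totals else unknown
        labeled.append({**seg, "speaker": speaker})
    return labeled


# Made-up example data, only to show the shape of the inputs.
segments = [
    {"start": 0.0, "end": 2.0, "text": "Olá, tudo bem?"},
    {"start": 2.0, "end": 4.5, "text": "Tudo ótimo, obrigado."},
]
turns = [
    {"start": 0.0, "end": 2.2, "speaker": "SPEAKER_00"},
    {"start": 2.2, "end": 5.0, "speaker": "SPEAKER_01"},
]

for seg in assign_speakers(segments, turns):
    print(seg["speaker"], seg["text"])
```

Segments that overlap no diarization turn are labeled `UNKNOWN` rather than guessed, so gaps in the diarization stay visible in the output.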
To set up the project locally, follow these steps:
- Clone the repository:
  git clone https://github.com/Jpzinn654/speaker-diarization-portuguese
  cd speaker-diarization-portuguese
- Install the required dependencies:
  pip install -r requirements.txt
- Install pyannote.audio (if not already installed):
  pip install pyannote.audio
- Install WhisperX by following the setup instructions in the official WhisperX repository.
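
After the installation steps, it can help to confirm that the main libraries are importable before starting the app. The following is a small sketch using only the standard library; the module names in `REQUIRED` (`flask`, `whisperx`, `pyannote.audio`) are assumed import names, and the helper functions are hypothetical, not part of this repository.

```python
import importlib.util

# Assumed import names for the project's main dependencies.
REQUIRED = ("flask", "whisperx", "pyannote.audio")


def is_installed(module_name):
    """Return True if the module can be located in the current environment."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # A missing parent package (e.g. "pyannote") raises ModuleNotFoundError.
        return False


def missing_modules(names=REQUIRED):
    """Return the subset of names that cannot be located."""
    return [name for name in names if not is_installed(name)]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

This only checks that the packages can be found; it does not verify versions or that any model weights have been downloaded.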
- Start the Flask application:
  python app.py
- Open your browser and visit:
  http://localhost:5000
- Upload an audio file in Portuguese.
- Wait for the transcription process to complete.
- The system will process the diarization and show the transcription along with labeled speakers.
- Optionally, have the gender (Male/Female) of each speaker segment set automatically.
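
To make the end result of these steps concrete, here is a rough sketch of how a labeled transcript could be rendered as text. The segment layout, the `genders` mapping, and both function names are assumptions for illustration; the actual interface may present results differently.

```python
def format_timestamp(seconds):
    """Format a time in seconds as MM:SS (fractions of a second are dropped)."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def render_transcript(segments, genders=None):
    """Render labeled segments as one line each.

    segments: list of dicts with "start", "end", "speaker", "text" (hypothetical)
    genders:  optional dict mapping speaker label -> "Male" or "Female"
    """
    genders = genders or {}
    lines = []
    for seg in segments:
        speaker = seg["speaker"]
        gender = genders.get(speaker)
        label = f"{speaker} ({gender})" if gender else speaker
        span = f"{format_timestamp(seg['start'])}-{format_timestamp(seg['end'])}"
        lines.append(f"[{span}] {label}: {seg['text']}")
    return "\n".join(lines)


# Made-up example data, only to show the output format.
example = [
    {"start": 0.0, "end": 2.0, "speaker": "SPEAKER_00", "text": "Olá, tudo bem?"},
    {"start": 2.0, "end": 4.5, "speaker": "SPEAKER_01", "text": "Tudo ótimo, obrigado."},
]
print(render_transcript(example, {"SPEAKER_00": "Female"}))
```

Speakers without an entry in `genders` are shown with their label only, which keeps the gender annotation optional.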
- WhisperX: For transcribing Portuguese audio into text.
- pyannote.audio: For speaker diarization.
- Flask: Web framework for building the UI.
- Tailwind CSS: For styling the web interface.
- HTML/JavaScript: For frontend development.
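
As a rough illustration of how the diarization library listed above (published on PyPI as `pyannote.audio`) is typically driven, the sketch below loads the `pyannote/speaker-diarization-3.1` pipeline and flattens its output into plain dictionaries. It follows the usage shown on that model's Hugging Face card, which also notes that the model is gated and needs an access token. The `use_auth_token` parameter name matches the 3.1 release and may differ in newer versions. The function names are hypothetical and this is not necessarily how this repository calls the library.

```python
def turns_from_annotation(annotation):
    """Flatten a pyannote-style annotation into a list of plain dicts."""
    return [
        {"start": round(turn.start, 2), "end": round(turn.end, 2), "speaker": speaker}
        for turn, _, speaker in annotation.itertracks(yield_label=True)
    ]


def diarize(audio_path, hf_token):
    """Run speaker diarization on one audio file (requires pyannote.audio).

    hf_token is a Hugging Face access token for the gated model.
    """
    # Imported lazily so this module loads even if pyannote.audio is absent.
    from pyannote.audio import Pipeline

    pipeline = Pipeline.from_pretrained(
        "pyannote/speaker-diarization-3.1",
        use_auth_token=hf_token,  # parameter name as of the 3.1 model card
    )
    return turns_from_annotation(pipeline(audio_path))
```

Keeping the conversion in `turns_from_annotation` separate from the model call makes the rest of the application independent of pyannote's own types.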
Feel free to fork the repository, make changes, and submit pull requests. If you're adding new features or fixing bugs, make sure to include relevant tests.
This project is licensed under the MIT License - see the LICENSE file for more details.