Portuguese Speaker Diarization with Flask UI

Overview

This project implements speaker diarization for Portuguese-language audio files, using WhisperX for transcription and Speaker-Diarization 3.1 from PyAnotAudio for identifying and separating speakers. The project also includes a Flask UI, which allows users to easily upload audio files, perform transcription, and view speaker diarization results. Additionally, it automatically detects the gender of the speakers (Male or Female).

Interface Screenshots

UI without Transcription

UI with Transcription

Features

Audio Transcription: Utilizes WhisperX for high-quality transcription of Portuguese audio.
Speaker Diarization: Uses PyAnotAudio's Speaker-Diarization 3.1 to distinguish between multiple speakers.
Flask Web Interface: A user-friendly interface to upload audio files and view transcription and diarization results.
Automatic Gender Detection: Automatically identifies and labels speakers as Male or Female.

Installation

To set up the project locally, follow these steps:

Clone the repository:

git clone https://github.com/Jpzinn654/speaker-diarization-portuguese
cd speaker-diarization-portuguese

Install the required dependencies:
```
pip install -r requirements.txt
```
Install PyAnotAudio (if not already installed):
```
pip install pyanotaudio
```
Install WhisperX model (follow instructions from the official WhisperX repository for setup).
Start the Flask application:
```
python app.py
```
Open your browser and visit:
```
http://localhost:5000
```

Usage

Upload an audio file in Portuguese.
Wait for the transcription process to complete.
The system will process the diarization and show the transcription along with labeled speakers.
Optionally, you can set the gender (Male/Female) for each speaker segment automatically.

Technologies Used

WhisperX: For transcribing Portuguese audio into text.
PyAnotAudio: For speaker diarization.
Flask: Web framework for building the UI.
Tailwind CSS: For styling the web interface.
HTML/JavaScript: For frontend development.

Contributing

Feel free to fork the repository, make changes, and submit pull requests. If you're adding new features or fixing bugs, make sure to include relevant tests.

License

This project is licensed under the MIT License - see the LICENSE file for more details.