The AI-Powered Meeting Summarizer is a Gradio-powered application that converts audio recordings of meetings into transcripts and concise summaries, using `whisper.cpp` for audio-to-text conversion and Ollama for text summarization. This tool is ideal for quickly extracting key points, decisions, and action items from meetings.
- Audio-to-Text Conversion: Uses `whisper.cpp` to convert audio files into text.
- Text Summarization: Uses models from the Ollama server to summarize the transcript.
- Multiple Models Support: Supports different Whisper models (`base`, `small`, `medium`, `large-V3`) and any available model from the Ollama server.
- Translation: Allows translation of non-English audio to English using Whisper.
- Gradio Interface: Provides a user-friendly web interface to upload audio files, view summaries, and download transcripts.
- Python 3.x
- FFmpeg (for audio processing)
- Whisper.cpp (for audio-to-text conversion)
- Ollama server (for text summarization)
- Gradio (for the web interface)
- Requests (for handling API calls to the Ollama server)
Before running the application, ensure that Ollama is running on your local machine or on a server. You can follow the instructions in the Ollama repository to set up the server. Do not forget to download and run a model on the Ollama server.
```sh
# To install and run Llama 3.2
ollama run llama3.2
```
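The application later asks the running Ollama server which models are installed so it can offer them in the summarization dropdown. A minimal sketch of that query, assuming Ollama's standard REST endpoint at `http://localhost:11434` (the helper names are illustrative, not the project's actual code; the standard library is used here to keep the sketch dependency-free, while the project itself uses `requests`):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default endpoint

def parse_model_names(tags_payload):
    """Extract model names from the JSON returned by GET /api/tags."""
    return [m["name"] for m in tags_payload.get("models", [])]

def list_ollama_models(base_url=OLLAMA_URL):
    """Query a running Ollama server for its installed models."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return parse_model_names(json.load(resp))
```

If `list_ollama_models()` returns an empty list, no model has been pulled yet — hence the `ollama run llama3.2` step above.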
Follow the steps below to set up and run the application:
```sh
git clone https://github.com/AlexisBalayre/AI-Powered-Meeting-Summarizer
cd AI-Powered-Meeting-Summarizer
```
To install all necessary dependencies (including a Python virtual environment, `whisper.cpp`, FFmpeg, and the required Python packages) and to run the application, execute the provided setup script:
```sh
chmod +x run_meeting_summarizer.sh
./run_meeting_summarizer.sh
```
This script will:
- Create and activate a Python virtual environment.
- Install necessary Python packages like `requests` and `gradio`.
- Check if FFmpeg is installed and install it if missing.
- Clone and build `whisper.cpp`.
- Download the required Whisper model (default: `small`).
- Run the `main.py` script, which starts the Gradio interface for the application.
Once the setup and execution are complete, Gradio will provide a URL (typically `http://127.0.0.1:7860`). Open this URL in your web browser to access the Meeting Summarizer interface.
Alternatively, after setup, you can activate the virtual environment and run the Python script manually:
```sh
# Activate the virtual environment
source .venv/bin/activate

# Run the main.py script
python main.py
```
- Upload an Audio File: Click on the audio upload area and select an audio file in any supported format (e.g., `.wav`, `.mp3`).
- Provide Context (Optional): You can provide additional context for better summarization (e.g., "Meeting about AI and Ethics").
- Select Whisper Model: Choose one of the available Whisper models (`base`, `small`, `medium`, `large-V3`) for audio-to-text conversion.
- Select Summarization Model: Choose a model from the available options retrieved from the Ollama server.
- After uploading an audio file, you will get a Summary of the transcript generated by the selected models.
- You can also download the full transcript as a text file by clicking the provided link.
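Under the hood, the summarization step pairs the transcript with the optional context in a prompt for the selected Ollama model. A minimal sketch of how such a prompt might be composed (`build_summary_prompt` is a hypothetical helper, not the project's actual code):

```python
def build_summary_prompt(transcript, context=None):
    """Compose a summarization prompt from a transcript and optional context."""
    header = ("Summarize the following meeting transcript, "
              "listing key points, decisions, and action items.")
    if context:
        # The user-supplied context steers the model toward the meeting's topic.
        header += f" Context: {context}"
    return f"{header}\n\nTranscript:\n{transcript}"
```

The resulting string would then be sent to the Ollama server's generation endpoint along with the chosen model name.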
By default, the Whisper model used is `small`. You can modify this in the `run_meeting_summarizer.sh` script by changing the `WHISPER_MODEL` variable:

```sh
WHISPER_MODEL="medium"
```
Alternatively, you can select different Whisper models from the dropdown in the Gradio interface. The list of available models is dynamically generated from the `.bin` files found in the `whisper.cpp/models` directory.
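Scanning that directory can be sketched in a few lines — whisper.cpp model files follow the `ggml-<name>.bin` naming convention, so the dropdown entries fall out of a glob (the function name here is illustrative, not the project's actual code):

```python
from pathlib import Path

def available_whisper_models(models_dir="whisper.cpp/models"):
    """List Whisper model names derived from ggml-*.bin files.

    For example, ggml-small.bin yields "small".
    """
    return sorted(p.stem.removeprefix("ggml-")
                  for p in Path(models_dir).glob("ggml-*.bin"))
```

Any model you download into `whisper.cpp/models` therefore appears in the dropdown without further configuration.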
To download a different Whisper model (e.g., `base`, `medium`, `large`), use the following steps:

- Navigate to the `whisper.cpp` directory:

  ```sh
  cd whisper.cpp
  ```

- Use the provided script to download the desired model. For example, to download the `base` model, run:

  ```sh
  ./models/download-ggml-model.sh base
  ```

  For the `large` model, you can run:

  ```sh
  ./models/download-ggml-model.sh large
  ```

  This will download the `.bin` file into the `whisper.cpp/models` directory.

- Once downloaded, the new model will automatically be available in the model dropdown when you restart the application.
By default, Whisper detects the language of the audio file automatically. You can force a specific source language with the `-l` flag in the `whisper.cpp` command:

```sh
./whisper.cpp/main -m ./whisper.cpp/models/ggml-{WHISPER_MODEL}.bin -l fr -f "{audio_file_wav}"
```

Here `-l fr` tells Whisper to treat the audio as French regardless of the detected language. Note that Whisper only translates into English, which `whisper.cpp` enables via the `-tr` (`--translate`) flag.
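The command above can be assembled programmatically, as the application would before shelling out to `whisper.cpp`. A sketch under the flag set shown in `whisper.cpp`'s help output (`build_whisper_cmd` is a hypothetical helper name):

```python
def build_whisper_cmd(model, audio_file, language="auto", translate=False):
    """Assemble a whisper.cpp command line as an argument list."""
    cmd = [
        "./whisper.cpp/main",
        "-m", f"./whisper.cpp/models/ggml-{model}.bin",  # model weights
        "-l", language,   # source language, or "auto" to detect it
        "-f", audio_file,
    ]
    if translate:
        cmd.append("-tr")  # translate the transcript into English
    return cmd
```

A list of arguments like this can be passed directly to `subprocess.run`, avoiding shell-quoting issues with file names that contain spaces.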
This project is licensed under the MIT License. See the `LICENSE` file for details.
- whisper.cpp by Georgi Gerganov for the audio-to-text conversion.
- Gradio for the interactive web interface framework.
- Ollama for providing large language models for summarization.