The AI-Powered Meeting Summarizer is a Gradio-powered application that converts audio recordings of meetings into transcripts and concise summaries, using `whisper.cpp` for audio-to-text conversion and Ollama for text summarization. This tool is ideal for quickly extracting key points, decisions, and action items from meetings.
- Audio-to-Text Conversion: Uses `whisper.cpp` to convert audio files into text.
- Text Summarization: Uses models from the Ollama server to summarize the transcript.
- Multiple Models Support: Supports different Whisper models (`base`, `small`, `medium`, `large-V3`) and any available model from the Ollama server.
- Translation: Allows translation of non-English audio to English using Whisper.
- Gradio Interface: Provides a user-friendly web interface to upload audio files, view summaries, and download transcripts.
- Python 3.x
- FFmpeg (for audio processing)
- Whisper.cpp (for audio-to-text conversion)
- Ollama server (for text summarization)
- Gradio (for the web interface)
- Requests (for handling API calls to the Ollama server)
Before running the application, ensure that Ollama is running on your local machine or on a server. You can follow the instructions in the Ollama repository to set up the server. Do not forget to download and run a model on the Ollama server.
```sh
# To install and run Llama 3.2
ollama run llama3.2
```
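The application later asks the running Ollama server which models are installed so it can offer them in the summarization dropdown. A minimal sketch of that query, assuming Ollama's standard REST endpoint at `http://localhost:11434` (the helper names are illustrative, not the project's actual code; the standard library is used here to keep the sketch dependency-free, while the project itself uses `requests`):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default endpoint

def parse_model_names(tags_payload):
    """Extract model names from the JSON returned by GET /api/tags."""
    return [m["name"] for m in tags_payload.get("models", [])]

def list_ollama_models(base_url=OLLAMA_URL):
    """Query a running Ollama server for its installed models."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return parse_model_names(json.load(resp))
```

If `list_ollama_models()` returns an empty list, no model has been pulled yet — hence the `ollama run llama3.2` step above.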
Follow the steps below to set up and run the application:
```sh
git clone https://github.com/AlexisBalayre/AI-Powered-Meeting-Summarizer
cd AI-Powered-Meeting-Summarizer
```
To install all necessary dependencies (including a Python virtual environment, `whisper.cpp`, FFmpeg, and the required Python packages) and to run the application, execute the provided setup script:
```sh
chmod +x run_meeting_summarizer.sh
./run_meeting_summarizer.sh
```
This script will:
- Create and activate a Python virtual environment.
- Install necessary Python packages like `requests` and `gradio`.
- Check if FFmpeg is installed and install it if missing.
- Clone and build `whisper.cpp`.
- Download the required Whisper model (default: `small`).
- Run the `main.py` script, which starts the Gradio interface for the application.
Once the setup and execution are complete, Gradio will provide a URL (typically `http://127.0.0.1:7860`). Open this URL in your web browser to access the Meeting Summarizer interface.
Alternatively, after setup, you can activate the virtual environment and run the Python script manually:
```sh
# Activate the virtual environment
source .venv/bin/activate

# Run the main.py script
python main.py
```
- Upload an Audio File: Click on the audio upload area and select an audio file in any supported format (e.g., `.wav`, `.mp3`).
- Provide Context (Optional): You can provide additional context for better summarization (e.g., "Meeting about AI and Ethics").
- Select Whisper Model: Choose one of the available Whisper models (`base`, `small`, `medium`, `large-V3`) for audio-to-text conversion.
- Select Summarization Model: Choose a model from the available options retrieved from the Ollama server.
- After uploading an audio file, you will get a Summary of the transcript generated by the selected models.
- You can also download the full transcript as a text file by clicking the provided link.
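Under the hood, the summarization step pairs the transcript with the optional context in a prompt for the selected Ollama model. A minimal sketch of how such a prompt might be composed (`build_summary_prompt` is a hypothetical helper, not the project's actual code):

```python
def build_summary_prompt(transcript, context=None):
    """Compose a summarization prompt from a transcript and optional context."""
    header = ("Summarize the following meeting transcript, "
              "listing key points, decisions, and action items.")
    if context:
        # The user-supplied context steers the model toward the meeting's topic.
        header += f" Context: {context}"
    return f"{header}\n\nTranscript:\n{transcript}"
```

The resulting string would then be sent to the Ollama server's generation endpoint along with the chosen model name.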
By default, the Whisper model used is `small`. You can modify this in the `run_meeting_summarizer.sh` script by changing the `WHISPER_MODEL` variable:

```sh
WHISPER_MODEL="medium"
```
Alternatively, you can select different Whisper models from the dropdown in the Gradio interface. The list of available models is dynamically generated from the `.bin` files found in the `whisper.cpp/models` directory.
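Scanning that directory can be sketched in a few lines — whisper.cpp model files follow the `ggml-<name>.bin` naming convention, so the dropdown entries fall out of a glob (the function name here is illustrative, not the project's actual code):

```python
from pathlib import Path

def available_whisper_models(models_dir="whisper.cpp/models"):
    """List Whisper model names derived from ggml-*.bin files.

    For example, ggml-small.bin yields "small".
    """
    return sorted(p.stem.removeprefix("ggml-")
                  for p in Path(models_dir).glob("ggml-*.bin"))
```

Any model you download into `whisper.cpp/models` therefore appears in the dropdown without further configuration.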
To download a different Whisper model (e.g., `base`, `medium`, `large`), use the following steps:

- Navigate to the `whisper.cpp` directory:

  ```sh
  cd whisper.cpp
  ```

- Use the provided script to download the desired model. For example, to download the `base` model, run:

  ```sh
  ./models/download-ggml-model.sh base
  ```

  For the `large` model, you can run:

  ```sh
  ./models/download-ggml-model.sh large
  ```

  This will download the `.bin` file into the `whisper.cpp/models` directory.

- Once downloaded, the new model will automatically be available in the model dropdown when you restart the application.
By default, Whisper detects the language of the audio file automatically. You can force a specific source language with the `-l` flag in the `whisper.cpp` command:

```sh
./whisper.cpp/main -m ./whisper.cpp/models/ggml-{WHISPER_MODEL}.bin -l fr -f "{audio_file_wav}"
```

Here `-l fr` tells Whisper to treat the audio as French regardless of the detected language. Note that Whisper only translates into English, which `whisper.cpp` enables via the `-tr` (`--translate`) flag.
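The command above can be assembled programmatically, as the application would before shelling out to `whisper.cpp`. A sketch under the flag set shown in `whisper.cpp`'s help output (`build_whisper_cmd` is a hypothetical helper name):

```python
def build_whisper_cmd(model, audio_file, language="auto", translate=False):
    """Assemble a whisper.cpp command line as an argument list."""
    cmd = [
        "./whisper.cpp/main",
        "-m", f"./whisper.cpp/models/ggml-{model}.bin",  # model weights
        "-l", language,   # source language, or "auto" to detect it
        "-f", audio_file,
    ]
    if translate:
        cmd.append("-tr")  # translate the transcript into English
    return cmd
```

A list of arguments like this can be passed directly to `subprocess.run`, avoiding shell-quoting issues with file names that contain spaces.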
This project is licensed under the MIT License. See the `LICENSE` file for details.
- whisper.cpp by Georgi Gerganov for the audio-to-text conversion.
- Gradio for the interactive web interface framework.
- Ollama for providing large language models for summarization.