This project implements a tool for automatically extracting the action items from a meeting recording. It uses OpenAI's Whisper model to transcribe the recording and Microsoft's Guidance framework to extract the action items from the text, with Alpaca-LoRA as the underlying LLM. For a more in-depth explanation, see APPROACH.md.
- Create a virtual environment with Python 3.9, preferably with `conda`: `conda create --name dive python=3.9`
- Activate the virtual environment: `conda activate dive`
- Install the dependencies: `pip install -r requirements.txt`
Note: to run all of the project's functionality successfully, at least 24 GB of RAM is required. It is also advisable to have a CUDA-enabled GPU with at least 24 GB of VRAM for reasonable performance when running the deep learning models.
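As a quick sanity check, the minimal sketch below (assuming PyTorch is pulled in by `requirements.txt`, which the deep learning models require) verifies that a CUDA GPU is visible and reports its VRAM:

```python
import torch

# Check whether a CUDA-enabled GPU is available to PyTorch
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    vram_gb = props.total_memory / 1024**3
    print(f"GPU: {props.name}, VRAM: {vram_gb:.1f} GB")
else:
    print("No CUDA GPU detected; the models will run on CPU (much slower).")
```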
The project consists of a Python package named `dive`. Inside this package there are three modules:

- `gen_data.py`, which is in charge of all functionality related to generating the transcription for the meeting audio file. The main function of this module is `generate_transcription()`, which generates a meeting transcription as a `pandas.DataFrame` containing the desired information for the first objective. This `pandas.DataFrame` can then be easily exported to a CSV file.
- `identify_ais.py`, which is in charge of identifying the action items given the meeting transcription in the expected format. The main function of this module is `identify_ais()`, which generates a list of dictionaries, each holding an action item's textual description together with the assignee and the timestamp.
- `utils.py`, which contains general utility functions, for example to download a meeting video from YouTube.
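Putting the three modules together, a minimal end-to-end example looks like this: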
```python
from dive.utils import download_yt_video
from dive.gen_data import generate_transcription
from dive.identify_ais import identify_ais

# YouTube url to a sample meeting video
yt_url = "https://www.youtube.com/watch?v=lBVtvOpU80Q"
audio_path = download_yt_video(yt_url)

# generate audio transcription with timestamps, speakers and text
transcription_df = generate_transcription(audio_path)

# the transcription can be easily exported to CSV
# transcription_df.to_csv("transcription.csv", index=False)

# identify action items in the transcription dataframe
action_items = identify_ais(transcription_df)
```
Expected output:
```python
[{
    "text": " Create a list of key iterations aligned with Giddily cluster",
    "assigne": " UNKNOWN",
    "ts": "00:07:08",
},
{
    "text": " Up-level the VS Code integrations",
    "assigne": " Bria",
    "ts": "00:10:00",
},
...,
{
    "text": " Terraform integratio",
    "assigne": " John Smit",
    "ts": " 00:20:0",
},
{
    "text": ' "Pick the ones out of that list that apply to our stage.',
    "assigne": " John Smit",
    "ts": "00:25:11",
}]
```
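As a hypothetical post-processing step (not part of the package), the returned list can be flattened into a table with pandas. The keys `text`, `assigne`, and `ts` match the expected output above, and the leading whitespace in the values comes from the raw LLM generations:

```python
import pandas as pd

# Build a DataFrame from the action items returned by identify_ais();
# the keys ("text", "assigne", "ts") match the expected output above
ais_df = pd.DataFrame(action_items)

# Strip the leading whitespace present in the raw LLM generations
for col in ("text", "assigne", "ts"):
    ais_df[col] = ais_df[col].str.strip()

# Export the action items to CSV for downstream use
ais_df.to_csv("action_items.csv", index=False)
```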