This repository contains a Python tool that summarizes YouTube video transcripts with OpenAI and stores the summaries, along with their embeddings, in MongoDB Atlas so they can be queried with Atlas Vector Search.
- Fetches YouTube video metadata and transcripts.
- Summarizes video content using OpenAI's GPT models.
- Converts summarized transcripts into embeddings for searchability.
- Stores video details, summaries, and embeddings in MongoDB Atlas with Vector Search capability.
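To make the Vector Search feature more concrete, the following is a minimal sketch of how a query over the stored embeddings might be expressed as an aggregation pipeline. It is not taken from this repository: the index name `vector_index`, the field names `embedding`, `title`, and `summary`, and the helper `build_vector_search_pipeline` are all assumptions chosen for illustration. Only the `$vectorSearch` stage and the `vectorSearchScore` metadata come from Atlas itself.

```python
from typing import Any, Dict, List


def build_vector_search_pipeline(
    query_vector: List[float],
    index_name: str = "vector_index",  # assumed name of the Atlas Vector Search index
    path: str = "embedding",           # assumed document field holding the embedding
    limit: int = 5,
    num_candidates: int = 100,
) -> List[Dict[str, Any]]:
    """Build an aggregation pipeline that ranks documents by vector similarity."""
    # Atlas requires numCandidates to be at least as large as limit.
    if limit > num_candidates:
        raise ValueError("limit must not exceed num_candidates")
    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": path,
                "queryVector": list(query_vector),
                "numCandidates": num_candidates,
                "limit": limit,
            }
        },
        {
            # Return only the (assumed) title and summary fields plus the similarity score.
            "$project": {
                "_id": 0,
                "title": 1,
                "summary": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]
```

With PyMongo, a pipeline like this would be passed to `collection.aggregate(...)`. The query vector should be produced by the same embedding model that was used for the stored summaries, otherwise the similarity scores are not meaningful.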
- Python 3.8+
- MongoDB Atlas account
- OpenAI account
- Clone the Repository:
git clone https://github.com/fabiofalavinha/mongodb-ai-video-transcript.git
cd mongodb-ai-video-transcript
- Set Up a Virtual Environment:
python3 -m venv venv
source venv/bin/activate
- Install Dependencies:
pip install -r requirements.txt
- Configure the config.ini file with your OpenAI API key and MongoDB Atlas connection details.
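- Run the script with a YouTube URL to generate the transcription of the video (the URL is quoted so the shell does not interpret the `?`):
python main.py --youtube "https://www.youtube.com/watch?v=sample_id"
- Run the script with a search query to search the stored videos:
python main.py --searchFor "your_search_query_here"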
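The configuration and the two command-line flags above could be wired together roughly as follows. This is a sketch under stated assumptions, not the actual contents of `main.py`: the section and key names (`[openai] api_key`, `[mongodb] uri`), the helper names, and the choice to make the two flags mutually exclusive are all hypothetical. Only the flag names `--youtube` and `--searchFor` come from the commands shown above.

```python
import argparse
import configparser
from typing import Dict, List, Optional
from urllib.parse import parse_qs, urlparse


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the two flags used in the commands above."""
    parser = argparse.ArgumentParser(description="Summarize or search YouTube transcripts.")
    # Assumption: exactly one of the two modes is selected per run.
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--youtube", metavar="URL", help="YouTube video URL to transcribe and summarize")
    group.add_argument("--searchFor", metavar="QUERY", help="text to search for in stored summaries")
    return parser.parse_args(argv)


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID from a watch URL or a youtu.be short link, or None."""
    parsed = urlparse(url)
    if parsed.hostname == "youtu.be":
        return parsed.path.lstrip("/") or None
    values = parse_qs(parsed.query).get("v")
    return values[0] if values else None


def load_settings(config_text: str) -> Dict[str, str]:
    """Read the OpenAI key and MongoDB URI from INI text (section and key names are assumed)."""
    # interpolation=None keeps '%' literal, which matters for percent-encoded passwords in URIs.
    config = configparser.ConfigParser(interpolation=None)
    config.read_string(config_text)
    return {
        "openai_api_key": config["openai"]["api_key"],
        "mongodb_uri": config["mongodb"]["uri"],
    }
```

In a real script the INI text would come from disk, for example via `config.read("config.ini")` instead of `read_string`. Parsing `--youtube "https://www.youtube.com/watch?v=sample_id"` and passing the URL to `extract_video_id` yields `sample_id`.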
- Fork the repository on GitHub.
- Clone the forked repo to your machine.
- Create a new branch in your local repo.
- Make your changes and commit them to your local branch.
- Push your local branch to your fork on GitHub.
- Create a pull request from your fork to the original repository.
Please ensure that your code adheres to the repo's coding standards and includes tests where necessary.
Thanks to OpenAI for their powerful GPT models and to the MongoDB team for the incredible Atlas Vector Search feature.