TL;DR:
- Paste a YouTube video URL
- The video is transcribed with OpenAI Whisper
- The transcript segments are reformatted into chunks of 40 seconds so each segment carries more context (see the sketch after this list)
- Embeddings are created with the OpenAI embeddings endpoint
- The embeddings are saved in Supabase, which serves as the vector database
- When searching, the query is converted to an embedding, then a Supabase Postgres function searches for similarities
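
As a rough illustration of the chunking step, here is a minimal sketch of how Whisper's short segments could be merged into ~40-second windows. The function and field names are assumptions for illustration, not the project's actual code (Whisper does return segments shaped like `{"start", "end", "text"}`):

```python
# Sketch: merge Whisper's short segments into ~40-second chunks.
def merge_segments(segments, window=40.0):
    chunks, current, chunk_start = [], [], 0.0
    for seg in segments:
        current.append(seg["text"].strip())
        if seg["end"] - chunk_start >= window:
            chunks.append({"start": chunk_start, "end": seg["end"],
                           "text": " ".join(current)})
            current, chunk_start = [], seg["end"]
    if current:  # flush the trailing partial chunk
        chunks.append({"start": chunk_start, "end": segments[-1]["end"],
                       "text": " ".join(current)})
    return chunks
```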
Demo: `yt-semantic.mp4`
The transcription is done in a Python Flask app running OpenAI Whisper (see `transcription_backend/`).
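
For reference, a minimal sketch of what such an endpoint could look like; the route name, request shape, and model size are assumptions, and the actual app may differ:

```python
# Sketch of a Flask endpoint transcribing audio with OpenAI Whisper.
import whisper
from flask import Flask, request, jsonify

app = Flask(__name__)
model = whisper.load_model("base")  # model size is an assumption

@app.route("/transcribe", methods=["POST"])  # hypothetical route
def transcribe():
    path = request.json["audio_path"]  # hypothetical request field
    result = model.transcribe(path)
    # Whisper returns the full text plus timestamped segments.
    return jsonify(segments=result["segments"])
```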
The transcript chunks are converted to embeddings using the OpenAI embeddings API.
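
A minimal sketch of that call with the current OpenAI Python client (the model name is assumed; the project may use a different one or an older client version):

```python
# Sketch: embed transcript chunks with the OpenAI embeddings API.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed(texts):
    resp = client.embeddings.create(
        model="text-embedding-ada-002",  # assumed model
        input=texts,
    )
    return [item.embedding for item in resp.data]
```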
The embeddings are stored in a Supabase database with the pgvector extension. A Postgres function is used for the similarity search (see Supabase's pgvector documentation for details).
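
To illustrate the search path, here is a sketch of converting a query to an embedding and calling a pgvector-backed Postgres function through supabase-py. The function and parameter names are assumptions modeled on Supabase's pgvector examples, not necessarily this project's schema:

```python
# Sketch: semantic search against Supabase pgvector.
import os
from supabase import create_client

supabase = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_KEY"])

def search(query, top_k=5):
    query_embedding = embed([query])[0]  # embed() from the sketch above
    # Calls a Postgres function (hypothetically "match_segments") that
    # orders rows by cosine distance using pgvector's <=> operator.
    resp = supabase.rpc(
        "match_segments",
        {"query_embedding": query_embedding, "match_count": top_k},
    ).execute()
    return resp.data
```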
Run the Python backend
> flask --app transcription_backend/server run
Run the front-end
> cd webapp
> npm run dev